2008-03-27 13:14:38

by Artem Bityutskiy

Subject: [RFC PATCH] UBIFS - new flash file system

Dear community,

here is a new flash file system developed by Nokia engineers with
the help of the University of Szeged. The new file-system is called
UBIFS, which stands for UBI file system. UBI is the wear-leveling/
bad-block handling/volume management layer which is already in
mainline (see drivers/mtd/ubi).

The main objective of UBIFS is better performance and scalability
compared to JFFS2, which is achieved by
a) implementing write-back (JFFS2 is write-through)
b) storing and maintaining the file-system index on the media
(JFFS2 maintains it in RAM and rebuilds it on each mount, which
requires a full media scan).
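
To illustrate point a), here is a toy model of the difference between
write-through and write-back. It is only a sketch and not UBIFS code:
the struct toy_dev type, the page geometry and all function names are
made up for this example. It counts how many times the "media" is
written when the same page is overwritten repeatedly.

```c
#include <assert.h>
#include <string.h>

#define PAGE_CNT 4
#define PAGE_SZ  16

/* Hypothetical device: a small cache in front of a small "flash" */
struct toy_dev {
	char media[PAGE_CNT][PAGE_SZ];
	char cache[PAGE_CNT][PAGE_SZ];
	int dirty[PAGE_CNT];
	int media_writes;	/* how many times the media was written */
};

static void media_write(struct toy_dev *d, int page, const char *buf)
{
	assert(page >= 0 && page < PAGE_CNT);
	memcpy(d->media[page], buf, PAGE_SZ);
	d->media_writes++;
}

/* Write-through: every write goes to the media immediately */
static void write_through(struct toy_dev *d, int page, const char *buf)
{
	memcpy(d->cache[page], buf, PAGE_SZ);
	media_write(d, page, buf);
}

/* Write-back: only the cache is updated, the page is marked dirty */
static void write_back(struct toy_dev *d, int page, const char *buf)
{
	memcpy(d->cache[page], buf, PAGE_SZ);
	d->dirty[page] = 1;
}

/* Flush all dirty pages to the media */
static void toy_sync(struct toy_dev *d)
{
	for (int i = 0; i < PAGE_CNT; i++) {
		if (d->dirty[i]) {
			media_write(d, i, d->cache[i]);
			d->dirty[i] = 0;
		}
	}
}

/* Overwrite page 0 'nwrites' times, sync, return media write count */
static int demo_media_writes(int use_write_back, int nwrites)
{
	struct toy_dev d;
	char buf[PAGE_SZ];

	memset(&d, 0, sizeof(d));
	for (int i = 0; i < nwrites; i++) {
		memset(buf, 'a' + i % 26, PAGE_SZ);
		if (use_write_back)
			write_back(&d, 0, buf);
		else
			write_through(&d, 0, buf);
	}
	toy_sync(&d);
	return d.media_writes;
}
```

In this model demo_media_writes(0, 100) touches the media on every
write, while demo_media_writes(1, 100) touches it once at sync time.
A real file system additionally has to budget flash space for the
dirty data before accepting it into the cache.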

At the same time, UBIFS implements the nice features JFFS2 has -
compression and tolerance to unclean reboots. Although UBIFS
borrowed basic ideas from JFFS2, it is a completely different
file-system.

UBIFS is stable and very close to being production ready. It was
tested on OLPC and N810. The development was done on a flash simulator
on a 2-way x86 machine. However, UBIFS needs a good review.

Note, UBIFS works on top of UBI, not on top of bare flash devices.
It delegates crucial things like wear-leveling and bad
eraseblock handling to UBI. One important thing to note is that MLC
NAND flashes tend to have a very short eraseblock lifetime -
just a few thousand erase-cycles (some have about 3000 or even less).
This makes the JFFS2 random wear-leveling algorithm not good
enough. In contrast, UBI provides good wear-leveling based on
saved erase-counters.
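
As a rough illustration of why saved erase counters help, the sketch
below always erases the least-worn eraseblock and reports the
resulting wear spread. This is not the UBI algorithm, only a minimal
model: the function names, the fixed pool size and the idea of keeping
the counters in a plain array are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PEBS 64	/* hypothetical upper bound on pool size */

/* Return the index of the eraseblock with the lowest erase counter */
static size_t pick_least_worn(const unsigned int *ec, size_t cnt)
{
	size_t best = 0;

	assert(cnt > 0);
	for (size_t i = 1; i < cnt; i++)
		if (ec[i] < ec[best])
			best = i;
	return best;
}

/*
 * Start with 'cnt' fresh eraseblocks, perform 'erases' erase cycles,
 * always on the least-worn block, and return max - min of the erase
 * counters afterwards.
 */
static unsigned int wear_spread(size_t cnt, unsigned int erases)
{
	unsigned int ec[MAX_PEBS] = { 0 };
	unsigned int min, max;

	assert(cnt > 0 && cnt <= MAX_PEBS);
	for (unsigned int i = 0; i < erases; i++)
		ec[pick_least_worn(ec, cnt)]++;

	min = max = ec[0];
	for (size_t i = 1; i < cnt; i++) {
		if (ec[i] < min)
			min = ec[i];
		if (ec[i] > max)
			max = ec[i];
	}
	return max - min;
}
```

With this policy the counters never drift apart by more than one,
whereas a random choice gives no such bound. With only a few thousand
cycles available per eraseblock, that bound is what matters.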

There is also a mkfs.ubifs user-space utility, so it is possible to
prepare UBIFS images. Please see the URLs given at the end of this
letter.

UBIFS performs quite well - it gives very good write performance
because of write-back (write tests were ~100 times faster, which
is clearly because of the caching) while giving about the same
performance as JFFS2 on synchronous operations.
Obviously, it is extremely difficult to compete with JFFS2 on
synchronous operations because it maintains the FS index in RAM,
while UBIFS maintains it on the flash media. However, because of
the many tricks and optimizations implemented in UBIFS (wandering
and multi-headed journal, write-while-committing, search trees,
etc), it has very good synchronous I/O performance.

UBIFS mount time is considerably shorter as well. For example,
in the case of OLPC we observed a 10-15 seconds faster boot time
compared to JFFS2 (fast mount, no full media scan).

UBIFS is quite complex because it is difficult to maintain
indexing information on the flash media and be fast at the same
time. Please refer to the UBIFS white paper for information
about the UBIFS design.

UBIFS documentation and FAQ sections:
http://www.linux-mtd.infradead.org/doc/ubifs.html
http://www.linux-mtd.infradead.org/faq/ubifs.html

UBIFS white-paper:
http://www.linux-mtd.infradead.org/doc/ubifs_whitepaper.pdf

Since UBIFS is closely related to UBI, the UBI documentation/FAQ
is also useful:
http://www.linux-mtd.infradead.org/doc/ubi.html
http://www.linux-mtd.infradead.org/faq/ubi.html

Adrian Hunter
Artem Bityutskiy

The overall diffstat:
fs/Kconfig | 3 +
fs/Makefile | 1 +
fs/fs-writeback.c | 8 +
fs/ubifs/Kconfig | 47 +
fs/ubifs/Kconfig.debug | 159 ++
fs/ubifs/Makefile | 9 +
fs/ubifs/budget.c | 822 +++++++++++
fs/ubifs/build.c | 1351 ++++++++++++++++++
fs/ubifs/commit.c | 708 +++++++++
fs/ubifs/compress.c | 264 ++++
fs/ubifs/debug.c | 1125 +++++++++++++++
fs/ubifs/debug.h | 343 +++++
fs/ubifs/dir.c | 989 +++++++++++++
fs/ubifs/file.c | 790 ++++++++++
fs/ubifs/find.c | 951 +++++++++++++
fs/ubifs/gc.c | 773 ++++++++++
fs/ubifs/io.c | 921 ++++++++++++
fs/ubifs/ioctl.c | 205 +++
fs/ubifs/journal.c | 1230 ++++++++++++++++
fs/ubifs/key.h | 507 +++++++
fs/ubifs/log.c | 769 ++++++++++
fs/ubifs/lprops.c | 1341 +++++++++++++++++
fs/ubifs/lpt.c | 2239 +++++++++++++++++++++++++++++
fs/ubifs/lpt_commit.c | 1628 +++++++++++++++++++++
fs/ubifs/master.c | 415 ++++++
fs/ubifs/misc.h | 267 ++++
fs/ubifs/orphan.c | 952 +++++++++++++
fs/ubifs/recovery.c | 1437 +++++++++++++++++++
fs/ubifs/replay.c | 1006 +++++++++++++
fs/ubifs/sb.c | 581 ++++++++
fs/ubifs/scan.c | 368 +++++
fs/ubifs/shrinker.c | 410 ++++++
fs/ubifs/super.c | 531 +++++++
fs/ubifs/tnc.c | 3483 +++++++++++++++++++++++++++++++++++++++++++++
fs/ubifs/tnc_commit.c | 1088 ++++++++++++++
fs/ubifs/ubifs-media.h | 701 +++++++++
fs/ubifs/ubifs.h | 1519 ++++++++++++++++++++
fs/ubifs/xattr.c | 587 ++++++++
include/linux/writeback.h | 1 +
39 files changed, 30529 insertions(+), 0 deletions(-)

Note, the code is quite large because of its complexity and because
of the great deal of comments it has. The debugging stuff also
introduces quite a few lines of code.


2008-03-27 13:07:41

by Artem Bityutskiy

Subject: [RFC PATCH 06/26] UBIFS: add superblock and master node

This patch contains the superblock and master node implementations.
The UBIFS superblock is read-only and contains only static data like
the default compression type. The superblock sits at a fixed
position and may be changed only with user-space tools. The master
node contains dynamic information like the position of the root
indexing node of the UBIFS indexing B-tree, and so on. The master
node is updated out-of-place.
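
To make "updated out-of-place" a bit more concrete, here is a toy
model loosely following what master.c in this patch does: every update
is appended at the next free position in two LEBs, both LEBs are
unmapped and restarted from offset 0 when they fill up, and at mount
time the last node in each LEB is taken and the two copies are
compared. It is only a sketch; the toy_* names, the slot-based layout
and the use of the commit number as the only payload are assumptions
made for the example.

```c
#include <assert.h>
#include <string.h>

#define SLOTS_PER_LEB 4	/* hypothetical: master nodes per LEB */

struct toy_mst {
	unsigned long long cmt_no;
	int valid;		/* 0 means erased (empty) slot */
};

struct toy_flash {
	struct toy_mst leb[2][SLOTS_PER_LEB];	/* two master LEBs */
	int next;		/* next free slot, same for both LEBs */
};

/* Model of unmapping (erasing) one master LEB */
static void toy_unmap(struct toy_flash *f, int lnum)
{
	memset(f->leb[lnum], 0, sizeof(f->leb[lnum]));
}

/*
 * Append a new master node to both LEBs. The first LEB is fully
 * updated before the second one is touched, so one good copy always
 * survives an interruption.
 */
static void toy_write_master(struct toy_flash *f, unsigned long long cmt_no)
{
	struct toy_mst m = { .cmt_no = cmt_no, .valid = 1 };
	int wrap = (f->next == SLOTS_PER_LEB);

	if (wrap)
		f->next = 0;
	for (int lnum = 0; lnum < 2; lnum++) {
		if (wrap)
			toy_unmap(f, lnum);
		assert(!f->leb[lnum][f->next].valid);	/* never overwrite */
		f->leb[lnum][f->next] = m;
	}
	f->next++;
}

/* Return the latest commit number, or -1 if the copies disagree */
static long long toy_find_master(const struct toy_flash *f)
{
	int last[2] = { -1, -1 };

	for (int lnum = 0; lnum < 2; lnum++)
		for (int i = 0; i < SLOTS_PER_LEB; i++)
			if (f->leb[lnum][i].valid)
				last[lnum] = i;

	if (last[0] < 0 || last[0] != last[1])
		return -1;
	if (f->leb[0][last[0]].cmt_no != f->leb[1][last[1]].cmt_no)
		return -1;
	return (long long)f->leb[0][last[0]].cmt_no;
}

/* Write 'writes' master nodes to empty flash, return what is found */
static long long toy_demo(int writes)
{
	struct toy_flash f;

	memset(&f, 0, sizeof(f));
	for (int i = 1; i <= writes; i++)
		toy_write_master(&f, (unsigned long long)i);
	return toy_find_master(&f);
}
```

In the real code a mismatch between the two copies is not fatal: the
mount path falls back to the master node recovery code.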

Signed-off-by: Artem Bityutskiy <[email protected]>
Signed-off-by: Adrian Hunter <[email protected]>
---
fs/ubifs/master.c | 415 ++++++++++++++++++++++++++++++++++++++
fs/ubifs/sb.c | 581 +++++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 996 insertions(+), 0 deletions(-)

diff --git a/fs/ubifs/master.c b/fs/ubifs/master.c
new file mode 100644
index 0000000..38c40d1
--- /dev/null
+++ b/fs/ubifs/master.c
@@ -0,0 +1,415 @@
+/*
+ * This file is part of UBIFS.
+ *
+ * Copyright (C) 2006-2008 Nokia Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 51
+ * Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Authors: Artem Bityutskiy (Битюцкий Артём)
+ * Adrian Hunter
+ */
+
+/* This file implements reading and writing the master node */
+
+#include "ubifs.h"
+
+/**
+ * scan_for_master - search for the valid master node.
+ * @c: UBIFS file-system description object
+ *
+ * This function scans the master node LEBs and searches for the latest master
+ * node. Returns zero in case of success and a negative error code in case of
+ * failure.
+ */
+static int scan_for_master(struct ubifs_info *c)
+{
+ struct ubifs_scan_leb *sleb;
+ struct ubifs_scan_node *snod;
+ int lnum, offs = 0, nodes_cnt;
+
+ lnum = UBIFS_MST_LNUM;
+
+ sleb = ubifs_scan(c, lnum, 0, c->sbuf);
+ if (IS_ERR(sleb))
+ return PTR_ERR(sleb);
+ nodes_cnt = sleb->nodes_cnt;
+ if (nodes_cnt > 0) {
+ snod = list_entry(sleb->nodes.prev, struct ubifs_scan_node,
+ list);
+ if (snod->type != UBIFS_MST_NODE)
+ goto out;
+ memcpy(c->mst_node, snod->node, snod->len);
+ offs = snod->offs;
+ }
+ ubifs_scan_destroy(sleb);
+
+ lnum += 1;
+
+ sleb = ubifs_scan(c, lnum, 0, c->sbuf);
+ if (IS_ERR(sleb))
+ return PTR_ERR(sleb);
+ if (sleb->nodes_cnt != nodes_cnt)
+ goto out;
+ if (!sleb->nodes_cnt)
+ goto out;
+ snod = list_entry(sleb->nodes.prev, struct ubifs_scan_node, list);
+ if (snod->type != UBIFS_MST_NODE)
+ goto out;
+ if (snod->offs != offs)
+ goto out;
+ if (memcmp((void *)c->mst_node + UBIFS_CH_SZ,
+ (void *)snod->node + UBIFS_CH_SZ,
+ UBIFS_MST_NODE_SZ - UBIFS_CH_SZ))
+ goto out;
+ c->mst_offs = offs;
+ ubifs_scan_destroy(sleb);
+ return 0;
+
+out:
+ ubifs_scan_destroy(sleb);
+ return -EINVAL;
+}
+
+/**
+ * validate_master - validate master node.
+ * @c: UBIFS file-system description object
+ *
+ * This function validates data which was read from master node. Returns zero
+ * if the data is all right and %-EINVAL if not.
+ */
+static int validate_master(const struct ubifs_info *c)
+{
+ unsigned long long main_sz;
+ int err;
+
+ if (c->max_sqnum >= SQNUM_WATERMARK) {
+ dbg_err("too large max_sqnum");
+ err = 1;
+ goto out;
+ }
+
+ if (c->cmt_no >= c->max_sqnum) {
+ dbg_err("invalid commit number");
+ err = 2;
+ goto out;
+ }
+
+ if (c->highest_inum >= INUM_WATERMARK) {
+ ubifs_err("too many inodes %lu", c->highest_inum);
+ err = 3;
+ goto out;
+ }
+
+ if (c->lhead_lnum < UBIFS_LOG_LNUM ||
+ c->lhead_lnum >= UBIFS_LOG_LNUM + c->log_lebs ||
+ c->lhead_offs < 0 || c->lhead_offs >= c->leb_size ||
+ c->lhead_offs & (c->min_io_size - 1)) {
+ dbg_err("bad log head reference");
+ err = 4;
+ goto out;
+ }
+
+ if (c->zroot.lnum >= c->leb_cnt || c->zroot.lnum < c->main_first ||
+ c->zroot.offs >= c->leb_size || c->zroot.offs & 7) {
+ dbg_err("bad root indexing node reference");
+ err = 5;
+ goto out;
+ }
+
+ if (c->zroot.len < c->ranges[UBIFS_IDX_NODE].min_len ||
+ c->zroot.len > c->ranges[UBIFS_IDX_NODE].max_len) {
+ dbg_err("bad root indexing node length");
+ err = 6;
+ goto out;
+ }
+
+ if (c->gc_lnum >= c->leb_cnt || c->gc_lnum < c->main_first) {
+ dbg_err("bad GC LEB number");
+ err = 7;
+ goto out;
+ }
+
+ if (c->ihead_lnum >= c->leb_cnt || c->ihead_lnum < c->main_first ||
+ c->ihead_offs % c->min_io_size || c->ihead_offs < 0 ||
+ c->ihead_offs > c->leb_size || c->ihead_offs & 7) {
+ dbg_err("bad indexing head position");
+ err = 8;
+ goto out;
+ }
+
+ main_sz = c->main_lebs * (unsigned long long)c->leb_size;
+ if (c->old_idx_sz & 7 || c->old_idx_sz >= main_sz) {
+ dbg_err("bad index size");
+ err = 9;
+ goto out;
+ }
+
+ if (c->lpt_lnum < c->lpt_first || c->lpt_lnum > c->lpt_last ||
+ c->lpt_offs < 0 || c->lpt_offs + c->nnode_sz > c->leb_size) {
+ dbg_err("bad LPT root position");
+ err = 10;
+ goto out;
+ }
+
+ if (c->nhead_lnum < c->lpt_first || c->nhead_lnum > c->lpt_last ||
+ c->nhead_offs < 0 || c->nhead_offs % c->min_io_size ||
+ c->nhead_offs > c->leb_size) {
+ dbg_err("bad LPT head position");
+ err = 11;
+ goto out;
+ }
+
+ if (c->ltab_lnum < c->lpt_first || c->ltab_lnum > c->lpt_last ||
+ c->ltab_offs < 0 ||
+ c->ltab_offs + c->ltab_sz > c->leb_size) {
+ dbg_err("bad ltab position");
+ err = 12;
+ goto out;
+ }
+
+ if (c->big_lpt && (c->lsave_lnum < c->lpt_first ||
+ c->lsave_lnum > c->lpt_last || c->lsave_offs < 0 ||
+ c->lsave_offs + c->lsave_sz > c->leb_size)) {
+ dbg_err("bad lsave position");
+ err = 13;
+ goto out;
+ }
+
+ if (c->lscan_lnum < c->main_first || c->lscan_lnum >= c->leb_cnt) {
+ dbg_err("bad lscan_lnum");
+ err = 14;
+ goto out;
+ }
+
+ if (c->lst.empty_lebs < 0 || c->lst.empty_lebs > c->main_lebs - 2) {
+ dbg_err("bad empty LEB count");
+ err = 15;
+ goto out;
+ }
+
+ if (c->lst.idx_lebs < 0 || c->lst.idx_lebs > c->main_lebs - 1) {
+ dbg_err("bad index LEB count");
+ err = 16;
+ goto out;
+ }
+
+ if (c->lst.total_free < 0 || c->lst.total_free > main_sz ||
+ c->lst.total_free & 7) {
+ dbg_err("bad total free");
+ err = 17;
+ goto out;
+ }
+
+ if (c->lst.total_dirty < 0 || (c->lst.total_dirty & 7)) {
+ dbg_err("bad total dirty");
+ err = 18;
+ goto out;
+ }
+
+ if (c->lst.total_used < 0 || (c->lst.total_used & 7)) {
+ dbg_err("bad total used");
+ err = 19;
+ goto out;
+ }
+
+ if (c->lst.total_free + c->lst.total_dirty +
+ c->lst.total_used > main_sz) {
+ dbg_err("bad total free + total dirty + total used");
+ dbg_err("total free %lld, total dirty %lld, total used %lld, "
+ "sum %lld, main_sz %lld", c->lst.total_free,
+ c->lst.total_dirty, c->lst.total_used,
+ c->lst.total_free + c->lst.total_dirty +
+ c->lst.total_used, main_sz);
+ err = 20;
+ goto out;
+ }
+
+ if (c->lst.total_dead + c->lst.total_dark +
+ c->lst.total_used + c->old_idx_sz > main_sz) {
+ dbg_err("bad total dead + total dark + total used + old idx");
+ err = 21;
+ goto out;
+ }
+
+ if (c->lst.total_dead < 0 ||
+ c->lst.total_dead > c->lst.total_free + c->lst.total_dirty ||
+ c->lst.total_dead & 7) {
+ dbg_err("bad total dead space");
+ err = 22;
+ goto out;
+ }
+
+ if (c->lst.total_dark < 0 ||
+ c->lst.total_dark > c->lst.total_free + c->lst.total_dirty ||
+ c->lst.total_dark & 7) {
+ dbg_err("bad total dark space");
+ err = 23;
+ goto out;
+ }
+
+ return 0;
+
+out:
+ ubifs_err("bad master node at offset %d error %d", c->mst_offs, err);
+ dbg_dump_node(c, c->mst_node);
+ return -EINVAL;
+}
+
+/**
+ * ubifs_read_master - read master node.
+ * @c: UBIFS file-system description object
+ *
+ * This function finds and reads the master node during file-system mount. If
+ * the flash is empty, it creates default master node as well. Returns zero in
+ * case of success and a negative error code in case of failure.
+ */
+int ubifs_read_master(struct ubifs_info *c)
+{
+ int err, old_leb_cnt;
+
+ c->mst_node = kzalloc(c->mst_node_alsz, GFP_KERNEL);
+ if (!c->mst_node)
+ return -ENOMEM;
+
+ err = scan_for_master(c);
+ if (err) {
+ err = ubifs_recover_master_node(c);
+ if (err)
+ /*
+ * Note, we do not free 'c->mst_node' here because the
+ * unmount routine will take care of this.
+ */
+ return err;
+ }
+
+ /* Make sure that the recovery flag is clear */
+ c->mst_node->flags &= cpu_to_le32(~UBIFS_MST_RCVRY);
+
+ c->max_sqnum = le64_to_cpu(c->mst_node->ch.sqnum);
+ c->highest_inum = le64_to_cpu(c->mst_node->highest_inum);
+ c->cmt_no = le64_to_cpu(c->mst_node->cmt_no);
+ c->zroot.lnum = le32_to_cpu(c->mst_node->root_lnum);
+ c->zroot.offs = le32_to_cpu(c->mst_node->root_offs);
+ c->zroot.len = le32_to_cpu(c->mst_node->root_len);
+ c->lhead_lnum = le32_to_cpu(c->mst_node->log_lnum);
+ c->gc_lnum = le32_to_cpu(c->mst_node->gc_lnum);
+ c->ihead_lnum = le32_to_cpu(c->mst_node->ihead_lnum);
+ c->ihead_offs = le32_to_cpu(c->mst_node->ihead_offs);
+ c->old_idx_sz = le64_to_cpu(c->mst_node->index_size);
+ c->lpt_lnum = le32_to_cpu(c->mst_node->lpt_lnum);
+ c->lpt_offs = le32_to_cpu(c->mst_node->lpt_offs);
+ c->nhead_lnum = le32_to_cpu(c->mst_node->nhead_lnum);
+ c->nhead_offs = le32_to_cpu(c->mst_node->nhead_offs);
+ c->ltab_lnum = le32_to_cpu(c->mst_node->ltab_lnum);
+ c->ltab_offs = le32_to_cpu(c->mst_node->ltab_offs);
+ c->lsave_lnum = le32_to_cpu(c->mst_node->lsave_lnum);
+ c->lsave_offs = le32_to_cpu(c->mst_node->lsave_offs);
+ c->lscan_lnum = le32_to_cpu(c->mst_node->lscan_lnum);
+ c->lst.empty_lebs = le32_to_cpu(c->mst_node->empty_lebs);
+ c->lst.idx_lebs = le32_to_cpu(c->mst_node->idx_lebs);
+ old_leb_cnt = le32_to_cpu(c->mst_node->leb_cnt);
+ c->lst.total_free = le64_to_cpu(c->mst_node->total_free);
+ c->lst.total_dirty = le64_to_cpu(c->mst_node->total_dirty);
+ c->lst.total_used = le64_to_cpu(c->mst_node->total_used);
+ c->lst.total_dead = le64_to_cpu(c->mst_node->total_dead);
+ c->lst.total_dark = le64_to_cpu(c->mst_node->total_dark);
+
+ c->calc_idx_sz = c->old_idx_sz;
+
+ if (c->mst_node->flags & cpu_to_le32(UBIFS_MST_NO_ORPHS))
+ c->no_orphs = 1;
+
+ if (old_leb_cnt != c->leb_cnt) {
+ /* The file system has been resized */
+ int growth = c->leb_cnt - old_leb_cnt;
+
+ if (c->leb_cnt < old_leb_cnt ||
+ c->leb_cnt < UBIFS_MIN_LEB_CNT) {
+ ubifs_err("bad leb_cnt on master node");
+ dbg_dump_node(c, c->mst_node);
+ return -EINVAL;
+ }
+
+ dbg_mnt("Auto resizing (master) from %d LEBs to %d LEBs",
+ old_leb_cnt, c->leb_cnt);
+ c->lst.empty_lebs += growth;
+ c->lst.total_free += growth * (long long)c->leb_size;
+ c->lst.total_dark += growth * (long long)c->dark_wm;
+
+ /*
+ * Reflect changes back onto the master node. N.B. the master
+ * node gets written immediately whenever mounting (or
+ * remounting) in read-write mode, so we do not need to write it
+ * here.
+ */
+ c->mst_node->leb_cnt = cpu_to_le32(c->leb_cnt);
+ c->mst_node->empty_lebs = cpu_to_le32(c->lst.empty_lebs);
+ c->mst_node->total_free = cpu_to_le64(c->lst.total_free);
+ c->mst_node->total_dark = cpu_to_le64(c->lst.total_dark);
+ }
+
+ err = validate_master(c);
+ if (err)
+ return err;
+
+ err = dbg_old_index_check_init(c, &c->zroot);
+
+ return err;
+}
+
+/**
+ * ubifs_write_master - write master node.
+ * @c: UBIFS file-system description object
+ *
+ * This function writes the master node. The caller has to take the
+ * @c->mst_mutex lock before calling this function. Returns zero in case of
+ * success and a negative error code in case of failure. The master node is
+ * written twice to enable recovery.
+ */
+int ubifs_write_master(struct ubifs_info *c)
+{
+ int err, lnum, offs, len;
+
+ if (c->ro_media)
+ return -EINVAL;
+
+ lnum = UBIFS_MST_LNUM;
+ offs = c->mst_offs + c->mst_node_alsz;
+ len = UBIFS_MST_NODE_SZ;
+
+ if (offs + UBIFS_MST_NODE_SZ > c->leb_size) {
+ err = ubifs_leb_unmap(c, lnum);
+ if (err)
+ return err;
+ offs = 0;
+ }
+
+ c->mst_offs = offs;
+ c->mst_node->highest_inum = cpu_to_le64(c->highest_inum);
+
+ err = ubifs_write_node(c, c->mst_node, len, lnum, offs, UBI_SHORTTERM);
+ if (err)
+ return err;
+
+ lnum += 1;
+
+ if (offs == 0) {
+ err = ubifs_leb_unmap(c, lnum);
+ if (err)
+ return err;
+ }
+ err = ubifs_write_node(c, c->mst_node, len, lnum, offs, UBI_SHORTTERM);
+
+ return err;
+}
diff --git a/fs/ubifs/sb.c b/fs/ubifs/sb.c
new file mode 100644
index 0000000..e9f1045
--- /dev/null
+++ b/fs/ubifs/sb.c
@@ -0,0 +1,581 @@
+/*
+ * This file is part of UBIFS.
+ *
+ * Copyright (C) 2006-2008 Nokia Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 51
+ * Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Authors: Artem Bityutskiy (Битюцкий Артём)
+ * Adrian Hunter
+ */
+
+/*
+ * This file implements UBIFS superblock. The superblock is stored at the first
+ * LEB of the volume and is never changed by UBIFS. Only user-space tools may
+ * change it. The superblock node mostly contains geometry information.
+ */
+
+#include <asm/div64.h>
+#include "ubifs.h"
+
+/*
+ * Default journal size in logical eraseblocks as a percent of total
+ * flash size.
+ */
+#define DEFAULT_JRN_PERCENT 5
+
+/* Default maximum journal size in bytes */
+#define DEFAULT_MAX_JRN (32*1024*1024)
+
+/* Default indexing tree fanout */
+#define DEFAULT_FANOUT 8
+
+/* Default number of LEBs for orphan information */
+#ifdef CONFIG_UBIFS_FS_DEBUG
+#define DEFAULT_ORPHAN_LEBS 2 /* 2 is better for testing */
+#else
+#define DEFAULT_ORPHAN_LEBS 1
+#endif
+
+/* Default number of journal heads */
+#define DEFAULT_JHEADS_CNT 1
+
+/* Default positions of different LEBs in the main area */
+#define DEFAULT_IDX_LEB 0
+#define DEFAULT_DATA_LEB 1
+#define DEFAULT_GC_LEB 2
+
+/* Default number of LEB numbers in LPT's save table */
+#define DEFAULT_LSAVE_CNT 256
+
+/* Default reserved pool size as a percent of maximum free space */
+#define DEFAULT_RP_PERCENT 5
+
+/* The default maximum size of reserved pool in bytes */
+#define DEFAULT_MAX_RP_SIZE (5*1024*1024)
+
+/* Default UBIFS compressor */
+#define DEFAULT_COMPRESSOR UBIFS_COMPR_LZO
+
+/**
+ * create_default_filesystem - format empty UBI volume.
+ * @c: UBIFS file-system description object
+ *
+ * This function creates default empty file-system. Returns zero in case of
+ * success and a negative error code in case of failure.
+ */
+static int create_default_filesystem(struct ubifs_info *c)
+{
+ struct ubifs_sb_node *sup;
+ struct ubifs_mst_node *mst;
+ struct ubifs_idx_node *idx;
+ struct ubifs_branch *br;
+ struct ubifs_ino_node *ino;
+ struct ubifs_cs_node *cs;
+ union ubifs_key key;
+ int err, tmp, jrn_lebs, log_lebs, max_buds, main_lebs, main_first;
+ int lpt_lebs, lpt_first, orph_lebs, big_lpt, ino_waste, sup_flags = 0;
+ long long tmp64, main_bytes;
+
+ /* Some functions called from here depend on the @c->key_len field */
+ c->key_len = UBIFS_SK_LEN;
+
+ /*
+ * First of all, we have to calculate default file-system geometry -
+ * log size, journal size, etc.
+ */
+ c->max_leb_cnt = c->leb_cnt;
+ if (c->leb_cnt < 0x7FFFFFFF / DEFAULT_JRN_PERCENT)
+ /* We can first multiply then divide and have no overflow */
+ jrn_lebs = c->leb_cnt * DEFAULT_JRN_PERCENT / 100;
+ else
+ jrn_lebs = (c->leb_cnt / 100) * DEFAULT_JRN_PERCENT;
+
+ if (jrn_lebs < UBIFS_MIN_JRN_LEBS)
+ jrn_lebs = UBIFS_MIN_JRN_LEBS;
+ if (jrn_lebs * c->leb_size > DEFAULT_MAX_JRN)
+ jrn_lebs = DEFAULT_MAX_JRN / c->leb_size;
+
+ /*
+ * The log should be large enough to fit reference nodes for all bud
+ * LEBs. Because buds do not have to start from the beginning of LEBs
+ * (half of the LEB may contain committed data), the log should
+ * generally be larger, make it twice as large.
+ */
+ tmp = 2 * (c->ref_node_alsz * jrn_lebs) + c->leb_size - 1;
+ log_lebs = tmp / c->leb_size;
+ /* Plus one LEB reserved for commit */
+ log_lebs += 1;
+ /* And some extra space to allow writes while committing */
+ log_lebs += 1;
+
+ max_buds = jrn_lebs - log_lebs;
+ if (max_buds < UBIFS_MIN_BUD_LEBS)
+ max_buds = UBIFS_MIN_BUD_LEBS;
+
+ /*
+ * Orphan nodes are stored in a separate area. One node can store a lot
+ * of orphan inode numbers, but when new orphan comes we just add a new
+ * orphan node. At some point the nodes are consolidated into one
+ * orphan node.
+ */
+ orph_lebs = DEFAULT_ORPHAN_LEBS;
+
+ main_lebs = c->leb_cnt - UBIFS_SB_LEBS - UBIFS_MST_LEBS - log_lebs;
+ main_lebs -= orph_lebs;
+
+ lpt_first = UBIFS_LOG_LNUM + log_lebs;
+ c->lsave_cnt = DEFAULT_LSAVE_CNT;
+ err = ubifs_create_dflt_lpt(c, &main_lebs, lpt_first, &lpt_lebs,
+ &big_lpt);
+ if (err)
+ return err;
+
+ dbg_gen("LEB Properties Tree created (LEBs %d-%d)", lpt_first,
+ lpt_first + lpt_lebs - 1);
+
+ main_first = c->leb_cnt - main_lebs;
+
+ /* Create default superblock */
+ tmp = ALIGN(UBIFS_SB_NODE_SZ, c->min_io_size);
+ sup = kzalloc(tmp, GFP_KERNEL);
+ if (!sup)
+ return -ENOMEM;
+
+ tmp64 = (long long)max_buds * c->leb_size;
+ if (big_lpt)
+ sup_flags |= UBIFS_FLG_BIGLPT;
+
+ sup->ch.node_type = UBIFS_SB_NODE;
+ sup->key_hash = c->key_hash_type;
+ sup->flags = cpu_to_le32(sup_flags);
+ sup->min_io_size = cpu_to_le32(c->min_io_size);
+ sup->leb_size = cpu_to_le32(c->leb_size);
+ sup->leb_cnt = cpu_to_le32(c->leb_cnt);
+ sup->max_leb_cnt = cpu_to_le32(c->max_leb_cnt);
+ sup->max_bud_bytes = cpu_to_le64(tmp64);
+ sup->log_lebs = cpu_to_le32(log_lebs);
+ sup->lpt_lebs = cpu_to_le32(lpt_lebs);
+ sup->orph_lebs = cpu_to_le32(orph_lebs);
+ sup->jhead_cnt = cpu_to_le32(DEFAULT_JHEADS_CNT);
+ sup->fanout = cpu_to_le32(DEFAULT_FANOUT);
+ sup->lsave_cnt = cpu_to_le32(c->lsave_cnt);
+ sup->fmt_vers = cpu_to_le32(UBIFS_FORMAT_VERSION);
+ sup->default_compr = cpu_to_le16(DEFAULT_COMPRESSOR);
+
+ main_bytes = (long long)main_lebs * c->leb_size;
+ tmp64 = main_bytes * DEFAULT_RP_PERCENT;
+ do_div(tmp64, 100);
+ if (tmp64 > DEFAULT_MAX_RP_SIZE)
+ tmp64 = DEFAULT_MAX_RP_SIZE;
+ sup->rp_size = cpu_to_le64(tmp64);
+
+ err = ubifs_write_node(c, sup, UBIFS_SB_NODE_SZ, 0, 0, UBI_LONGTERM);
+ kfree(sup);
+ if (err)
+ return err;
+
+ dbg_gen("default superblock created at LEB 0:0");
+
+ /* Create default master node */
+ mst = kzalloc(c->mst_node_alsz, GFP_KERNEL);
+ if (!mst)
+ return -ENOMEM;
+
+ mst->ch.node_type = UBIFS_MST_NODE;
+ mst->log_lnum = cpu_to_le32(UBIFS_LOG_LNUM);
+ mst->highest_inum = cpu_to_le64(UBIFS_FIRST_INO);
+ mst->cmt_no = cpu_to_le64(0);
+ mst->root_lnum = cpu_to_le32(main_first + DEFAULT_IDX_LEB);
+ mst->root_offs = cpu_to_le32(0);
+ tmp = ubifs_idx_node_sz(c, 1);
+ mst->root_len = cpu_to_le32(tmp);
+ mst->gc_lnum = cpu_to_le32(main_first + DEFAULT_GC_LEB);
+ mst->ihead_lnum = cpu_to_le32(main_first + DEFAULT_IDX_LEB);
+ mst->ihead_offs = cpu_to_le32(ALIGN(tmp, c->min_io_size));
+ mst->index_size = cpu_to_le64(ALIGN(tmp, 8));
+ mst->lpt_lnum = cpu_to_le32(c->lpt_lnum);
+ mst->lpt_offs = cpu_to_le32(c->lpt_offs);
+ mst->nhead_lnum = cpu_to_le32(c->nhead_lnum);
+ mst->nhead_offs = cpu_to_le32(c->nhead_offs);
+ mst->ltab_lnum = cpu_to_le32(c->ltab_lnum);
+ mst->ltab_offs = cpu_to_le32(c->ltab_offs);
+ mst->lsave_lnum = cpu_to_le32(c->lsave_lnum);
+ mst->lsave_offs = cpu_to_le32(c->lsave_offs);
+ mst->lscan_lnum = cpu_to_le32(main_first);
+ mst->empty_lebs = cpu_to_le32(main_lebs - 2);
+ mst->idx_lebs = cpu_to_le32(1);
+ mst->leb_cnt = cpu_to_le32(c->leb_cnt);
+
+ /* Calculate lprops statistics */
+ tmp64 = main_bytes;
+ tmp64 -= ALIGN(ubifs_idx_node_sz(c, 1), c->min_io_size);
+ tmp64 -= ALIGN(UBIFS_INO_NODE_SZ, c->min_io_size);
+ mst->total_free = cpu_to_le64(tmp64);
+
+ tmp64 = ALIGN(ubifs_idx_node_sz(c, 1), c->min_io_size);
+ ino_waste = ALIGN(UBIFS_INO_NODE_SZ, c->min_io_size) -
+ UBIFS_INO_NODE_SZ;
+ tmp64 += ino_waste;
+ tmp64 -= ALIGN(ubifs_idx_node_sz(c, 1), 8);
+ mst->total_dirty = cpu_to_le64(tmp64);
+
+ /* The indexing LEB does not contribute to dark space */
+ tmp64 = (c->main_lebs - 1) * c->dark_wm;
+ mst->total_dark = cpu_to_le64(tmp64);
+
+ mst->total_used = cpu_to_le64(UBIFS_INO_NODE_SZ);
+
+ err = ubifs_write_node(c, mst, UBIFS_MST_NODE_SZ, UBIFS_MST_LNUM, 0,
+ UBI_UNKNOWN);
+ if (err) {
+ kfree(mst);
+ return err;
+ }
+ err = ubifs_write_node(c, mst, UBIFS_MST_NODE_SZ, UBIFS_MST_LNUM + 1, 0,
+ UBI_UNKNOWN);
+ kfree(mst);
+ if (err)
+ return err;
+
+ dbg_gen("default master node created at LEB %d:0", UBIFS_MST_LNUM);
+
+ /* Create the root indexing node */
+ tmp = ubifs_idx_node_sz(c, 1);
+ idx = kzalloc(ALIGN(tmp, c->min_io_size), GFP_KERNEL);
+ if (!idx)
+ return -ENOMEM;
+
+ c->key_fmt = UBIFS_SIMPLE_KEY_FMT;
+ c->key_hash = key_r5_hash;
+
+ idx->ch.node_type = UBIFS_IDX_NODE;
+ idx->child_cnt = cpu_to_le16(1);
+ ino_key_init(c, &key, UBIFS_ROOT_INO);
+ br = ubifs_idx_branch(c, idx, 0);
+ key_write_idx(c, &key, &br->key);
+ br->lnum = cpu_to_le32(main_first + DEFAULT_DATA_LEB);
+ br->len = cpu_to_le32(UBIFS_INO_NODE_SZ);
+ err = ubifs_write_node(c, idx, tmp, main_first + DEFAULT_IDX_LEB, 0,
+ UBI_UNKNOWN);
+ kfree(idx);
+ if (err)
+ return err;
+
+ dbg_gen("default root indexing node created LEB %d:0",
+ main_first + DEFAULT_IDX_LEB);
+
+ /* Create default root inode */
+ tmp = ALIGN(UBIFS_INO_NODE_SZ, c->min_io_size);
+ ino = kzalloc(tmp, GFP_KERNEL);
+ if (!ino)
+ return -ENOMEM;
+
+ ino_key_init_flash(c, &ino->key, UBIFS_ROOT_INO);
+ ino->ch.node_type = UBIFS_INO_NODE;
+ ino->creat_sqnum = cpu_to_le64(++c->max_sqnum);
+ ino->nlink = cpu_to_le32(2);
+ ino->atime = ino->ctime = ino->mtime =
+ cpu_to_le32(CURRENT_TIME_SEC.tv_sec);
+ ino->mode = cpu_to_le32(S_IFDIR | S_IRUGO | S_IWUSR | S_IXUGO);
+
+ /* Set compression enabled by default */
+ ino->flags = cpu_to_le32(UBIFS_COMPR_FL);
+
+ err = ubifs_write_node(c, ino, UBIFS_INO_NODE_SZ,
+ main_first + DEFAULT_DATA_LEB, 0,
+ UBI_UNKNOWN);
+ kfree(ino);
+ if (err)
+ return err;
+
+ dbg_gen("root inode created at LEB %d:0",
+ main_first + DEFAULT_DATA_LEB);
+
+ /*
+ * The first node in the log has to be the commit start node. This is
+ * always the case during normal file-system operation. Write a fake
+ * commit start node to the log.
+ */
+ tmp = ALIGN(UBIFS_CS_NODE_SZ, c->min_io_size);
+ cs = kzalloc(tmp, GFP_KERNEL);
+ if (!cs)
+ return -ENOMEM;
+
+ cs->ch.node_type = UBIFS_CS_NODE;
+ err = ubifs_write_node(c, cs, UBIFS_CS_NODE_SZ, UBIFS_LOG_LNUM,
+ 0, UBI_UNKNOWN);
+ kfree(cs);
+ if (!err)
+ ubifs_msg("default file-system created");
+ return err;
+}
+
+/**
+ * validate_sb - validate superblock node.
+ * @c: UBIFS file-system description object
+ * @sup: superblock node
+ *
+ * This function validates superblock node @sup. Since most of data was read
+ * from the superblock and stored in @c, the function validates fields in @c
+ * instead. Returns zero in case of success and %-EINVAL in case of validation
+ * failure.
+ */
+static int validate_sb(struct ubifs_info *c, struct ubifs_sb_node *sup)
+{
+ long long max_bytes;
+
+ if (!c->key_hash)
+ goto failed;
+
+ if (sup->key_fmt != UBIFS_SIMPLE_KEY_FMT)
+ goto failed;
+
+ if (le32_to_cpu(sup->min_io_size) != c->min_io_size) {
+ ubifs_err("min. I/O unit mismatch: %d in superblock, %d real",
+ le32_to_cpu(sup->min_io_size), c->min_io_size);
+ goto failed;
+ }
+
+ if (le32_to_cpu(sup->leb_size) != c->leb_size) {
+ ubifs_err("LEB size mismatch: %d in superblock, %d real",
+ le32_to_cpu(sup->leb_size), c->leb_size);
+ goto failed;
+ }
+
+ if (c->leb_cnt < UBIFS_MIN_LEB_CNT || c->leb_cnt > c->vi.size) {
+ ubifs_err("bad LEB count: %d in superblock, %d on UBI volume, "
+ "%d minimum required", c->leb_cnt, c->vi.size,
+ UBIFS_MIN_LEB_CNT);
+ goto failed;
+ }
+
+ if (c->max_leb_cnt < c->leb_cnt) {
+ ubifs_err("max. LEB count %d less than LEB count %d",
+ c->max_leb_cnt, c->leb_cnt);
+ goto failed;
+ }
+
+ if (c->log_lebs < UBIFS_MIN_LOG_LEBS ||
+ c->lpt_lebs < UBIFS_MIN_LPT_LEBS ||
+ c->orph_lebs < UBIFS_MIN_ORPH_LEBS ||
+ c->main_lebs < UBIFS_MIN_MAIN_LEBS)
+ goto failed;
+
+ if (c->main_lebs < UBIFS_MIN_MAIN_LEBS) {
+ dbg_err("bad main_lebs");
+ goto failed;
+ }
+
+ if (c->max_bud_bytes < (long long)c->leb_size * UBIFS_MIN_BUD_LEBS ||
+ c->max_bud_bytes > (long long)c->leb_size * c->main_lebs) {
+ dbg_err("bad max_bud_bytes");
+ goto failed;
+ }
+
+ if (c->jhead_cnt < NONDATA_JHEADS_CNT + 1 ||
+ c->jhead_cnt > NONDATA_JHEADS_CNT + UBIFS_MAX_JHEADS) {
+ dbg_err("bad jhead_cnt");
+ goto failed;
+ }
+
+ if (c->fanout < UBIFS_MIN_FANOUT ||
+ ubifs_idx_node_sz(c, c->fanout) > c->leb_size) {
+ dbg_err("bad fanout");
+ goto failed;
+ }
+
+ if (c->lsave_cnt < 0 || c->lsave_cnt > c->max_leb_cnt - UBIFS_SB_LEBS -
+ UBIFS_MST_LEBS - c->log_lebs - c->lpt_lebs - c->orph_lebs) {
+ dbg_err("bad lsave_cnt");
+ goto failed;
+ }
+
+ if (UBIFS_SB_LEBS + UBIFS_MST_LEBS + c->log_lebs + c->lpt_lebs +
+ c->orph_lebs + c->main_lebs != c->leb_cnt) {
+ dbg_err("LEBs don't add up");
+ goto failed;
+ }
+
+ if (c->default_compr < 0 || c->default_compr >= UBIFS_COMPR_TYPES_CNT) {
+ dbg_err("bad compression type");
+ goto failed;
+ }
+
+ max_bytes = c->main_lebs * (long long)c->leb_size;
+ if (c->rp_size < 0 || max_bytes < c->rp_size) {
+ dbg_err("bad reserved pool size, must be >= 0 and <= %lld\n",
+ max_bytes);
+ goto failed;
+ }
+
+ return 0;
+
+failed:
+ ubifs_err("bad superblock");
+ dbg_dump_node(c, sup);
+ return -EINVAL;
+}
+
+/**
+ * ubifs_read_sb_node - read superblock node.
+ * @c: UBIFS file-system description object
+ *
+ * This function returns a pointer to the superblock node or a negative error
+ * code.
+ */
+struct ubifs_sb_node *ubifs_read_sb_node(struct ubifs_info *c)
+{
+ struct ubifs_sb_node *sup;
+ int err;
+
+ sup = kmalloc(ALIGN(UBIFS_SB_NODE_SZ, c->min_io_size), GFP_NOFS);
+ if (!sup)
+ return ERR_PTR(-ENOMEM);
+
+ err = ubifs_read_node(c, sup, UBIFS_SB_NODE, UBIFS_SB_NODE_SZ,
+ UBIFS_SB_LNUM, 0);
+ if (err) {
+ kfree(sup);
+ return ERR_PTR(err);
+ }
+
+ return sup;
+}
+
+/**
+ * ubifs_write_sb_node - write superblock node.
+ * @c: UBIFS file-system description object
+ * @sup: superblock node read with 'ubifs_read_sb_node()'
+ *
+ * This function returns %0 on success and a negative error code on failure.
+ */
+int ubifs_write_sb_node(struct ubifs_info *c, struct ubifs_sb_node *sup)
+{
+ int len = ALIGN(UBIFS_SB_NODE_SZ, c->min_io_size);
+
+ ubifs_prepare_node(c, sup, UBIFS_SB_NODE_SZ, 1);
+ return ubi_leb_change(c->ubi, UBIFS_SB_LNUM, sup, len, UBI_LONGTERM);
+}
+
+/**
+ * ubifs_read_superblock - read superblock.
+ * @c: UBIFS file-system description object
+ *
+ * This function finds, reads and checks the superblock. If an empty UBI volume
+ * is being mounted, this function creates default superblock. Returns zero in
+ * case of success, and a negative error code in case of failure.
+ */
+int ubifs_read_superblock(struct ubifs_info *c)
+{
+ int err, sup_flags;
+ struct ubifs_sb_node *sup;
+
+ if (c->empty) {
+ err = create_default_filesystem(c);
+ if (err)
+ return err;
+ }
+
+ sup = ubifs_read_sb_node(c);
+ if (IS_ERR(sup))
+ return PTR_ERR(sup);
+
+ /*
+ * The software supports all previous versions but not future versions,
+ * due to the unavailability of time-travelling equipment.
+ */
+ c->fmt_vers = le32_to_cpu(sup->fmt_vers);
+ if (c->fmt_vers > UBIFS_FORMAT_VERSION) {
+ ubifs_err("on-flash format version is %d, but software only "
+ "supports up to version %d", c->fmt_vers,
+ UBIFS_FORMAT_VERSION);
+ err = -EINVAL;
+ goto out;
+ }
+
+ switch (sup->key_hash) {
+ case UBIFS_KEY_HASH_R5:
+ c->key_hash = key_r5_hash;
+ c->key_hash_type = UBIFS_KEY_HASH_R5;
+ break;
+
+ case UBIFS_KEY_HASH_TEST:
+ c->key_hash = key_test_hash;
+ c->key_hash_type = UBIFS_KEY_HASH_TEST;
+ break;
+ }
+
+ c->key_fmt = sup->key_fmt;
+
+ switch (c->key_fmt) {
+ case UBIFS_SIMPLE_KEY_FMT:
+ c->key_len = UBIFS_SK_LEN;
+ break;
+ default:
+ ubifs_err("unsupported key format");
+ err = -EINVAL;
+ goto out;
+ }
+
+ c->leb_cnt = le32_to_cpu(sup->leb_cnt);
+ c->max_leb_cnt = le32_to_cpu(sup->max_leb_cnt);
+ c->max_bud_bytes = le64_to_cpu(sup->max_bud_bytes);
+ c->log_lebs = le32_to_cpu(sup->log_lebs);
+ c->lpt_lebs = le32_to_cpu(sup->lpt_lebs);
+ c->orph_lebs = le32_to_cpu(sup->orph_lebs);
+ c->jhead_cnt = le32_to_cpu(sup->jhead_cnt) + NONDATA_JHEADS_CNT;
+ c->fanout = le32_to_cpu(sup->fanout);
+ c->lsave_cnt = le32_to_cpu(sup->lsave_cnt);
+ c->default_compr = le16_to_cpu(sup->default_compr);
+ c->rp_size = le64_to_cpu(sup->rp_size);
+ c->rp_uid = le32_to_cpu(sup->rp_uid);
+ c->rp_gid = le32_to_cpu(sup->rp_gid);
+ sup_flags = le32_to_cpu(sup->flags);
+
+ c->big_lpt = !!(sup_flags & UBIFS_FLG_BIGLPT);
+
+ /* Automatically increase file system size to the maximum size */
+ c->old_leb_cnt = c->leb_cnt;
+ if (c->leb_cnt < c->vi.size && c->leb_cnt < c->max_leb_cnt) {
+ c->leb_cnt = min_t(int, c->max_leb_cnt, c->vi.size);
+ if (c->vfs_sb->s_flags & MS_RDONLY)
+ dbg_mnt("Auto resizing (ro) from %d LEBs to %d LEBs",
+ c->old_leb_cnt, c->leb_cnt);
+ else {
+ dbg_mnt("Auto resizing (sb) from %d LEBs to %d LEBs",
+ c->old_leb_cnt, c->leb_cnt);
+ sup->leb_cnt = cpu_to_le32(c->leb_cnt);
+ err = ubifs_write_sb_node(c, sup);
+ if (err)
+ goto out;
+ c->old_leb_cnt = c->leb_cnt;
+ }
+ }
+
+ c->log_bytes = (long long)c->log_lebs * c->leb_size;
+ c->log_last = UBIFS_LOG_LNUM + c->log_lebs - 1;
+ c->lpt_first = UBIFS_LOG_LNUM + c->log_lebs;
+ c->lpt_last = c->lpt_first + c->lpt_lebs - 1;
+ c->orph_first = c->lpt_last + 1;
+ c->orph_last = c->orph_first + c->orph_lebs - 1;
+ c->main_lebs = c->leb_cnt - UBIFS_SB_LEBS - UBIFS_MST_LEBS;
+ c->main_lebs -= c->log_lebs + c->lpt_lebs + c->orph_lebs;
+ c->main_first = c->leb_cnt - c->main_lebs;
+
+ err = validate_sb(c, sup);
+out:
+ kfree(sup);
+ return err;
+}
--
1.5.4.1
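
For readers following the LEB arithmetic at the end of
ubifs_read_superblock(), here is a small standalone sketch of how the area
boundaries follow from the counts stored in the superblock. It is not part of
the patch. SB_LEBS, MST_LEBS and LOG_LNUM are assumed values chosen for
illustration (the real UBIFS_* constants are defined elsewhere in the series),
and struct layout with its two helpers are hypothetical names.

```c
#include <assert.h>

/* Assumed values, for illustration only. */
enum { SB_LEBS = 1, MST_LEBS = 2, LOG_LNUM = SB_LEBS + MST_LEBS };

struct layout {
	/* inputs, as read from the superblock */
	int leb_cnt, log_lebs, lpt_lebs, orph_lebs;
	/* derived area boundaries (LEB numbers, inclusive) */
	int log_last, lpt_first, lpt_last;
	int orph_first, orph_last;
	int main_lebs, main_first;
};

/* Mirrors the order of the assignments in ubifs_read_superblock(). */
static void compute_layout(struct layout *l)
{
	assert(l->log_lebs > 0 && l->lpt_lebs > 0 && l->orph_lebs > 0);

	l->log_last = LOG_LNUM + l->log_lebs - 1;
	l->lpt_first = LOG_LNUM + l->log_lebs;
	l->lpt_last = l->lpt_first + l->lpt_lebs - 1;
	l->orph_first = l->lpt_last + 1;
	l->orph_last = l->orph_first + l->orph_lebs - 1;
	l->main_lebs = l->leb_cnt - SB_LEBS - MST_LEBS;
	l->main_lebs -= l->log_lebs + l->lpt_lebs + l->orph_lebs;
	l->main_first = l->leb_cnt - l->main_lebs;
}

/* Same shape as the "LEBs don't add up" check in validate_sb(). */
static int layout_adds_up(const struct layout *l)
{
	return SB_LEBS + MST_LEBS + l->log_lebs + l->lpt_lebs +
	       l->orph_lebs + l->main_lebs == l->leb_cnt;
}
```

With these assumed constants, a 100-LEB volume with 4 log, 2 LPT and 1 orphan
LEBs has its main area start right after the orphan area, which is the
invariant the areas are laid out to satisfy.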

2008-03-27 13:08:11

by Artem Bityutskiy

Subject: [RFC PATCH 19/26] UBIFS: add Garbage Collector

This is one of the most important parts of UBIFS. Since all updates
are out-of-place, we need to do garbage collection from time to time,
which is implemented in this file. The UBIFS GC does not do much -
it just moves clean data to the journal and erases the cleaned-up
eraseblock. The main trick is done in TNC commit which guarantees
that the commit operation is always possible, even if there is no
clean space, in which case it may use in-place updates provided
by UBI.

Signed-off-by: Artem Bityutskiy <[email protected]>
Signed-off-by: Adrian Hunter <[email protected]>
---
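
As a reading aid for move_nodes() in this patch, the following standalone
sketch shows the "roughly sorted, then first fit" idea on plain arrays. It is
a simplification and not the patch's code: the watermarks are hypothetical
values, the sketch may switch to any number of fresh bins whereas GC has only
the current head LEB and c->gc_lnum, and rough_sort() keeps the input order
within a class.

```c
#include <stddef.h>

/* Hypothetical watermarks; the patch derives its own from UBIFS constants. */
#define MEDIUM_WM 1024
#define SMALL_WM  256

/* Stable 3-way partition: large nodes first, then medium, then small. */
static void rough_sort(const int *len, size_t n, int *out)
{
	size_t i, k = 0;
	int cls;

	for (cls = 0; cls < 3; cls++)
		for (i = 0; i < n; i++) {
			int c = len[i] > MEDIUM_WM ? 0 :
				len[i] > SMALL_WM ? 1 : 2;

			if (c == cls)
				out[k++] = len[i];
		}
}

/*
 * First-fit packing. The current bin has 'avail' bytes left, every further
 * bin has 'bin_size' bytes. Moved nodes are marked by zeroing their length.
 * Returns the number of bin switches, or -1 if some node can never fit.
 */
static int first_fit_pack(int *len, size_t n, int avail, int bin_size)
{
	size_t i, left = n;
	int switches = 0;

	for (i = 0; i < n; i++)
		if (len[i] <= 0 || len[i] > bin_size)
			return -1;

	while (left) {
		for (i = 0; i < n; i++)
			if (len[i] && len[i] <= avail) {
				avail -= len[i];
				len[i] = 0;
				left--;
			}
		if (left) {
			/* Waste the rest of this bin and take a fresh one */
			avail = bin_size;
			switches++;
		}
	}
	return switches;
}
```

Placing the large nodes first means the small ones are still available to
fill whatever tail space remains in a bin, which is the point of the rough
sort.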
fs/ubifs/gc.c | 773 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 773 insertions(+), 0 deletions(-)

diff --git a/fs/ubifs/gc.c b/fs/ubifs/gc.c
new file mode 100644
index 0000000..7b43655
--- /dev/null
+++ b/fs/ubifs/gc.c
@@ -0,0 +1,773 @@
+/*
+ * This file is part of UBIFS.
+ *
+ * Copyright (C) 2006-2008 Nokia Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 51
+ * Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Authors: Adrian Hunter
+ * Artem Bityutskiy (Битюцкий Артём)
+ */
+
+/*
+ * This file implements garbage collection. The procedure for garbage collection
+ * is different depending on whether a LEB is an index LEB (contains index
+ * nodes) or not. For non-index LEBs, garbage collection finds a LEB which
+ * contains a lot of dirty space (obsolete nodes), and copies the non-obsolete
+ * nodes to the journal, at which point the garbage-collected LEB is free to be
+ * reused. For index LEBs, garbage collection marks the non-obsolete index nodes
+ * dirty in the TNC, and after the next commit, the garbage-collected LEB is
+ * free to be reused. Garbage collection will cause the number of dirty index
+ * nodes to grow, however sufficient space is reserved for the index to ensure
+ * the commit will never run out of space.
+ */
+
+#include <linux/pagemap.h>
+#include "ubifs.h"
+
+/*
+ * GC tries to optimize the way it fits nodes to available space, and it sorts
+ * nodes a little. The below constants are watermarks which define "large",
+ * "medium", and "small" nodes.
+ */
+#define MEDIUM_NODE_WM (UBIFS_BLOCK_SIZE / 4)
+#define SMALL_NODE_WM UBIFS_MAX_DENT_NODE_SZ
+
+/*
+ * GC may need to move more than one LEB to make progress. The below constants
+ * define "soft" and "hard" limits on the number of LEBs the garbage collector
+ * may move.
+ */
+#define SOFT_LEBS_LIMIT 4
+#define HARD_LEBS_LIMIT 32
+
+/*
+ * Return codes used by the garbage collector.
+ * @LEB_FREED: the logical eraseblock was freed and is ready to use
+ * @LEB_FREED_IDX: indexing LEB was freed and can be used only after the commit
+ * @LEB_RETAINED: the logical eraseblock was freed and retained for GC purposes
+ */
+enum {
+ LEB_FREED,
+ LEB_FREED_IDX,
+ LEB_RETAINED,
+};
+
+/**
+ * switch_gc_head - switch the garbage collection journal head.
+ * @c: UBIFS file-system description object
+ *
+ * This function switches the GC head to the next LEB which is reserved in
+ * @c->gc_lnum. Returns %0 in case of success, %-EAGAIN if commit is required,
+ * and other negative error code in case of failures.
+ */
+static int switch_gc_head(struct ubifs_info *c)
+{
+ int err, gc_lnum = c->gc_lnum;
+ struct ubifs_wbuf *wbuf = &c->jheads[GCHD].wbuf;
+
+ ubifs_assert(gc_lnum != -1);
+ dbg_gc("switch GC head from LEB %d:%d to LEB %d (waste %d bytes)",
+ wbuf->lnum, wbuf->offs + wbuf->used, gc_lnum,
+ c->leb_size - wbuf->offs - wbuf->used);
+
+ err = ubifs_wbuf_sync_nolock(wbuf);
+ if (err)
+ return err;
+
+ /*
+ * The GC write-buffer was synchronized, we may safely unmap
+ * 'c->gc_lnum'.
+ */
+ err = ubifs_leb_unmap(c, gc_lnum);
+ if (err)
+ return err;
+
+ err = ubifs_add_bud_to_log(c, GCHD, gc_lnum, 0);
+ if (err)
+ return err;
+
+ c->gc_lnum = -1;
+ err = ubifs_wbuf_seek_nolock(wbuf, gc_lnum, 0, UBI_LONGTERM);
+ return err;
+}
+
+/**
+ * move_nodes - move nodes.
+ * @c: UBIFS file-system description object
+ * @sleb: describes nodes to move
+ *
+ * This function moves valid nodes from data LEB described by @sleb to the GC
+ * journal head. The obsolete nodes are dropped.
+ *
+ * When moving nodes we have to deal with the classical bin-packing problem:
+ * the space in the current GC journal head LEB and in @c->gc_lnum are the
+ * "bins", where the nodes in the @sleb->nodes list are the elements which
+ * should be fit optimally to the bins. This function uses the "first fit
+ * decreasing" strategy, although it does not really sort the nodes but just
+ * splits them into 3 classes - large, medium, and small, so they are roughly
+ * sorted.
+ *
+ * This function returns zero in case of success, %-EAGAIN if commit is
+ * required, and other negative error codes in case of other failures.
+ */
+static int move_nodes(struct ubifs_info *c, struct ubifs_scan_leb *sleb)
+{
+ struct ubifs_scan_node *snod, *tmp;
+ struct list_head large, medium, small;
+ struct ubifs_wbuf *wbuf = &c->jheads[GCHD].wbuf;
+ int avail, err, min = INT_MAX;
+
+ INIT_LIST_HEAD(&large);
+ INIT_LIST_HEAD(&medium);
+ INIT_LIST_HEAD(&small);
+
+ list_for_each_entry_safe(snod, tmp, &sleb->nodes, list) {
+ struct list_head *lst;
+
+ ubifs_assert(snod->type != UBIFS_IDX_NODE);
+ ubifs_assert(snod->type != UBIFS_REF_NODE);
+ ubifs_assert(snod->type != UBIFS_CS_NODE);
+
+ err = ubifs_tnc_has_node(c, &snod->key, 0, sleb->lnum,
+ snod->offs, 0);
+ if (err < 0)
+ goto out;
+
+ lst = &snod->list;
+ list_del(lst);
+ if (!err) {
+ /* The node is obsolete, remove it from the list */
+ kfree(snod);
+ continue;
+ }
+
+ /*
+ * Sort the list of nodes so that large nodes go first, and
+ * small nodes go last.
+ */
+ if (snod->len > MEDIUM_NODE_WM)
+ list_add(lst, &large);
+ else if (snod->len > SMALL_NODE_WM)
+ list_add(lst, &medium);
+ else
+ list_add(lst, &small);
+
+ /* And find the smallest node */
+ if (snod->len < min)
+ min = snod->len;
+ }
+
+ /*
+ * Join the three lists so that we'd have one roughly sorted list
+ * ('large' will be the head of the joined list).
+ */
+ list_splice(&medium, large.prev);
+ list_splice(&small, large.prev);
+
+ if (wbuf->lnum == -1) {
+ /*
+ * The GC journal head is not set, because it is the first GC
+ * invocation since mount.
+ */
+ err = switch_gc_head(c);
+ if (err)
+ goto out;
+ }
+
+ /* Write nodes to their new location. Use the first-fit strategy */
+ while (1) {
+ avail = c->leb_size - wbuf->offs - wbuf->used;
+ list_for_each_entry_safe(snod, tmp, &large, list) {
+ int new_lnum, new_offs;
+
+ if (avail < min)
+ break;
+
+ if (snod->len > avail)
+ /* This node does not fit */
+ continue;
+
+ cond_resched();
+
+ new_lnum = wbuf->lnum;
+ new_offs = wbuf->offs + wbuf->used;
+ err = ubifs_wbuf_write_nolock(wbuf, snod->node,
+ snod->len);
+ if (err)
+ goto out;
+
+ err = ubifs_tnc_replace(c, &snod->key, sleb->lnum,
+ snod->offs, new_lnum, new_offs,
+ snod->len);
+ if (err)
+ goto out;
+
+ avail = c->leb_size - wbuf->offs - wbuf->used;
+ list_del(&snod->list);
+ kfree(snod);
+ }
+
+ if (list_empty(&large))
+ break;
+
+ /*
+ * Waste the rest of the space in the LEB and switch to the
+ * next LEB.
+ */
+ err = switch_gc_head(c);
+ if (err)
+ goto out;
+ }
+
+ return 0;
+
+out:
+ list_for_each_entry_safe(snod, tmp, &large, list) {
+ list_del(&snod->list);
+ kfree(snod);
+ }
+ return err;
+}
+
+/**
+ * gc_sync_wbufs - sync write-buffers for GC.
+ * @c: UBIFS file-system description object
+ *
+ * We must guarantee that obsoleting nodes are on flash. Unfortunately they may
+ * be in a write-buffer instead. That is, a node could be written to a
+ * write-buffer, obsoleting another node in a LEB that is GC'd. If that LEB is
+ * erased before the write-buffer is sync'd and then there is an unclean
+ * unmount, then an existing node is lost. To avoid this, we sync all
+ * write-buffers.
+ *
+ * This function returns %0 on success or a negative error code on failure.
+ */
+static int gc_sync_wbufs(struct ubifs_info *c)
+{
+ int err, i;
+
+ for (i = 0; i < c->jhead_cnt; i++) {
+ if (i == GCHD)
+ continue;
+ err = ubifs_wbuf_sync(&c->jheads[i].wbuf);
+ if (err)
+ return err;
+ }
+ return 0;
+}
+
+/**
+ * garbage_collect_leb - garbage-collect a logical eraseblock.
+ * @c: UBIFS file-system description object
+ * @lp: describes the LEB to garbage collect
+ *
+ * This function garbage-collects an LEB and returns one of the @LEB_FREED,
+ * @LEB_RETAINED, etc positive codes in case of success, %-EAGAIN if commit is
+ * required, and other negative error codes in case of failures.
+ */
+static int garbage_collect_leb(struct ubifs_info *c, struct ubifs_lprops *lp)
+{
+ struct ubifs_scan_leb *sleb;
+ struct ubifs_scan_node *snod;
+ struct ubifs_wbuf *wbuf = &c->jheads[GCHD].wbuf;
+ int err = 0, lnum = lp->lnum;
+
+ ubifs_assert(c->gc_lnum != -1 || wbuf->offs + wbuf->used == 0);
+ ubifs_assert(c->gc_lnum != lnum);
+ ubifs_assert(wbuf->lnum != lnum);
+
+ /*
+ * We scan the entire LEB even though we only really need to scan up to
+ * (c->leb_size - lp->free).
+ */
+ sleb = ubifs_scan(c, lnum, 0, c->sbuf);
+ if (IS_ERR(sleb))
+ return PTR_ERR(sleb);
+
+ ubifs_assert(!list_empty(&sleb->nodes));
+ snod = list_entry(sleb->nodes.next, struct ubifs_scan_node, list);
+
+ if (snod->type == UBIFS_IDX_NODE) {
+ struct ubifs_gced_idx_leb *idx_gc;
+
+ dbg_gc("indexing LEB %d (free %d, dirty %d)",
+ lnum, lp->free, lp->dirty);
+ list_for_each_entry(snod, &sleb->nodes, list) {
+ struct ubifs_idx_node *idx = snod->node;
+ int level = le16_to_cpu(idx->level);
+
+ ubifs_assert(snod->type == UBIFS_IDX_NODE);
+ key_read(c, ubifs_idx_key(c, idx), &snod->key);
+ err = ubifs_dirty_idx_node(c, &snod->key, level, lnum,
+ snod->offs);
+ if (err)
+ goto out;
+ }
+
+ idx_gc = kmalloc(sizeof(struct ubifs_gced_idx_leb), GFP_NOFS);
+ if (!idx_gc) {
+ err = -ENOMEM;
+ goto out;
+ }
+
+ idx_gc->lnum = lnum;
+ idx_gc->unmap = 0;
+ list_add(&idx_gc->list, &c->idx_gc);
+
+ /*
+ * Don't release the LEB until after the next commit, because
+ * it may contain data which is needed for recovery. So
+ * although we freed this LEB, it will become usable only after
+ * the commit.
+ */
+ err = ubifs_change_one_lp(c, lnum, c->leb_size, 0, 0,
+ LPROPS_INDEX, 1);
+ if (err)
+ goto out;
+ err = LEB_FREED_IDX;
+ } else {
+ dbg_gc("data LEB %d (free %d, dirty %d)",
+ lnum, lp->free, lp->dirty);
+
+ err = move_nodes(c, sleb);
+ if (err)
+ goto out;
+
+ err = gc_sync_wbufs(c);
+ if (err)
+ goto out;
+
+ err = ubifs_change_one_lp(c, lnum, c->leb_size, 0, 0, 0, 0);
+ if (err)
+ goto out;
+
+ if (c->gc_lnum == -1) {
+ c->gc_lnum = lnum;
+ err = LEB_RETAINED;
+ } else {
+ err = ubifs_wbuf_sync_nolock(wbuf);
+ if (err)
+ goto out;
+
+ err = ubifs_leb_unmap(c, lnum);
+ if (err)
+ goto out;
+
+ err = LEB_FREED;
+ }
+ }
+
+out:
+ ubifs_scan_destroy(sleb);
+ return err;
+}
+
+/**
+ * ubifs_garbage_collect - UBIFS garbage collector.
+ * @c: UBIFS file-system description object
+ * @anyway: do GC even if there are free LEBs
+ *
+ * This function does out-of-place garbage collection. The return codes are:
+ * o positive LEB number if the LEB has been freed and may be used;
+ * o %-EAGAIN if the caller has to run commit;
+ * o %-ENOSPC if GC failed to make any progress;
+ * o other negative error codes in case of other errors.
+ *
+ * Garbage collector writes data to the journal when GC'ing data LEBs, and just
+ * marks indexing nodes dirty when GC'ing indexing LEBs. Thus, at some point
+ * commit may be required. But commit cannot be run from inside GC, because the
+ * caller might be holding the commit lock, so %-EAGAIN is returned instead.
+ * This error code means that the caller has to run commit, and re-run GC
+ * if there is still no free space.
+ *
+ * There are many reasons why this function may return %-EAGAIN:
+ * o the log is full and there is no space to write an LEB reference for
+ * @c->gc_lnum;
+ * o the journal is too large and exceeds size limitations;
+ * o GC moved indexing LEBs, but they can be used only after the commit;
+ * o the shrinker fails to find clean znodes to free and requests the commit;
+ * o etc.
+ *
+ * Note, if the file-system is close to being full, this function may return
+ * %-EAGAIN infinitely, so the caller has to limit the number of re-invocations
+ * of the function. E.g., this happens if the limits on the journal size are too
+ * tough and GC writes too much to the journal before an LEB is freed. This
+ * might also mean that the journal is too large, and the TNC becomes too big,
+ * so that the shrinker is constantly called, finds no clean znodes to free,
+ * and requests commit. Well, this may also happen if the journal is all right,
+ * but another kernel process consumes too much memory. Anyway, infinite
+ * %-EAGAIN may happen, but in some extreme/misconfiguration cases.
+ */
+int ubifs_garbage_collect(struct ubifs_info *c, int anyway)
+{
+ int i, err, ret, min_space = c->dead_wm;
+ struct ubifs_lprops lp;
+ struct ubifs_wbuf *wbuf = &c->jheads[GCHD].wbuf;
+
+ ubifs_assert_cmt_locked(c);
+
+ if (ubifs_gc_should_commit(c))
+ return -EAGAIN;
+
+ mutex_lock_nested(&wbuf->io_mutex, wbuf->jhead);
+ /* We expect the write-buffer to be empty on entry */
+ ubifs_assert(!wbuf->used);
+
+ for (i = 0; ; i++) {
+ int space_before = c->leb_size - wbuf->offs - wbuf->used;
+ int space_after;
+
+ cond_resched();
+
+ /* Give the commit an opportunity to run */
+ if (ubifs_gc_should_commit(c)) {
+ ret = -EAGAIN;
+ break;
+ }
+
+ if (i > SOFT_LEBS_LIMIT && !list_empty(&c->idx_gc)) {
+ /*
+ * We've done enough iterations. Indexing LEBs were
+ * moved and will be available after the commit.
+ */
+ dbg_gc("soft limit, some index LEBs GC'ed, -EAGAIN");
+ ubifs_commit_required(c);
+ ret = -EAGAIN;
+ break;
+ }
+
+ if (i > HARD_LEBS_LIMIT) {
+ /*
+ * We've moved too many LEBs and have not made
+ * progress, give up.
+ */
+ dbg_gc("hard limit, -ENOSPC");
+ ret = -ENOSPC;
+ break;
+ }
+
+ /*
+ * Empty and freeable LEBs can turn up while we waited for
+ * the wbuf lock, or while we have been running GC. In that
+ * case, we should just return one of those instead of
+ * continuing to GC dirty LEBs. Hence we request
+ * 'ubifs_find_dirty_leb()' to return an empty LEB if it can.
+ */
+ ret = ubifs_find_dirty_leb(c, &lp, min_space, !anyway);
+ if (ret) {
+ if (ret == -ENOSPC)
+ dbg_gc("no more dirty LEBs");
+ break;
+ }
+
+ dbg_gc("found LEB %d: free %d, dirty %d, sum %d "
+ "(min. space %d)", lp.lnum, lp.free, lp.dirty,
+ lp.free + lp.dirty, min_space);
+
+ if (lp.free + lp.dirty == c->leb_size) {
+ /* An empty LEB was returned */
+ dbg_gc("LEB %d is free, return it", lp.lnum);
+ /*
+ * ubifs_find_dirty_leb() doesn't return freeable index
+ * LEBs.
+ */
+ ubifs_assert(!(lp.flags & LPROPS_INDEX));
+ if (lp.free != c->leb_size) {
+ /*
+ * Write buffers must be sync'd before
+ * unmapping freeable LEBs, because one of them
+ * may contain data which obsoletes something
+ * in 'lp.lnum'.
+ */
+ ret = gc_sync_wbufs(c);
+ if (ret)
+ goto out;
+ ret = ubifs_change_one_lp(c, lp.lnum,
+ c->leb_size, 0, 0, 0,
+ 0);
+ if (ret)
+ goto out;
+ }
+ ret = ubifs_leb_unmap(c, lp.lnum);
+ if (ret)
+ goto out;
+ ret = lp.lnum;
+ break;
+ }
+
+ space_before = c->leb_size - wbuf->offs - wbuf->used;
+ if (wbuf->lnum == -1)
+ space_before = 0;
+
+ ret = garbage_collect_leb(c, &lp);
+ if (ret < 0) {
+ if (ret == -EAGAIN || ret == -ENOSPC) {
+ /*
+ * These codes are not errors, so we have to
+ * return the LEB to lprops. But if the
+ * 'ubifs_return_leb()' function fails, its
+ * failure code is propagated to the caller
+ * instead of the original '-EAGAIN' or
+ * '-ENOSPC'.
+ */
+ err = ubifs_return_leb(c, lp.lnum);
+ if (err)
+ ret = err;
+ break;
+ }
+ goto out;
+ }
+
+ if (ret == LEB_FREED) {
+ /* An LEB has been freed and is ready for use */
+ dbg_gc("LEB %d freed, return", lp.lnum);
+ ret = lp.lnum;
+ break;
+ }
+
+ if (ret == LEB_FREED_IDX) {
+ /*
+ * This was an indexing LEB and it cannot be
+ * immediately used. And instead of requesting the
+ * commit straight away, we try to garbage collect some
+ * more.
+ */
+ dbg_gc("indexing LEB %d freed, continue", lp.lnum);
+ continue;
+ }
+
+ ubifs_assert(ret == LEB_RETAINED);
+ space_after = c->leb_size - wbuf->offs - wbuf->used;
+ dbg_gc("LEB %d retained, freed %d bytes", lp.lnum,
+ space_after - space_before);
+
+ if (space_after > space_before) {
+ /* GC makes progress, keep working */
+ min_space >>= 1;
+ if (min_space < c->dead_wm)
+ min_space = c->dead_wm;
+ continue;
+ }
+
+ dbg_gc("did not make progress");
+
+ /*
+ * GC moved an LEB but has not made any progress. This means
+ * that the previous GC head LEB contained too little free space
+ * and the LEB which was GC'ed contained only large nodes which
+ * did not fit that space.
+ *
+ * We can do 2 things:
+ * 1. pick another LEB in a hope it'll contain a small node
+ * which will fit the space we have at the end of current GC
+ * head LEB, but there is no guarantee, so we try this out
+ * unless we have already been working for too long;
+ * 2. request an LEB with more dirty space, which will force
+ * 'ubifs_find_dirty_leb()' to start scanning the lprops
+ * table, instead of just picking one from the heap
+ * (previously it already picked the dirtiest LEB).
+ */
+ if (i < SOFT_LEBS_LIMIT) {
+ dbg_gc("try again");
+ continue;
+ }
+
+ min_space <<= 1;
+ if (min_space > c->dark_wm)
+ min_space = c->dark_wm;
+ dbg_gc("set min. space to %d", min_space);
+ }
+
+ if (ret == -ENOSPC && !list_empty(&c->idx_gc)) {
+ dbg_gc("no space, some index LEBs GC'ed, -EAGAIN");
+ ubifs_commit_required(c);
+ ret = -EAGAIN;
+ }
+
+ err = ubifs_wbuf_sync_nolock(wbuf);
+ if (!err)
+ err = ubifs_leb_unmap(c, c->gc_lnum);
+ if (err)
+ ret = err;
+ mutex_unlock(&wbuf->io_mutex);
+ return ret;
+
+out:
+ ubifs_assert(ret < 0);
+ ubifs_assert(ret != -ENOSPC && ret != -EAGAIN);
+ ubifs_wbuf_sync_nolock(wbuf);
+ mutex_unlock(&wbuf->io_mutex);
+ ubifs_return_leb(c, lp.lnum);
+ return ret;
+}
+
+/**
+ * ubifs_gc_start_commit - garbage collection at start of commit.
+ * @c: UBIFS file-system description object
+ *
+ * If a LEB has only dirty and free space, then we may safely unmap it and make
+ * it free. Note, we cannot do this with indexing LEBs because dirty space may
+ * correspond to index nodes that are required for recovery. In that case, the
+ * LEB cannot be unmapped until after the next commit.
+ *
+ * This function returns %0 upon success and a negative error code upon failure.
+ */
+int ubifs_gc_start_commit(struct ubifs_info *c)
+{
+ struct ubifs_gced_idx_leb *idx_gc;
+ const struct ubifs_lprops *lp;
+ int err = 0, flags;
+
+ ubifs_get_lprops(c);
+
+ /*
+ * Unmap (non-index) freeable LEBs. Note that recovery requires that all
+ * wbufs are sync'd before this, which is done in 'do_commit()'.
+ */
+ while (1) {
+ lp = ubifs_fast_find_freeable(c);
+ if (unlikely(IS_ERR(lp))) {
+ err = PTR_ERR(lp);
+ goto out;
+ }
+ if (!lp)
+ break;
+ ubifs_assert(!(lp->flags & LPROPS_TAKEN));
+ ubifs_assert(!(lp->flags & LPROPS_INDEX));
+ err = ubifs_leb_unmap(c, lp->lnum);
+ if (err)
+ goto out;
+ lp = ubifs_change_lp(c, lp, c->leb_size, 0, lp->flags, 0);
+ if (unlikely(IS_ERR(lp))) {
+ err = PTR_ERR(lp);
+ goto out;
+ }
+ ubifs_assert(!(lp->flags & LPROPS_TAKEN));
+ ubifs_assert(!(lp->flags & LPROPS_INDEX));
+ }
+
+ /* Mark GC'd index LEBs OK to unmap after this commit finishes */
+ list_for_each_entry(idx_gc, &c->idx_gc, list)
+ idx_gc->unmap = 1;
+
+ /* Record index freeable LEBs for unmapping after commit */
+ while (1) {
+ lp = ubifs_fast_find_frdi_idx(c);
+ if (unlikely(IS_ERR(lp))) {
+ err = PTR_ERR(lp);
+ goto out;
+ }
+ if (!lp)
+ break;
+ idx_gc = kmalloc(sizeof(struct ubifs_gced_idx_leb), GFP_NOFS);
+ if (!idx_gc) {
+ err = -ENOMEM;
+ goto out;
+ }
+ ubifs_assert(!(lp->flags & LPROPS_TAKEN));
+ ubifs_assert(lp->flags & LPROPS_INDEX);
+ /* Don't release the LEB until after the next commit */
+ flags = (lp->flags | LPROPS_TAKEN) ^ LPROPS_INDEX;
+ lp = ubifs_change_lp(c, lp, c->leb_size, 0, flags, 1);
+ if (unlikely(IS_ERR(lp))) {
+ err = PTR_ERR(lp);
+ kfree(idx_gc);
+ goto out;
+ }
+ ubifs_assert(lp->flags & LPROPS_TAKEN);
+ ubifs_assert(!(lp->flags & LPROPS_INDEX));
+ idx_gc->lnum = lp->lnum;
+ idx_gc->unmap = 1;
+ list_add(&idx_gc->list, &c->idx_gc);
+ }
+out:
+ ubifs_release_lprops(c);
+ return err;
+}
+
+/**
+ * ubifs_gc_end_commit - garbage collection at end of commit.
+ * @c: UBIFS file-system description object
+ *
+ * This function completes out-of-place garbage collection of index LEBs.
+ */
+int ubifs_gc_end_commit(struct ubifs_info *c)
+{
+ struct ubifs_gced_idx_leb *idx_gc, *tmp;
+ struct ubifs_wbuf *wbuf;
+ int err = 0;
+
+ wbuf = &c->jheads[GCHD].wbuf;
+ mutex_lock_nested(&wbuf->io_mutex, wbuf->jhead);
+ list_for_each_entry_safe(idx_gc, tmp, &c->idx_gc, list)
+ if (idx_gc->unmap) {
+ dbg_gc("LEB %d", idx_gc->lnum);
+ err = ubifs_leb_unmap(c, idx_gc->lnum);
+ if (err)
+ goto out;
+ err = ubifs_change_one_lp(c, idx_gc->lnum, -1, -1, 0,
+ LPROPS_TAKEN, -1);
+ if (err)
+ goto out;
+ list_del(&idx_gc->list);
+ kfree(idx_gc);
+ }
+out:
+ mutex_unlock(&wbuf->io_mutex);
+ return err;
+}
+
+/**
+ * ubifs_destroy_idx_gc - destroy idx_gc list.
+ * @c: UBIFS file-system description object
+ *
+ * This function destroys the idx_gc list. It is called when unmounting or
+ * remounting read-only so locks are not needed.
+ */
+void ubifs_destroy_idx_gc(struct ubifs_info *c)
+{
+ while (!list_empty(&c->idx_gc)) {
+ struct ubifs_gced_idx_leb *idx_gc;
+
+ idx_gc = list_entry(c->idx_gc.next, struct ubifs_gced_idx_leb,
+ list);
+ c->idx_gc_cnt -= 1;
+ list_del(&idx_gc->list);
+ kfree(idx_gc);
+ }
+}
+
+/**
+ * ubifs_get_idx_gc_leb - get a LEB from GC'd index LEB list.
+ * @c: UBIFS file-system description object
+ *
+ * Called during start commit so locks are not needed.
+ */
+int ubifs_get_idx_gc_leb(struct ubifs_info *c)
+{
+ struct ubifs_gced_idx_leb *idx_gc;
+ int lnum;
+
+ if (list_empty(&c->idx_gc))
+ return -ENOSPC;
+ idx_gc = list_entry(c->idx_gc.next, struct ubifs_gced_idx_leb, list);
+ lnum = idx_gc->lnum;
+ /* c->idx_gc_cnt is updated by the caller when lprops are updated */
+ list_del(&idx_gc->list);
+ kfree(idx_gc);
+ return lnum;
+}
--
1.5.4.1
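
The comment above ubifs_garbage_collect() says the function may keep returning
-EAGAIN and that the caller has to bound the number of re-invocations. A
caller-side loop could look roughly like the sketch below. This is only an
illustration under stated assumptions: MAX_GC_RETRIES, the callback types and
the fake_* helpers are all hypothetical, and the real callers in UBIFS are not
part of this patch.

```c
#include <errno.h>

#define MAX_GC_RETRIES 8	/* hypothetical bound */

typedef int (*gc_fn)(void *ctx);	/* returns LEB number or -errno */
typedef int (*commit_fn)(void *ctx);	/* returns 0 or -errno */

/*
 * Run GC, and on -EAGAIN run the commit and try again, but only a bounded
 * number of times. Returns whatever GC returned last (a LEB number or a
 * real error), a commit error, or -ENOSPC when the retry budget is spent.
 */
static int gc_with_bounded_retries(gc_fn gc, commit_fn commit, void *ctx)
{
	int i, ret;

	for (i = 0; i < MAX_GC_RETRIES; i++) {
		ret = gc(ctx);
		if (ret != -EAGAIN)
			return ret;
		ret = commit(ctx);
		if (ret)
			return ret;
	}
	return -ENOSPC;
}

/* Toy stand-ins: GC succeeds with LEB 42 once '*ctx' commits have run. */
static int fake_gc(void *ctx)
{
	int *need = ctx;

	return *need > 0 ? -EAGAIN : 42;
}

static int fake_commit(void *ctx)
{
	int *need = ctx;

	(*need)--;
	return 0;
}
```

Mapping an exhausted retry budget to -ENOSPC is a design choice of this
sketch, on the reasoning that GC failing to free anything after repeated
commits is indistinguishable from a full file system for the caller.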

2008-03-27 13:08:32

by Artem Bityutskiy

Subject: [RFC PATCH 17/26] UBIFS: add LEB properties tree

This is the commit-related part of the lprops sub-system.

Signed-off-by: Artem Bityutskiy <[email protected]>
Signed-off-by: Adrian Hunter <[email protected]>
---
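
As a reading aid for alloc_lpt_leb() in this patch: the function searches the
ltab for an empty LEB starting just after the current one and wraps around to
the beginning. The standalone sketch below expresses the two loops as a single
modular scan. It is an illustration only; struct ltab_entry and
find_empty_lpt_leb() are hypothetical names, and the fields are reduced to the
three the search looks at.

```c
#include <errno.h>

struct ltab_entry {
	int free;		/* free bytes in this LPT LEB */
	unsigned int tgc:1;	/* set: skipped by the search */
	unsigned int cmt:1;	/* set: already claimed by the search */
};

/*
 * Find the next completely empty LEB after '*lnum', wrapping around.
 * 'first' is the LEB number of ltab[0]. On success the LEB is marked as
 * claimed, its number is stored in '*lnum' and 0 is returned. Otherwise
 * -ENOSPC is returned and '*lnum' is left unchanged.
 */
static int find_empty_lpt_leb(struct ltab_entry *ltab, int cnt, int leb_size,
			      int first, int *lnum)
{
	int start = *lnum - first + 1;
	int k;

	for (k = 0; k < cnt; k++) {
		int i = (start + k) % cnt;

		if (ltab[i].tgc || ltab[i].cmt)
			continue;
		if (ltab[i].free == leb_size) {
			ltab[i].cmt = 1;
			*lnum = i + first;
			return 0;
		}
	}
	return -ENOSPC;
}
```

Starting one past the current LEB and visiting every entry exactly once gives
the same visiting order as the pair of loops in the patch, including the case
where the current LEB is the last one in the LPT area.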
fs/ubifs/lpt_commit.c | 1628 +++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 1628 insertions(+), 0 deletions(-)

diff --git a/fs/ubifs/lpt_commit.c b/fs/ubifs/lpt_commit.c
new file mode 100644
index 0000000..2aa9712
--- /dev/null
+++ b/fs/ubifs/lpt_commit.c
@@ -0,0 +1,1628 @@
+/*
+ * This file is part of UBIFS.
+ *
+ * Copyright (C) 2006-2008 Nokia Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 51
+ * Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Authors: Adrian Hunter
+ * Artem Bityutskiy (Битюцкий Артём)
+ */
+
+/*
+ * This file implements commit-related functionality of the LEB properties
+ * subsystem.
+ */
+
+#include <linux/crc16.h>
+#include "ubifs.h"
+
+#ifdef CONFIG_UBIFS_FS_DEBUG_CHK_LPROPS
+static int dbg_check_ltab(struct ubifs_info *c);
+#else
+#define dbg_check_ltab(c) 0
+#endif
+
+/**
+ * first_dirty_cnode - find first dirty cnode.
+ * @nnode: nnode at which to start
+ *
+ * This function returns the first dirty cnode or %NULL if there is not one.
+ */
+static struct ubifs_cnode *first_dirty_cnode(struct ubifs_nnode *nnode)
+{
+ ubifs_assert(nnode);
+ while (1) {
+ int i, cont = 0;
+
+ for (i = 0; i < UBIFS_LPT_FANOUT; i++) {
+ struct ubifs_cnode *cnode;
+
+ cnode = nnode->nbranch[i].cnode;
+ if (cnode &&
+ test_bit(DIRTY_CNODE, &cnode->flags)) {
+ if (cnode->level == 0)
+ return cnode;
+ nnode = (struct ubifs_nnode *)cnode;
+ cont = 1;
+ break;
+ }
+ }
+ if (!cont)
+ return (struct ubifs_cnode *)nnode;
+ }
+}
+
+/**
+ * next_dirty_cnode - find next dirty cnode.
+ * @cnode: cnode from which to begin searching
+ *
+ * This function returns the next dirty cnode or %NULL if there is not one.
+ */
+static struct ubifs_cnode *next_dirty_cnode(struct ubifs_cnode *cnode)
+{
+ struct ubifs_nnode *nnode;
+ int i;
+
+ ubifs_assert(cnode);
+ nnode = cnode->parent;
+ if (!nnode)
+ return NULL;
+ for (i = cnode->iip + 1; i < UBIFS_LPT_FANOUT; i++) {
+ cnode = nnode->nbranch[i].cnode;
+ if (cnode && test_bit(DIRTY_CNODE, &cnode->flags)) {
+ if (cnode->level == 0)
+ return cnode; /* cnode is a pnode */
+ /* cnode is a nnode */
+ return first_dirty_cnode((struct ubifs_nnode *)cnode);
+ }
+ }
+ return (struct ubifs_cnode *)nnode;
+}
+
+/**
+ * get_cnodes_to_commit - create list of dirty cnodes to commit.
+ * @c: UBIFS file-system description object
+ *
+ * This function returns the number of cnodes to commit.
+ */
+static int get_cnodes_to_commit(struct ubifs_info *c)
+{
+ struct ubifs_cnode *cnode, *cnext;
+ int cnt = 0;
+
+ if (!c->nroot)
+ return 0;
+
+ if (!test_bit(DIRTY_CNODE, &c->nroot->flags))
+ return 0;
+
+ c->lpt_cnext = first_dirty_cnode(c->nroot);
+ cnode = c->lpt_cnext;
+ if (!cnode)
+ return 0;
+ cnt += 1;
+ while (1) {
+ ubifs_assert(!test_bit(COW_ZNODE, &cnode->flags));
+ set_bit(COW_ZNODE, &cnode->flags);
+ cnext = next_dirty_cnode(cnode);
+ if (!cnext) {
+ cnode->cnext = c->lpt_cnext;
+ break;
+ }
+ cnode->cnext = cnext;
+ cnode = cnext;
+ cnt += 1;
+ }
+ dbg_cmt("committing %d cnodes", cnt);
+ dbg_lp("committing %d cnodes", cnt);
+ ubifs_assert(cnt == c->dirty_nn_cnt + c->dirty_pn_cnt);
+ return cnt;
+}
+
+/**
+ * upd_ltab - update LPT LEB properties.
+ * @c: UBIFS file-system description object
+ * @lnum: LEB number
+ * @free: amount of free space
+ * @dirty: amount of dirty space to add
+ */
+static void upd_ltab(struct ubifs_info *c, int lnum, int free, int dirty)
+{
+ dbg_lp("LEB %d free %d dirty %d to %d +%d",
+ lnum, c->ltab[lnum - c->lpt_first].free,
+ c->ltab[lnum - c->lpt_first].dirty, free, dirty);
+ ubifs_assert(lnum >= c->lpt_first && lnum <= c->lpt_last);
+ c->ltab[lnum - c->lpt_first].free = free;
+ c->ltab[lnum - c->lpt_first].dirty += dirty;
+}
+
+/**
+ * alloc_lpt_leb - allocate an LPT LEB that is empty.
+ * @c: UBIFS file-system description object
+ * @lnum: LEB number is passed and returned here
+ *
+ * This function finds the next empty LEB in the ltab starting from @lnum. If
+ * an empty LEB is found it is returned in @lnum and the function returns %0.
+ * Otherwise the function returns %-ENOSPC. Note however, that LPT is designed
+ * never to run out of space.
+ */
+static int alloc_lpt_leb(struct ubifs_info *c, int *lnum)
+{
+ int i, n;
+
+ n = *lnum - c->lpt_first + 1;
+ for (i = n; i < c->lpt_lebs; i++) {
+ if (c->ltab[i].tgc || c->ltab[i].cmt)
+ continue;
+ if (c->ltab[i].free == c->leb_size) {
+ c->ltab[i].cmt = 1;
+ *lnum = i + c->lpt_first;
+ return 0;
+ }
+ }
+
+ for (i = 0; i < n; i++) {
+ if (c->ltab[i].tgc || c->ltab[i].cmt)
+ continue;
+ if (c->ltab[i].free == c->leb_size) {
+ c->ltab[i].cmt = 1;
+ *lnum = i + c->lpt_first;
+ return 0;
+ }
+ }
+ dbg_err("last LEB %d", *lnum);
+ dump_stack();
+ return -ENOSPC;
+}
+
+/**
+ * layout_cnodes - layout cnodes for commit.
+ * @c: UBIFS file-system description object
+ *
+ * This function returns %0 on success and a negative error code on failure.
+ */
+static int layout_cnodes(struct ubifs_info *c)
+{
+ int lnum, offs, len, alen, done_lsave, done_ltab, err;
+ struct ubifs_cnode *cnode;
+
+ cnode = c->lpt_cnext;
+ if (!cnode)
+ return 0;
+ lnum = c->nhead_lnum;
+ offs = c->nhead_offs;
+ /* Try to place lsave and ltab nicely */
+ done_lsave = !c->big_lpt;
+ done_ltab = 0;
+ if (!done_lsave && offs + c->lsave_sz <= c->leb_size) {
+ done_lsave = 1;
+ c->lsave_lnum = lnum;
+ c->lsave_offs = offs;
+ offs += c->lsave_sz;
+ }
+
+ if (offs + c->ltab_sz <= c->leb_size) {
+ done_ltab = 1;
+ c->ltab_lnum = lnum;
+ c->ltab_offs = offs;
+ offs += c->ltab_sz;
+ }
+
+ do {
+ if (cnode->level) {
+ len = c->nnode_sz;
+ c->dirty_nn_cnt -= 1;
+ } else {
+ len = c->pnode_sz;
+ c->dirty_pn_cnt -= 1;
+ }
+ while (offs + len > c->leb_size) {
+ alen = ALIGN(offs, c->min_io_size);
+ upd_ltab(c, lnum, c->leb_size - alen, alen - offs);
+ err = alloc_lpt_leb(c, &lnum);
+ if (err)
+ return err;
+ offs = 0;
+ ubifs_assert(lnum >= c->lpt_first &&
+ lnum <= c->lpt_last);
+ /* Try to place lsave and ltab nicely */
+ if (!done_lsave) {
+ done_lsave = 1;
+ c->lsave_lnum = lnum;
+ c->lsave_offs = offs;
+ offs += c->lsave_sz;
+ continue;
+ }
+ if (!done_ltab) {
+ done_ltab = 1;
+ c->ltab_lnum = lnum;
+ c->ltab_offs = offs;
+ offs += c->ltab_sz;
+ continue;
+ }
+ break;
+ }
+ if (cnode->parent) {
+ cnode->parent->nbranch[cnode->iip].lnum = lnum;
+ cnode->parent->nbranch[cnode->iip].offs = offs;
+ } else {
+ c->lpt_lnum = lnum;
+ c->lpt_offs = offs;
+ }
+ offs += len;
+ cnode = cnode->cnext;
+ } while (cnode && cnode != c->lpt_cnext);
+
+ /* Make sure to place LPT's save table */
+ if (!done_lsave) {
+ if (offs + c->lsave_sz > c->leb_size) {
+ alen = ALIGN(offs, c->min_io_size);
+ upd_ltab(c, lnum, c->leb_size - alen, alen - offs);
+ err = alloc_lpt_leb(c, &lnum);
+ if (err)
+ return err;
+ offs = 0;
+ ubifs_assert(lnum >= c->lpt_first &&
+ lnum <= c->lpt_last);
+ }
+ done_lsave = 1;
+ c->lsave_lnum = lnum;
+ c->lsave_offs = offs;
+ offs += c->lsave_sz;
+ }
+
+ /* Make sure to place LPT's own lprops table */
+ if (!done_ltab) {
+ if (offs + c->ltab_sz > c->leb_size) {
+ alen = ALIGN(offs, c->min_io_size);
+ upd_ltab(c, lnum, c->leb_size - alen, alen - offs);
+ err = alloc_lpt_leb(c, &lnum);
+ if (err)
+ return err;
+ offs = 0;
+ ubifs_assert(lnum >= c->lpt_first &&
+ lnum <= c->lpt_last);
+ }
+ done_ltab = 1;
+ c->ltab_lnum = lnum;
+ c->ltab_offs = offs;
+ offs += c->ltab_sz;
+ }
+
+ alen = ALIGN(offs, c->min_io_size);
+ upd_ltab(c, lnum, c->leb_size - alen, alen - offs);
+ return 0;
+}
+
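Each time layout_cnodes() leaves a LEB, it rounds the write offset up to min_io_size and books the padding as dirty space. Below is a small user-space sketch of that arithmetic. It assumes min_io_size is a power of two, which the mask-based rounding requires; `ALIGN_UP`, `struct leb_space` and `close_lpt_leb()` are illustrative names that do not appear in the patch.

```c
#include <assert.h>

/* Round x up to a multiple of a; a is assumed to be a power of two. */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((a) - 1))

/* Hypothetical stand-in for the free/dirty pair kept per LPT LEB. */
struct leb_space {
	int free;
	int dirty;
};

/*
 * Account for leaving a LEB at write offset offs: everything after the
 * aligned offset stays free, the alignment gap becomes dirty.
 */
static void close_lpt_leb(struct leb_space *lp, int leb_size,
			  int min_io_size, int offs)
{
	int alen = ALIGN_UP(offs, min_io_size);

	assert(offs >= 0 && alen <= leb_size);
	lp->free = leb_size - alen;
	lp->dirty += alen - offs;
}
```

For example, with a 131072-byte LEB, a 2048-byte min_io_size and offs = 5000, the aligned offset is 6144, leaving 124928 bytes free and adding 1144 bytes of dirty padding.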
+/**
+ * realloc_lpt_leb - re-allocate an LPT LEB already claimed by alloc_lpt_leb.
+ * @c: UBIFS file-system description object
+ * @lnum: LEB number is passed and returned here
+ *
+ * This function duplicates exactly the results of the function alloc_lpt_leb.
+ * It is used during end commit to reallocate the same LEB numbers that were
+ * allocated by alloc_lpt_leb during start commit.
+ *
+ * This function finds the next LEB that was allocated by the alloc_lpt_leb
+ * function starting from @lnum. If a LEB is found it is returned in @lnum and
+ * the function returns %0. Otherwise the function returns -ENOSPC.
+ * Note however, that LPT is designed never to run out of space.
+ */
+static int realloc_lpt_leb(struct ubifs_info *c, int *lnum)
+{
+ int i, n;
+
+ n = *lnum - c->lpt_first + 1;
+ for (i = n; i < c->lpt_lebs; i++)
+ if (c->ltab[i].cmt) {
+ c->ltab[i].cmt = 0;
+ *lnum = i + c->lpt_first;
+ return 0;
+ }
+
+ for (i = 0; i < n; i++)
+ if (c->ltab[i].cmt) {
+ c->ltab[i].cmt = 0;
+ *lnum = i + c->lpt_first;
+ return 0;
+ }
+ dbg_err("last LEB %d", *lnum);
+ dump_stack();
+ return -ENOSPC;
+}
+
+/**
+ * write_cnodes - write cnodes for commit.
+ * @c: UBIFS file-system description object
+ *
+ * This function returns %0 on success and a negative error code on failure.
+ */
+static int write_cnodes(struct ubifs_info *c)
+{
+ int lnum, offs, len, from, err, wlen, alen, done_ltab, done_lsave;
+ struct ubifs_cnode *cnode;
+ void *buf = c->lpt_buf;
+
+ cnode = c->lpt_cnext;
+ if (!cnode)
+ return 0;
+ lnum = c->nhead_lnum;
+ offs = c->nhead_offs;
+ from = offs;
+ /* Ensure empty LEB is unmapped */
+ if (offs == 0) {
+ err = ubifs_leb_unmap(c, lnum);
+ if (err)
+ return err;
+ }
+ /* Try to place lsave and ltab nicely */
+ done_lsave = !c->big_lpt;
+ done_ltab = 0;
+ if (!done_lsave && offs + c->lsave_sz <= c->leb_size) {
+ done_lsave = 1;
+ ubifs_pack_lsave(c, buf + offs, c->lsave);
+ offs += c->lsave_sz;
+ }
+
+ if (offs + c->ltab_sz <= c->leb_size) {
+ done_ltab = 1;
+ ubifs_pack_ltab(c, buf + offs, c->ltab_cmt);
+ offs += c->ltab_sz;
+ }
+
+ /* Loop for each cnode */
+ do {
+ if (cnode->level)
+ len = c->nnode_sz;
+ else
+ len = c->pnode_sz;
+ while (offs + len > c->leb_size) {
+ wlen = offs - from;
+ if (wlen) {
+ alen = ALIGN(wlen, c->min_io_size);
+ memset(buf + offs, 0xff, alen - wlen);
+ err = ubifs_leb_write(c, lnum, buf + from, from,
+ alen, UBI_SHORTTERM);
+ if (err)
+ return err;
+ }
+ err = realloc_lpt_leb(c, &lnum);
+ if (err)
+ return err;
+ offs = 0;
+ from = 0;
+ ubifs_assert(lnum >= c->lpt_first &&
+ lnum <= c->lpt_last);
+ err = ubifs_leb_unmap(c, lnum);
+ if (err)
+ return err;
+ /* Try to place lsave and ltab nicely */
+ if (!done_lsave) {
+ done_lsave = 1;
+ ubifs_pack_lsave(c, buf + offs, c->lsave);
+ offs += c->lsave_sz;
+ continue;
+ }
+ if (!done_ltab) {
+ done_ltab = 1;
+ ubifs_pack_ltab(c, buf + offs, c->ltab_cmt);
+ offs += c->ltab_sz;
+ continue;
+ }
+ break;
+ }
+ if (cnode->level)
+ ubifs_pack_nnode(c, buf + offs,
+ (struct ubifs_nnode *)cnode);
+ else
+ ubifs_pack_pnode(c, buf + offs,
+ (struct ubifs_pnode *)cnode);
+ clear_bit(DIRTY_CNODE, &cnode->flags);
+ smp_mb__before_clear_bit();
+ clear_bit(COW_ZNODE, &cnode->flags);
+ smp_mb__after_clear_bit();
+ offs += len;
+ cnode = cnode->cnext;
+ } while (cnode && cnode != c->lpt_cnext);
+
+ /* Make sure to place LPT's save table */
+ if (!done_lsave) {
+ if (offs + c->lsave_sz > c->leb_size) {
+ wlen = offs - from;
+ alen = ALIGN(wlen, c->min_io_size);
+ memset(buf + offs, 0xff, alen - wlen);
+ err = ubifs_leb_write(c, lnum, buf + from, from, alen,
+ UBI_SHORTTERM);
+ if (err)
+ return err;
+ err = realloc_lpt_leb(c, &lnum);
+ if (err)
+ return err;
+ offs = 0;
+ ubifs_assert(lnum >= c->lpt_first &&
+ lnum <= c->lpt_last);
+ err = ubifs_leb_unmap(c, lnum);
+ if (err)
+ return err;
+ }
+ done_lsave = 1;
+ ubifs_pack_lsave(c, buf + offs, c->lsave);
+ offs += c->lsave_sz;
+ }
+
+ /* Make sure to place LPT's own lprops table */
+ if (!done_ltab) {
+ if (offs + c->ltab_sz > c->leb_size) {
+ wlen = offs - from;
+ alen = ALIGN(wlen, c->min_io_size);
+ memset(buf + offs, 0xff, alen - wlen);
+ err = ubifs_leb_write(c, lnum, buf + from, from, alen,
+ UBI_SHORTTERM);
+ if (err)
+ return err;
+ err = realloc_lpt_leb(c, &lnum);
+ if (err)
+ return err;
+ offs = 0;
+ ubifs_assert(lnum >= c->lpt_first &&
+ lnum <= c->lpt_last);
+ err = ubifs_leb_unmap(c, lnum);
+ if (err)
+ return err;
+ }
+ done_ltab = 1;
+ ubifs_pack_ltab(c, buf + offs, c->ltab_cmt);
+ offs += c->ltab_sz;
+ }
+
+ /* Write remaining data in buffer */
+ wlen = offs - from;
+ alen = ALIGN(wlen, c->min_io_size);
+ memset(buf + offs, 0xff, alen - wlen);
+ err = ubifs_leb_write(c, lnum, buf + from, from, alen, UBI_SHORTTERM);
+ if (err)
+ return err;
+ c->nhead_lnum = lnum;
+ c->nhead_offs = ALIGN(offs, c->min_io_size);
+
+ dbg_lp("LPT root is at %d:%d", c->lpt_lnum, c->lpt_offs);
+ dbg_lp("LPT head is at %d:%d", c->nhead_lnum, c->nhead_offs);
+ dbg_lp("LPT ltab is at %d:%d", c->ltab_lnum, c->ltab_offs);
+ if (c->big_lpt)
+ dbg_lp("LPT lsave is at %d:%d", c->lsave_lnum, c->lsave_offs);
+ return 0;
+}
+
+/**
+ * next_pnode - find next pnode.
+ * @c: UBIFS file-system description object
+ * @pnode: pnode
+ *
+ * This function returns the next pnode, %NULL if there are no more pnodes, or
+ * an error pointer on failure.
+ */
+static struct ubifs_pnode *next_pnode(struct ubifs_info *c,
+ struct ubifs_pnode *pnode)
+{
+ struct ubifs_nnode *nnode;
+ int iip;
+
+ /* Try to go right */
+ nnode = pnode->parent;
+ iip = pnode->iip + 1;
+ if (iip < UBIFS_LPT_FANOUT) {
+ /* We assume here that LEB zero is never an LPT LEB */
+ if (nnode->nbranch[iip].lnum)
+ return ubifs_get_pnode(c, nnode, iip);
+ else
+ return NULL;
+ }
+
+ /* Go up while can't go right */
+ do {
+ iip = nnode->iip + 1;
+ nnode = nnode->parent;
+ if (!nnode)
+ return NULL;
+ /* We assume here that LEB zero is never an LPT LEB */
+ } while (iip >= UBIFS_LPT_FANOUT || !nnode->nbranch[iip].lnum);
+
+ /* Go right */
+ nnode = ubifs_get_nnode(c, nnode, iip);
+ if (IS_ERR(nnode))
+ return (void *)nnode;
+
+ /* Go down to level 1 */
+ while (nnode->level > 1) {
+ nnode = ubifs_get_nnode(c, nnode, 0);
+ if (IS_ERR(nnode))
+ return (void *)nnode;
+ }
+
+ return ubifs_get_pnode(c, nnode, 0);
+}
+
+/**
+ * pnode_lookup - lookup a pnode in the LPT.
+ * @c: UBIFS file-system description object
+ * @i: pnode number (0 to (main_lebs - 1) / UBIFS_LPT_FANOUT)
+ *
+ * This function returns a pointer to the pnode on success or a negative
+ * error code on failure.
+ */
+static struct ubifs_pnode *pnode_lookup(struct ubifs_info *c, int i)
+{
+ int err, h, iip, shft;
+ struct ubifs_nnode *nnode;
+
+ if (!c->nroot) {
+ err = ubifs_read_nnode(c, NULL, 0);
+ if (err)
+ return ERR_PTR(err);
+ }
+ i <<= UBIFS_LPT_FANOUT_SHIFT;
+ nnode = c->nroot;
+ shft = c->lpt_hght * UBIFS_LPT_FANOUT_SHIFT;
+ for (h = 1; h < c->lpt_hght; h++) {
+ iip = ((i >> shft) & (UBIFS_LPT_FANOUT - 1));
+ shft -= UBIFS_LPT_FANOUT_SHIFT;
+ nnode = ubifs_get_nnode(c, nnode, iip);
+ if (IS_ERR(nnode))
+ return ERR_PTR(PTR_ERR(nnode));
+ }
+ iip = ((i >> shft) & (UBIFS_LPT_FANOUT - 1));
+ return ubifs_get_pnode(c, nnode, iip);
+}
+
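The shift arithmetic in pnode_lookup() amounts to reading the pnode number as a sequence of base-FANOUT digits, most significant first, and using each digit as the branch index at one level. The user-space sketch below records those indices instead of walking a tree. The fanout of 4 (shift of 2) is an assumption made for the example, since the constants are defined elsewhere in the patch, and `pnode_path()` is a hypothetical helper.

```c
#include <assert.h>

/* Assumed values for illustration; the real constants live elsewhere. */
#define FANOUT_SHIFT 2
#define FANOUT (1 << FANOUT_SHIFT)

/*
 * Fill path[] with the branch index taken at each level, from the root
 * down to the pnode, for pnode number i in a tree of height hght.
 * Returns the number of indices written (always hght).
 */
static int pnode_path(int i, int hght, int *path)
{
	int h, shft, n = 0;

	assert(hght >= 1 && i >= 0);
	i <<= FANOUT_SHIFT;
	shft = hght * FANOUT_SHIFT;
	for (h = 1; h < hght; h++) {
		path[n++] = (i >> shft) & (FANOUT - 1);
		shft -= FANOUT_SHIFT;
	}
	path[n++] = (i >> shft) & (FANOUT - 1);
	return n;
}
```

With these assumed constants, pnode number 27 (digits 1, 2, 3 in base 4) in a tree of height 3 takes branch 1 at the root, branch 2 at the next nnode, and slot 3 for the pnode itself.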
+/**
+ * add_pnode_dirt - add dirty space to LPT LEB properties.
+ * @c: UBIFS file-system description object
+ * @pnode: pnode for which to add dirt
+ */
+static void add_pnode_dirt(struct ubifs_info *c, struct ubifs_pnode *pnode)
+{
+ ubifs_add_lpt_dirt(c, pnode->parent->nbranch[pnode->iip].lnum,
+ c->pnode_sz);
+}
+
+/**
+ * do_make_pnode_dirty - mark a pnode dirty.
+ * @c: UBIFS file-system description object
+ * @pnode: pnode to mark dirty
+ */
+static void do_make_pnode_dirty(struct ubifs_info *c, struct ubifs_pnode *pnode)
+{
+ /* Assumes cnext list is empty i.e. not called during commit */
+ if (!test_and_set_bit(DIRTY_CNODE, &pnode->flags)) {
+ struct ubifs_nnode *nnode;
+
+ c->dirty_pn_cnt += 1;
+ add_pnode_dirt(c, pnode);
+ /* Mark parent and ancestors dirty too */
+ nnode = pnode->parent;
+ while (nnode) {
+ if (!test_and_set_bit(DIRTY_CNODE, &nnode->flags)) {
+ c->dirty_nn_cnt += 1;
+ ubifs_add_nnode_dirt(c, nnode);
+ nnode = nnode->parent;
+ } else
+ break;
+ }
+ }
+}
+
+/**
+ * make_tree_dirty - mark the entire LEB properties tree dirty.
+ * @c: UBIFS file-system description object
+ *
+ * This function is used by the "small" LPT model to cause the entire LEB
+ * properties tree to be written. The "small" LPT model does not use LPT
+ * garbage collection because it is more efficient to write the entire tree
+ * (because it is small).
+ *
+ * This function returns %0 on success and a negative error code on failure.
+ */
+static int make_tree_dirty(struct ubifs_info *c)
+{
+ struct ubifs_pnode *pnode;
+
+ pnode = pnode_lookup(c, 0);
+ if (IS_ERR(pnode))
+ return PTR_ERR(pnode);
+ while (pnode) {
+ do_make_pnode_dirty(c, pnode);
+ pnode = next_pnode(c, pnode);
+ if (IS_ERR(pnode))
+ return PTR_ERR(pnode);
+ }
+ return 0;
+}
+
+/**
+ * need_write_all - determine if the LPT area is running out of free space.
+ * @c: UBIFS file-system description object
+ *
+ * This function returns %1 if the LPT area is running out of free space and %0
+ * if it is not.
+ */
+static int need_write_all(struct ubifs_info *c)
+{
+ long long free = 0;
+ int i;
+
+ for (i = 0; i < c->lpt_lebs; i++) {
+ if (i + c->lpt_first == c->nhead_lnum)
+ free += c->leb_size - c->nhead_offs;
+ else if (c->ltab[i].free == c->leb_size)
+ free += c->leb_size;
+ else if (c->ltab[i].free + c->ltab[i].dirty == c->leb_size)
+ free += c->leb_size;
+ }
+ /* No more than twice the LPT size is left */
+ if (free <= c->lpt_sz * 2)
+ return 1;
+ return 0;
+}
+
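The free-space estimate in need_write_all() can be tried out in isolation. The sketch below is a user-space approximation under the same rules: the head LEB contributes what lies past the head offset, and any other LEB counts in full when it is empty or holds only free and dirty space. `struct lpt_leb_space` and `lpt_space_is_low()` are hypothetical names introduced for the example.

```c
#include <assert.h>

/* Hypothetical stand-in for the free/dirty pair kept per LPT LEB. */
struct lpt_leb_space {
	int free;
	int dirty;
};

/* Return 1 if reclaimable LPT space is at most twice lpt_sz, else 0. */
static int lpt_space_is_low(const struct lpt_leb_space *ltab, int lpt_lebs,
			    int lpt_first, int leb_size, int nhead_lnum,
			    int nhead_offs, long long lpt_sz)
{
	long long free = 0;
	int i;

	assert(nhead_offs >= 0 && nhead_offs <= leb_size);
	for (i = 0; i < lpt_lebs; i++) {
		if (i + lpt_first == nhead_lnum)
			free += leb_size - nhead_offs;
		else if (ltab[i].free == leb_size)
			free += leb_size;
		else if (ltab[i].free + ltab[i].dirty == leb_size)
			free += leb_size;
	}
	return free <= lpt_sz * 2;
}
```

Note that the comparison is inclusive, so exactly twice the LPT size still counts as running low.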
+/**
+ * lpt_tgc_start - start trivial garbage collection of LPT LEBs.
+ * @c: UBIFS file-system description object
+ *
+ * LPT trivial garbage collection is where an LPT LEB contains only dirty and
+ * free space and so may be reused as soon as the next commit is completed.
+ * This function is called during start commit to mark LPT LEBs for trivial GC.
+ */
+static void lpt_tgc_start(struct ubifs_info *c)
+{
+ int i;
+
+ for (i = 0; i < c->lpt_lebs; i++) {
+ if (i + c->lpt_first == c->nhead_lnum)
+ continue;
+ if (c->ltab[i].dirty > 0 &&
+ c->ltab[i].free + c->ltab[i].dirty == c->leb_size) {
+ c->ltab[i].tgc = 1;
+ c->ltab[i].free = c->leb_size;
+ c->ltab[i].dirty = 0;
+ dbg_lp("LEB %d", i + c->lpt_first);
+ }
+ }
+}
+
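As a rough illustration of the marking rule in lpt_tgc_start(), the user-space sketch below applies the same test to a small table: a LEB other than the head qualifies when it has some dirty space and nothing but free and dirty space. `struct lpt_tgc_props` and `mark_trivial_gc()` are hypothetical names, and the return value (number of LEBs marked) is an addition made for the example.

```c
#include <assert.h>

/* Hypothetical stand-in for one entry of c->ltab. */
struct lpt_tgc_props {
	int free;
	int dirty;
	unsigned int tgc:1;	/* marked for trivial GC */
};

/* Mark LEBs holding only free and dirty space; return how many were marked. */
static int mark_trivial_gc(struct lpt_tgc_props *ltab, int lpt_lebs,
			   int lpt_first, int leb_size, int nhead_lnum)
{
	int i, marked = 0;

	assert(leb_size > 0);
	for (i = 0; i < lpt_lebs; i++) {
		if (i + lpt_first == nhead_lnum)
			continue;
		if (ltab[i].dirty > 0 &&
		    ltab[i].free + ltab[i].dirty == leb_size) {
			ltab[i].tgc = 1;
			ltab[i].free = leb_size;
			ltab[i].dirty = 0;
			marked++;
		}
	}
	return marked;
}
```

A marked LEB is accounted as fully free straight away, but in the patch it is only unmapped later by lpt_tgc_end(), after the commit has completed.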
+/**
+ * lpt_tgc_end - end trivial garbage collection of LPT LEBs.
+ * @c: UBIFS file-system description object
+ *
+ * LPT trivial garbage collection is where an LPT LEB contains only dirty and
+ * free space and so may be reused as soon as the next commit is completed.
+ * This function is called after the commit is completed (master node has been
+ * written) and unmaps LPT LEBs that were marked for trivial GC.
+ */
+static int lpt_tgc_end(struct ubifs_info *c)
+{
+ int i, err;
+
+ for (i = 0; i < c->lpt_lebs; i++)
+ if (c->ltab[i].tgc) {
+ err = ubifs_leb_unmap(c, i + c->lpt_first);
+ if (err)
+ return err;
+ c->ltab[i].tgc = 0;
+ dbg_lp("LEB %d", i + c->lpt_first);
+ }
+ return 0;
+}
+
+/**
+ * populate_lsave - fill the lsave array with important LEB numbers.
+ * @c: the UBIFS file-system description object
+ *
+ * This function is only called for the "big" model. It records a small number
+ * of LEB numbers of important LEBs. Important LEBs are ones that are (from
+ * most important to least important): empty, freeable, freeable index, dirty
+ * index, dirty or free. Upon mount, we read this list of LEB numbers and bring
+ * their pnodes into memory. That will stop us from having to scan the LPT
+ * straight away. For the "small" model we assume that scanning the LPT is no
+ * big deal.
+ */
+static void populate_lsave(struct ubifs_info *c)
+{
+ struct ubifs_lprops *lprops;
+ struct ubifs_lpt_heap *heap;
+ int i, cnt = 0;
+
+ ubifs_assert(c->big_lpt);
+ if (!(c->lpt_drty_flgs & LSAVE_DIRTY)) {
+ c->lpt_drty_flgs |= LSAVE_DIRTY;
+ ubifs_add_lpt_dirt(c, c->lsave_lnum, c->lsave_sz);
+ }
+ list_for_each_entry(lprops, &c->empty_list, list) {
+ c->lsave[cnt++] = lprops->lnum;
+ if (cnt >= c->lsave_cnt)
+ return;
+ }
+ list_for_each_entry(lprops, &c->freeable_list, list) {
+ c->lsave[cnt++] = lprops->lnum;
+ if (cnt >= c->lsave_cnt)
+ return;
+ }
+ list_for_each_entry(lprops, &c->frdi_idx_list, list) {
+ c->lsave[cnt++] = lprops->lnum;
+ if (cnt >= c->lsave_cnt)
+ return;
+ }
+ heap = &c->lpt_heap[LPROPS_DIRTY_IDX - 1];
+ for (i = 0; i < heap->cnt; i++) {
+ c->lsave[cnt++] = heap->arr[i]->lnum;
+ if (cnt >= c->lsave_cnt)
+ return;
+ }
+ heap = &c->lpt_heap[LPROPS_DIRTY - 1];
+ for (i = 0; i < heap->cnt; i++) {
+ c->lsave[cnt++] = heap->arr[i]->lnum;
+ if (cnt >= c->lsave_cnt)
+ return;
+ }
+ heap = &c->lpt_heap[LPROPS_FREE - 1];
+ for (i = 0; i < heap->cnt; i++) {
+ c->lsave[cnt++] = heap->arr[i]->lnum;
+ if (cnt >= c->lsave_cnt)
+ return;
+ }
+ /* Fill it up completely */
+ while (cnt < c->lsave_cnt)
+ c->lsave[cnt++] = c->main_first;
+}
+
+/**
+ * ubifs_lpt_start_commit - UBIFS commit starts.
+ * @c: the UBIFS file-system description object
+ *
+ * This function has to be called when UBIFS starts the commit operation.
+ * This function "freezes" all currently dirty LEB properties and does not
+ * change them anymore. Further changes are saved and tracked separately
+ * because they are not part of this commit. This function returns zero in case
+ * of success and a negative error code in case of failure.
+ */
+int ubifs_lpt_start_commit(struct ubifs_info *c)
+{
+ int err, cnt;
+
+ dbg_lp("");
+
+ mutex_lock(&c->lp_mutex);
+ err = dbg_check_ltab(c);
+ if (err)
+ goto out;
+
+ lpt_tgc_start(c);
+
+ if (!c->dirty_pn_cnt) {
+ dbg_cmt("no cnodes to commit");
+ err = 0;
+ goto out;
+ }
+
+ if (!c->big_lpt && need_write_all(c)) {
+ /* If needed, write everything */
+ err = make_tree_dirty(c);
+ if (err)
+ goto out;
+ lpt_tgc_start(c);
+ }
+
+ if (c->big_lpt)
+ populate_lsave(c);
+
+ cnt = get_cnodes_to_commit(c);
+ ubifs_assert(cnt != 0);
+
+ err = layout_cnodes(c);
+ if (err)
+ goto out;
+
+ /* Copy the LPT's own lprops for end commit to write */
+ memcpy(c->ltab_cmt, c->ltab,
+ sizeof(struct ubifs_lpt_lprops) * c->lpt_lebs);
+ c->lpt_drty_flgs &= ~(LTAB_DIRTY | LSAVE_DIRTY);
+
+out:
+ mutex_unlock(&c->lp_mutex);
+ return err;
+}
+
+/**
+ * free_obsolete_cnodes - free obsolete cnodes for commit end.
+ * @c: UBIFS file-system description object
+ */
+static void free_obsolete_cnodes(struct ubifs_info *c)
+{
+ struct ubifs_cnode *cnode, *cnext;
+
+ cnext = c->lpt_cnext;
+ if (!cnext)
+ return;
+ do {
+ cnode = cnext;
+ cnext = cnode->cnext;
+ if (test_bit(OBSOLETE_CNODE, &cnode->flags))
+ kfree(cnode);
+ else
+ cnode->cnext = NULL;
+ } while (cnext != c->lpt_cnext);
+ c->lpt_cnext = NULL;
+}
+
+/**
+ * ubifs_lpt_end_commit - finish the commit operation.
+ * @c: the UBIFS file-system description object
+ *
+ * This function has to be called when the commit operation finishes. It
+ * flushes the changes which were "frozen" by 'ubifs_lpt_start_commit()' to
+ * the media. Returns zero in case of success and a negative error code in case
+ * of failure.
+ */
+int ubifs_lpt_end_commit(struct ubifs_info *c)
+{
+ int err;
+
+ dbg_lp("");
+
+ if (!c->lpt_cnext)
+ return 0;
+
+ err = write_cnodes(c);
+ if (err)
+ return err;
+
+ mutex_lock(&c->lp_mutex);
+ free_obsolete_cnodes(c);
+ mutex_unlock(&c->lp_mutex);
+
+ return 0;
+}
+
+/**
+ * nnode_lookup - lookup a nnode in the LPT.
+ * @c: UBIFS file-system description object
+ * @i: nnode number
+ *
+ * This function returns a pointer to the nnode on success or a negative
+ * error code on failure.
+ */
+static struct ubifs_nnode *nnode_lookup(struct ubifs_info *c, int i)
+{
+ int err, iip;
+ struct ubifs_nnode *nnode;
+
+ if (!c->nroot) {
+ err = ubifs_read_nnode(c, NULL, 0);
+ if (err)
+ return ERR_PTR(err);
+ }
+ nnode = c->nroot;
+ while (1) {
+ iip = i & (UBIFS_LPT_FANOUT - 1);
+ i >>= UBIFS_LPT_FANOUT_SHIFT;
+ if (!i)
+ break;
+ nnode = ubifs_get_nnode(c, nnode, iip);
+ if (IS_ERR(nnode))
+ return nnode;
+ }
+ return nnode;
+}
+
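Unlike pnode_lookup(), the loop in nnode_lookup() consumes the node number from the least significant digit upwards, and the walk stops as soon as the remaining value is zero, so the topmost non-zero digit is never used as a branch index. The user-space sketch below records the indices that would be followed. The fanout of 4 is an assumption for the example, and `nnode_path()` is a hypothetical helper.

```c
#include <assert.h>

/* Assumed values for illustration; the real constants live elsewhere. */
#define FANOUT_SHIFT 2
#define FANOUT (1 << FANOUT_SHIFT)

/*
 * Fill path[] with the branch indices followed from the root for nnode
 * number i. Returns the number of steps; 0 means the root itself.
 */
static int nnode_path(int i, int *path)
{
	int n = 0;

	assert(i >= 0);
	while (1) {
		int iip = i & (FANOUT - 1);

		i >>= FANOUT_SHIFT;
		if (!i)
			break;
		path[n++] = iip;
	}
	return n;
}
```

With these assumed constants, node number 25 (binary 1 10 01) follows branch 1 and then branch 2, while node number 1 resolves to the root without any step.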
+/**
+ * make_nnode_dirty - find a nnode and, if found, make it dirty.
+ * @c: UBIFS file-system description object
+ * @node_num: nnode number of nnode to make dirty
+ * @lnum: LEB number where nnode was written
+ * @offs: offset where nnode was written
+ *
+ * This function is used by LPT garbage collection. LPT garbage collection is
+ * used only for the "big" LPT model (c->big_lpt == 1). Garbage collection
+ * simply involves marking all the nodes in the LEB being garbage-collected as
+ * dirty. The dirty nodes are written next commit, after which the LEB is free
+ * to be reused.
+ *
+ * This function returns %0 on success and a negative error code on failure.
+ */
+static int make_nnode_dirty(struct ubifs_info *c, int node_num, int lnum,
+ int offs)
+{
+ struct ubifs_nnode *nnode;
+
+ nnode = nnode_lookup(c, node_num);
+ if (IS_ERR(nnode))
+ return PTR_ERR(nnode);
+ if (nnode->parent) {
+ struct ubifs_nbranch *branch;
+
+ branch = &nnode->parent->nbranch[nnode->iip];
+ if (branch->lnum != lnum || branch->offs != offs)
+ return 0; /* nnode is obsolete */
+ } else if (c->lpt_lnum != lnum || c->lpt_offs != offs)
+ return 0; /* nnode is obsolete */
+ /* Assumes cnext list is empty i.e. not called during commit */
+ if (!test_and_set_bit(DIRTY_CNODE, &nnode->flags)) {
+ c->dirty_nn_cnt += 1;
+ ubifs_add_nnode_dirt(c, nnode);
+ /* Mark parent and ancestors dirty too */
+ nnode = nnode->parent;
+ while (nnode) {
+ if (!test_and_set_bit(DIRTY_CNODE, &nnode->flags)) {
+ c->dirty_nn_cnt += 1;
+ ubifs_add_nnode_dirt(c, nnode);
+ nnode = nnode->parent;
+ } else
+ break;
+ }
+ }
+ return 0;
+}
+
+/**
+ * make_pnode_dirty - find a pnode and, if found, make it dirty.
+ * @c: UBIFS file-system description object
+ * @node_num: pnode number of pnode to make dirty
+ * @lnum: LEB number where pnode was written
+ * @offs: offset where pnode was written
+ *
+ * This function is used by LPT garbage collection. LPT garbage collection is
+ * used only for the "big" LPT model (c->big_lpt == 1). Garbage collection
+ * simply involves marking all the nodes in the LEB being garbage-collected as
+ * dirty. The dirty nodes are written next commit, after which the LEB is free
+ * to be reused.
+ *
+ * This function returns %0 on success and a negative error code on failure.
+ */
+static int make_pnode_dirty(struct ubifs_info *c, int node_num, int lnum,
+ int offs)
+{
+ struct ubifs_pnode *pnode;
+ struct ubifs_nbranch *branch;
+
+ pnode = pnode_lookup(c, node_num);
+ if (IS_ERR(pnode))
+ return PTR_ERR(pnode);
+ branch = &pnode->parent->nbranch[pnode->iip];
+ if (branch->lnum != lnum || branch->offs != offs)
+ return 0;
+ do_make_pnode_dirty(c, pnode);
+ return 0;
+}
+
+/**
+ * make_ltab_dirty - make ltab node dirty.
+ * @c: UBIFS file-system description object
+ * @lnum: LEB number where ltab was written
+ * @offs: offset where ltab was written
+ *
+ * This function is used by LPT garbage collection. LPT garbage collection is
+ * used only for the "big" LPT model (c->big_lpt == 1). Garbage collection
+ * simply involves marking all the nodes in the LEB being garbage-collected as
+ * dirty. The dirty nodes are written next commit, after which the LEB is free
+ * to be reused.
+ *
+ * This function returns %0 on success and a negative error code on failure.
+ */
+static int make_ltab_dirty(struct ubifs_info *c, int lnum, int offs)
+{
+ if (lnum != c->ltab_lnum || offs != c->ltab_offs)
+ return 0; /* This ltab node is obsolete */
+ if (!(c->lpt_drty_flgs & LTAB_DIRTY)) {
+ c->lpt_drty_flgs |= LTAB_DIRTY;
+ ubifs_add_lpt_dirt(c, c->ltab_lnum, c->ltab_sz);
+ }
+ return 0;
+}
+
+/**
+ * make_lsave_dirty - make lsave node dirty.
+ * @c: UBIFS file-system description object
+ * @lnum: LEB number where lsave was written
+ * @offs: offset where lsave was written
+ *
+ * This function is used by LPT garbage collection. LPT garbage collection is
+ * used only for the "big" LPT model (c->big_lpt == 1). Garbage collection
+ * simply involves marking all the nodes in the LEB being garbage-collected as
+ * dirty. The dirty nodes are written next commit, after which the LEB is free
+ * to be reused.
+ *
+ * This function returns %0 on success and a negative error code on failure.
+ */
+static int make_lsave_dirty(struct ubifs_info *c, int lnum, int offs)
+{
+ if (lnum != c->lsave_lnum || offs != c->lsave_offs)
+ return 0; /* This lsave node is obsolete */
+ if (!(c->lpt_drty_flgs & LSAVE_DIRTY)) {
+ c->lpt_drty_flgs |= LSAVE_DIRTY;
+ ubifs_add_lpt_dirt(c, c->lsave_lnum, c->lsave_sz);
+ }
+ return 0;
+}
+
+/**
+ * make_node_dirty - make node dirty.
+ * @c: UBIFS file-system description object
+ * @node_type: LPT node type
+ * @node_num: node number
+ * @lnum: LEB number where node was written
+ * @offs: offset where node was written
+ *
+ * This function is used by LPT garbage collection. LPT garbage collection is
+ * used only for the "big" LPT model (c->big_lpt == 1). Garbage collection
+ * simply involves marking all the nodes in the LEB being garbage-collected as
+ * dirty. The dirty nodes are written next commit, after which the LEB is free
+ * to be reused.
+ *
+ * This function returns %0 on success and a negative error code on failure.
+ */
+static int make_node_dirty(struct ubifs_info *c, int node_type, int node_num,
+ int lnum, int offs)
+{
+ switch (node_type) {
+ case UBIFS_LPT_NNODE:
+ return make_nnode_dirty(c, node_num, lnum, offs);
+ case UBIFS_LPT_PNODE:
+ return make_pnode_dirty(c, node_num, lnum, offs);
+ case UBIFS_LPT_LTAB:
+ return make_ltab_dirty(c, lnum, offs);
+ case UBIFS_LPT_LSAVE:
+ return make_lsave_dirty(c, lnum, offs);
+ }
+ return -EINVAL;
+}
+
+/**
+ * get_lpt_node_len - return the length of a node based on its type.
+ * @c: UBIFS file-system description object
+ * @node_type: LPT node type
+ */
+static int get_lpt_node_len(struct ubifs_info *c, int node_type)
+{
+ switch (node_type) {
+ case UBIFS_LPT_NNODE:
+ return c->nnode_sz;
+ case UBIFS_LPT_PNODE:
+ return c->pnode_sz;
+ case UBIFS_LPT_LTAB:
+ return c->ltab_sz;
+ case UBIFS_LPT_LSAVE:
+ return c->lsave_sz;
+ }
+ return 0;
+}
+
+/**
+ * get_pad_len - return the length of padding in a buffer.
+ * @c: UBIFS file-system description object
+ * @buf: buffer
+ * @len: length of buffer
+ */
+static int get_pad_len(struct ubifs_info *c, uint8_t *buf, int len)
+{
+ int offs, pad_len;
+
+ if (c->min_io_size == 1)
+ return 0;
+ offs = c->leb_size - len;
+ pad_len = ALIGN(offs, c->min_io_size) - offs;
+ return pad_len;
+}
+
+/**
+ * get_lpt_node_type - return type (and node number) of a node in a buffer.
+ * @c: UBIFS file-system description object
+ * @buf: buffer
+ * @node_num: node number is returned here
+ */
+static int get_lpt_node_type(struct ubifs_info *c, uint8_t *buf, int *node_num)
+{
+ uint8_t *addr = buf + UBIFS_LPT_CRC_BYTES;
+ int pos = 0, node_type;
+
+ node_type = ubifs_unpack_bits(&addr, &pos, UBIFS_LPT_TYPE_BITS);
+ *node_num = ubifs_unpack_bits(&addr, &pos, c->pcnt_bits);
+ return node_type;
+}
+
+/**
+ * is_a_node - determine if a buffer contains a node.
+ * @c: UBIFS file-system description object
+ * @buf: buffer
+ * @len: length of buffer
+ *
+ * This function returns %1 if the buffer contains a node or %0 if it does not.
+ */
+static int is_a_node(struct ubifs_info *c, uint8_t *buf, int len)
+{
+ uint8_t *addr = buf + UBIFS_LPT_CRC_BYTES;
+ int pos = 0, node_type, node_len;
+ uint16_t crc, calc_crc;
+
+ node_type = ubifs_unpack_bits(&addr, &pos, UBIFS_LPT_TYPE_BITS);
+ if (node_type == UBIFS_LPT_NOT_A_NODE)
+ return 0;
+ node_len = get_lpt_node_len(c, node_type);
+ if (!node_len || node_len > len)
+ return 0;
+ pos = 0;
+ addr = buf;
+ crc = ubifs_unpack_bits(&addr, &pos, UBIFS_LPT_CRC_BITS);
+ calc_crc = crc16(-1, buf + UBIFS_LPT_CRC_BYTES,
+ node_len - UBIFS_LPT_CRC_BYTES);
+ if (crc != calc_crc)
+ return 0;
+ return 1;
+}
+
+
+/**
+ * lpt_gc_lnum - garbage collect an LPT LEB.
+ * @c: UBIFS file-system description object
+ * @lnum: LEB number to garbage collect
+ *
+ * LPT garbage collection is used only for the "big" LPT model
+ * (c->big_lpt == 1). Garbage collection simply involves marking all the nodes
+ * in the LEB being garbage-collected as dirty. The dirty nodes are written
+ * next commit, after which the LEB is free to be reused.
+ *
+ * This function returns %0 on success and a negative error code on failure.
+ */
+static int lpt_gc_lnum(struct ubifs_info *c, int lnum)
+{
+ int err, len = c->leb_size, node_type, node_num, node_len, offs;
+ void *buf = c->lpt_buf;
+
+ dbg_lp("LEB %d", lnum);
+ err = ubi_read(c->ubi, lnum, buf, 0, c->leb_size);
+ if (err) {
+ ubifs_err("cannot read LEB %d, error %d", lnum, err);
+ return err;
+ }
+ while (1) {
+ if (!is_a_node(c, buf, len)) {
+ int pad_len;
+
+ pad_len = get_pad_len(c, buf, len);
+ if (pad_len) {
+ buf += pad_len;
+ len -= pad_len;
+ continue;
+ }
+ return 0;
+ }
+ node_type = get_lpt_node_type(c, buf, &node_num);
+ node_len = get_lpt_node_len(c, node_type);
+ offs = c->leb_size - len;
+ ubifs_assert(node_len != 0);
+ mutex_lock(&c->lp_mutex);
+ err = make_node_dirty(c, node_type, node_num, lnum, offs);
+ mutex_unlock(&c->lp_mutex);
+ if (err)
+ return err;
+ buf += node_len;
+ len -= node_len;
+ }
+ return 0;
+}
+
+/**
+ * lpt_gc - LPT garbage collection.
+ * @c: UBIFS file-system description object
+ *
+ * Select an LPT LEB for LPT garbage collection and call 'lpt_gc_lnum()'.
+ * Returns %0 on success and a negative error code on failure.
+ */
+static int lpt_gc(struct ubifs_info *c)
+{
+ int i, lnum = -1, dirty = 0;
+
+ mutex_lock(&c->lp_mutex);
+ for (i = 0; i < c->lpt_lebs; i++) {
+ ubifs_assert(!c->ltab[i].tgc);
+ if (i + c->lpt_first == c->nhead_lnum ||
+ c->ltab[i].free + c->ltab[i].dirty == c->leb_size)
+ continue;
+ if (c->ltab[i].dirty > dirty) {
+ dirty = c->ltab[i].dirty;
+ lnum = i + c->lpt_first;
+ }
+ }
+ mutex_unlock(&c->lp_mutex);
+ if (lnum == -1)
+ return -ENOSPC;
+ return lpt_gc_lnum(c, lnum);
+}
+
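The victim selection in lpt_gc() is a plain maximum search with two exclusions. The user-space sketch below reproduces that choice on a small table, skipping the head LEB and any LEB that holds only free and dirty space (those are left to trivial GC). `struct lpt_gc_props` and `pick_lpt_gc_leb()` are hypothetical names, and returning -1 instead of -ENOSPC is a simplification made for the example.

```c
#include <assert.h>

/* Hypothetical stand-in for one entry of c->ltab. */
struct lpt_gc_props {
	int free;
	int dirty;
	unsigned int tgc:1;	/* marked for trivial GC */
};

/* Return the LEB number with the most dirty space, or -1 if none qualifies. */
static int pick_lpt_gc_leb(const struct lpt_gc_props *ltab, int lpt_lebs,
			   int lpt_first, int leb_size, int nhead_lnum)
{
	int i, lnum = -1, dirty = 0;

	for (i = 0; i < lpt_lebs; i++) {
		assert(!ltab[i].tgc);
		if (i + lpt_first == nhead_lnum ||
		    ltab[i].free + ltab[i].dirty == leb_size)
			continue;
		if (ltab[i].dirty > dirty) {
			dirty = ltab[i].dirty;
			lnum = i + lpt_first;
		}
	}
	return lnum;
}
```

Because the comparison is strict, a LEB with no dirty space is never chosen, and ties go to the lowest-numbered LEB.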
+/**
+ * ubifs_lpt_post_commit - post commit LPT trivial GC and LPT GC.
+ * @c: UBIFS file-system description object
+ *
+ * LPT trivial GC is completed after a commit. Also LPT GC is done after a
+ * commit for the "big" LPT model.
+ */
+int ubifs_lpt_post_commit(struct ubifs_info *c)
+{
+ int err;
+
+ mutex_lock(&c->lp_mutex);
+ err = lpt_tgc_end(c);
+ if (err)
+ goto out;
+ if (c->big_lpt)
+ while (need_write_all(c)) {
+ mutex_unlock(&c->lp_mutex);
+ err = lpt_gc(c);
+ if (err)
+ return err;
+ mutex_lock(&c->lp_mutex);
+ }
+out:
+ mutex_unlock(&c->lp_mutex);
+ return err;
+}
+
+/**
+ * first_nnode - find the first nnode in memory.
+ * @c: UBIFS file-system description object
+ * @hght: height of tree where nnode found is returned here
+ *
+ * This function returns a pointer to the nnode found or %NULL if no nnode is
+ * found. This function is a helper to 'ubifs_lpt_free()'.
+ */
+static struct ubifs_nnode *first_nnode(struct ubifs_info *c, int *hght)
+{
+ struct ubifs_nnode *nnode;
+ int h, i, found;
+
+ nnode = c->nroot;
+ *hght = 0;
+ if (!nnode)
+ return NULL;
+ for (h = 1; h < c->lpt_hght; h++) {
+ found = 0;
+ for (i = 0; i < UBIFS_LPT_FANOUT; i++) {
+ if (nnode->nbranch[i].nnode) {
+ found = 1;
+ nnode = nnode->nbranch[i].nnode;
+ *hght = h;
+ break;
+ }
+ }
+ if (!found)
+ break;
+ }
+ return nnode;
+}
+
+/**
+ * next_nnode - find the next nnode in memory.
+ * @c: UBIFS file-system description object
+ * @nnode: nnode from which to start.
+ * @hght: height of tree where nnode is, is passed and returned here
+ *
+ * This function returns a pointer to the nnode found or %NULL if no nnode is
+ * found. This function is a helper to 'ubifs_lpt_free()'.
+ */
+static struct ubifs_nnode *next_nnode(struct ubifs_info *c,
+ struct ubifs_nnode *nnode, int *hght)
+{
+ struct ubifs_nnode *parent;
+ int iip, h, i, found;
+
+ parent = nnode->parent;
+ if (!parent)
+ return NULL;
+ if (nnode->iip == UBIFS_LPT_FANOUT - 1) {
+ *hght -= 1;
+ return parent;
+ }
+ for (iip = nnode->iip + 1; iip < UBIFS_LPT_FANOUT; iip++) {
+ nnode = parent->nbranch[iip].nnode;
+ if (nnode)
+ break;
+ }
+ if (!nnode) {
+ *hght -= 1;
+ return parent;
+ }
+ for (h = *hght + 1; h < c->lpt_hght; h++) {
+ found = 0;
+ for (i = 0; i < UBIFS_LPT_FANOUT; i++) {
+ if (nnode->nbranch[i].nnode) {
+ found = 1;
+ nnode = nnode->nbranch[i].nnode;
+ *hght = h;
+ break;
+ }
+ }
+ if (!found)
+ break;
+ }
+ return nnode;
+}
+
+/**
+ * ubifs_lpt_free - free resources owned by the LPT.
+ * @c: UBIFS file-system description object
+ * @wr_only: free only resources used for writing
+ */
+void ubifs_lpt_free(struct ubifs_info *c, int wr_only)
+{
+ struct ubifs_nnode *nnode;
+ int i, hght;
+
+ /* Free write-only things first */
+
+ free_obsolete_cnodes(c); /* Leftover from a failed commit */
+
+ vfree(c->ltab_cmt);
+ c->ltab_cmt = NULL;
+ vfree(c->lpt_buf);
+ c->lpt_buf = NULL;
+ kfree(c->lsave);
+ c->lsave = NULL;
+
+ if (wr_only)
+ return;
+
+ /* Now free the rest */
+
+ nnode = first_nnode(c, &hght);
+ while (nnode) {
+ for (i = 0; i < UBIFS_LPT_FANOUT; i++)
+ kfree(nnode->nbranch[i].nnode);
+ nnode = next_nnode(c, nnode, &hght);
+ }
+ for (i = 0; i < LPROPS_HEAP_CNT; i++)
+ kfree(c->lpt_heap[i].arr);
+ kfree(c->dirty_idx.arr);
+ kfree(c->nroot);
+ vfree(c->ltab);
+ kfree(c->lpt_nod_buf);
+}
+
+#if defined(CONFIG_UBIFS_FS_DEBUG_CHK_LPROPS)
+
+/**
+ * dbg_is_all_ff - determine if a buffer contains only 0xff bytes.
+ * @buf: buffer
+ * @len: buffer length
+ */
+static int dbg_is_all_ff(uint8_t *buf, int len)
+{
+ int i;
+
+ for (i = 0; i < len; i++)
+ if (buf[i] != 0xff)
+ return 0;
+ return 1;
+}
+
+/**
+ * dbg_is_nnode_dirty - determine if a nnode is dirty.
+ * @c: the UBIFS file-system description object
+ * @lnum: LEB number where nnode was written
+ * @offs: offset where nnode was written
+ */
+static int dbg_is_nnode_dirty(struct ubifs_info *c, int lnum, int offs)
+{
+ struct ubifs_nnode *nnode;
+ int hght;
+
+ /* Entire tree is in memory so first_nnode / next_nnode are ok */
+ nnode = first_nnode(c, &hght);
+ for (; nnode; nnode = next_nnode(c, nnode, &hght)) {
+ struct ubifs_nbranch *branch;
+
+ cond_resched();
+ if (nnode->parent) {
+ branch = &nnode->parent->nbranch[nnode->iip];
+ if (branch->lnum != lnum || branch->offs != offs)
+ continue;
+ if (test_bit(DIRTY_CNODE, &nnode->flags))
+ return 1;
+ return 0;
+ } else {
+ if (c->lpt_lnum != lnum || c->lpt_offs != offs)
+ continue;
+ if (test_bit(DIRTY_CNODE, &nnode->flags))
+ return 1;
+ return 0;
+ }
+ }
+ return 1;
+}
+
+/**
+ * dbg_is_pnode_dirty - determine if a pnode is dirty.
+ * @c: the UBIFS file-system description object
+ * @lnum: LEB number where pnode was written
+ * @offs: offset where pnode was written
+ */
+static int dbg_is_pnode_dirty(struct ubifs_info *c, int lnum, int offs)
+{
+ int i, cnt;
+
+ cnt = DIV_ROUND_UP(c->main_lebs, UBIFS_LPT_FANOUT);
+ for (i = 0; i < cnt; i++) {
+ struct ubifs_pnode *pnode;
+ struct ubifs_nbranch *branch;
+
+ cond_resched();
+ pnode = pnode_lookup(c, i);
+ if (IS_ERR(pnode))
+ return PTR_ERR(pnode);
+ branch = &pnode->parent->nbranch[pnode->iip];
+ if (branch->lnum != lnum || branch->offs != offs)
+ continue;
+ if (test_bit(DIRTY_CNODE, &pnode->flags))
+ return 1;
+ return 0;
+ }
+ return 1;
+}
+
+/**
+ * dbg_is_ltab_dirty - determine if a ltab node is dirty.
+ * @c: the UBIFS file-system description object
+ * @lnum: LEB number where ltab node was written
+ * @offs: offset where ltab node was written
+ */
+static int dbg_is_ltab_dirty(struct ubifs_info *c, int lnum, int offs)
+{
+ if (lnum != c->ltab_lnum || offs != c->ltab_offs)
+ return 1;
+ return (c->lpt_drty_flgs & LTAB_DIRTY) != 0;
+}
+
+/**
+ * dbg_is_lsave_dirty - determine if a lsave node is dirty.
+ * @c: the UBIFS file-system description object
+ * @lnum: LEB number where lsave node was written
+ * @offs: offset where lsave node was written
+ */
+static int dbg_is_lsave_dirty(struct ubifs_info *c, int lnum, int offs)
+{
+ if (lnum != c->lsave_lnum || offs != c->lsave_offs)
+ return 1;
+ return (c->lpt_drty_flgs & LSAVE_DIRTY) != 0;
+}
+
+/**
+ * dbg_is_node_dirty - determine if a node is dirty.
+ * @c: the UBIFS file-system description object
+ * @node_type: node type
+ * @lnum: LEB number where node was written
+ * @offs: offset where node was written
+ */
+static int dbg_is_node_dirty(struct ubifs_info *c, int node_type, int lnum,
+ int offs)
+{
+ switch (node_type) {
+ case UBIFS_LPT_NNODE:
+ return dbg_is_nnode_dirty(c, lnum, offs);
+ case UBIFS_LPT_PNODE:
+ return dbg_is_pnode_dirty(c, lnum, offs);
+ case UBIFS_LPT_LTAB:
+ return dbg_is_ltab_dirty(c, lnum, offs);
+ case UBIFS_LPT_LSAVE:
+ return dbg_is_lsave_dirty(c, lnum, offs);
+ }
+ return 1;
+}
+
+/**
+ * dbg_check_ltab_lnum - check the ltab for a LPT LEB number.
+ * @c: the UBIFS file-system description object
+ * @lnum: number of the LPT LEB whose free and dirty space is checked
+ * against the ltab
+ *
+ * This function returns %0 on success and a negative error code on failure.
+ */
+static int dbg_check_ltab_lnum(struct ubifs_info *c, int lnum)
+{
+ int err, len = c->leb_size, dirty = 0, node_type, node_num, node_len;
+ int ret;
+ void *buf = c->dbg_buf;
+
+ dbg_lp("LEB %d", lnum);
+ err = ubi_read(c->ubi, lnum, buf, 0, c->leb_size);
+ if (err) {
+ dbg_msg("ubi_read failed, LEB %d, error %d", lnum, err);
+ return err;
+ }
+ while (1) {
+ if (!is_a_node(c, buf, len)) {
+ int i, pad_len;
+
+ pad_len = get_pad_len(c, buf, len);
+ if (pad_len) {
+ buf += pad_len;
+ len -= pad_len;
+ dirty += pad_len;
+ continue;
+ }
+ if (!dbg_is_all_ff(buf, len)) {
+ dbg_msg("invalid empty space in LEB %d at %d",
+ lnum, c->leb_size - len);
+ err = -EINVAL;
+ }
+ i = lnum - c->lpt_first;
+ if (len != c->ltab[i].free) {
+ dbg_msg("invalid free space in LEB %d "
+ "(free %d, expected %d)",
+ lnum, len, c->ltab[i].free);
+ err = -EINVAL;
+ }
+ if (dirty != c->ltab[i].dirty) {
+ dbg_msg("invalid dirty space in LEB %d "
+ "(dirty %d, expected %d)",
+ lnum, dirty, c->ltab[i].dirty);
+ err = -EINVAL;
+ }
+ return err;
+ }
+ node_type = get_lpt_node_type(c, buf, &node_num);
+ node_len = get_lpt_node_len(c, node_type);
+ ret = dbg_is_node_dirty(c, node_type, lnum, c->leb_size - len);
+ if (ret == 1)
+ dirty += node_len;
+ buf += node_len;
+ len -= node_len;
+ }
+}
+
+/**
+ * dbg_check_ltab - check the free and dirty space in the ltab.
+ * @c: the UBIFS file-system description object
+ *
+ * This function returns %0 on success and a negative error code on failure.
+ */
+static int dbg_check_ltab(struct ubifs_info *c)
+{
+ int lnum, err, i, cnt;
+
+ /* Bring the entire tree into memory */
+ cnt = DIV_ROUND_UP(c->main_lebs, UBIFS_LPT_FANOUT);
+ for (i = 0; i < cnt; i++) {
+ struct ubifs_pnode *pnode;
+
+ pnode = pnode_lookup(c, i);
+ if (IS_ERR(pnode))
+ return PTR_ERR(pnode);
+ cond_resched();
+ }
+
+ /* Check nodes */
+ err = dbg_check_lpt_nodes(c, (struct ubifs_cnode *)c->nroot, 0, 0);
+ if (err)
+ return err;
+
+ /* Check each LEB */
+ for (lnum = c->lpt_first; lnum <= c->lpt_last; lnum++) {
+ err = dbg_check_ltab_lnum(c, lnum);
+ if (err) {
+ dbg_err("failed at LEB %d", lnum);
+ return err;
+ }
+ }
+
+ dbg_lp("succeeded");
+ return 0;
+}
+
+#endif /* CONFIG_UBIFS_FS_DEBUG_CHK_LPROPS */
--
1.5.4.1

2008-03-27 13:08:48

by Artem Bityutskiy

Subject: [RFC PATCH 21/26] UBIFS: add budgeting

Because of compression and wasted space (due to padding), it is not
always possible to know whether the cached data will fit into the
flash space. This is sometimes called the "ENOSPC" problem. UBIFS
implements a budgeting sub-system to solve it. All FS operations
have to acquire a budget first. The budgeting sub-system makes
pessimistic space calculations (e.g., it assumes the data is not
compressible) and forces write-back or garbage collection if needed.

Signed-off-by: Artem Bityutskiy <[email protected]>
Signed-off-by: Adrian Hunter <[email protected]>
---
fs/ubifs/budget.c | 822 +++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 822 insertions(+), 0 deletions(-)
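
A minimal user-space sketch of the pessimistic budgeting idea described
above. It is not code from this patch: 'struct toy_fs', 'toy_budget()' and
'toy_writeback()' are hypothetical names, and the model ignores the index,
GC and commit. An operation reserves its worst-case size up front; once the
data is written back, the reservation is dropped and only the bytes actually
used are charged, which is why forcing write-back can let a failed request
succeed on re-try.

```c
#include <assert.h>

/* Hypothetical, simplified model; these are not the UBIFS structures. */
struct toy_fs {
	long long available;	/* bytes still considered usable */
	long long liability;	/* bytes budgeted for cached, unwritten data */
};

/* Reserve worst-case space. Returns 0 on success, -1 if it may not fit. */
static int toy_budget(struct toy_fs *fs, long long worst_case)
{
	if (fs->liability + worst_case > fs->available)
		return -1;
	fs->liability += worst_case;
	return 0;
}

/* Write-back: drop the reservation, charge only the bytes really used. */
static void toy_writeback(struct toy_fs *fs, long long worst_case,
			  long long actual)
{
	assert(actual <= worst_case && worst_case <= fs->liability);
	fs->liability -= worst_case;
	fs->available -= actual;
}
```

In this toy model a second 60-byte request against 100 available bytes
fails while the first one is still cached, and fits once the first one has
been written back in compressed form.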

diff --git a/fs/ubifs/budget.c b/fs/ubifs/budget.c
new file mode 100644
index 0000000..e975796
--- /dev/null
+++ b/fs/ubifs/budget.c
@@ -0,0 +1,822 @@
+/*
+ * This file is part of UBIFS.
+ *
+ * Copyright (C) 2006-2008 Nokia Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 51
+ * Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Authors: Adrian Hunter
+ * Artem Bityutskiy (Битюцкий Артём)
+ */
+
+/*
+ * This file implements the budgeting unit which is responsible for UBIFS space
+ * management.
+ *
+ * Factors such as compression, wasted space at the ends of LEBs, space in other
+ * journal heads, the effect of updates on the index, and so on, make it
+ * impossible to accurately predict the amount of space needed. Consequently
+ * approximations are used.
+ */
+
+#include "ubifs.h"
+#include <linux/writeback.h>
+#include <asm/div64.h>
+
+/*
+ * When pessimistic budget calculations say that there is not enough space,
+ * UBIFS starts writing back dirty inodes and pages, doing garbage collection,
+ * or committing. The constants below define the maximum number of times UBIFS
+ * repeats the operations.
+ */
+#define MAX_SHRINK_RETRIES 8
+#define MAX_GC_RETRIES 4
+#define MAX_CMT_RETRIES 2
+#define MAX_NOSPC_RETRIES 1
+
+/*
+ * The constant below defines how many dirty pages should be written back at
+ * a time when trying to shrink the liability.
+ */
+#define NR_TO_WRITE 16
+
+/**
+ * struct retries_info - information about re-tries while making free space.
+ * @prev_liability: previous liability
+ * @shrink_cnt: how many times the liability was shrunk
+ * @shrink_retries: count of liability shrink re-tries (increased when
+ * liability does not shrink)
+ * @try_gc: GC should be tried first
+ * @gc_retries: how many times GC was run
+ * @cmt_retries: how many times commit has been done
+ * @nospc_retries: how many times GC returned %-ENOSPC
+ *
+ * Since we consider budgeting to be the fast-path, and this structure has to
+ * be allocated on stack and zeroed out, we make it smaller using bit-fields.
+ */
+struct retries_info {
+ long long prev_liability;
+ unsigned int shrink_cnt;
+ unsigned int shrink_retries:5;
+ unsigned int try_gc:1;
+ unsigned int gc_retries:4;
+ unsigned int cmt_retries:3;
+ unsigned int nospc_retries:1;
+};
+
+/**
+ * shrink_liability - write-back some dirty pages/inodes.
+ * @c: UBIFS file-system description object
+ * @nr_to_write: how many dirty pages to write-back
+ *
+ * This function shrinks UBIFS liability by means of writing back some amount
+ * of dirty inodes and their pages. Returns the number of pages which were
+ * written back. The returned value does not include dirty inodes which were
+ * synchronized.
+ *
+ * Note, this function synchronizes even VFS inodes which are locked
+ * (@i_mutex) by the caller of the budgeting function, because write-back does
+ * not touch @i_mutex.
+ */
+static int shrink_liability(struct ubifs_info *c, int nr_to_write)
+{
+ struct writeback_control wbc = {
+ .sync_mode = WB_SYNC_NONE,
+ .range_end = LLONG_MAX,
+ .nr_to_write = nr_to_write,
+ };
+
+ writeback_inodes_sb(c->vfs_sb, &wbc);
+ dbg_budg("%ld pages were written back", nr_to_write - wbc.nr_to_write);
+ return nr_to_write - wbc.nr_to_write;
+}
+
+
+/**
+ * run_gc - run garbage collector.
+ * @c: UBIFS file-system description object
+ *
+ * This function runs garbage collector to make some more free space. Returns
+ * zero if a free LEB has been produced, %-EAGAIN if commit is required, and a
+ * negative error code in case of failure.
+ */
+static int run_gc(struct ubifs_info *c)
+{
+ int err, lnum;
+
+ /* Make some free space by garbage-collecting dirty space */
+ down_read(&c->commit_sem);
+ lnum = ubifs_garbage_collect(c, 1);
+ up_read(&c->commit_sem);
+ if (lnum < 0)
+ return lnum;
+
+ /* GC freed one LEB, return it to lprops */
+ dbg_budg("GC freed LEB %d", lnum);
+ err = ubifs_return_leb(c, lnum);
+ if (err)
+ return err;
+
+ return 0;
+}
+
+/**
+ * make_free_space - make more free space on the file-system.
+ * @c: UBIFS file-system description object
+ * @ri: information about previous invocations of this function
+ *
+ * This function is called when an operation cannot be budgeted because there
+ * is supposedly no free space. But in most cases there is some free space:
+ * o budgeting is pessimistic, so it always budgets more than is actually
+ * needed, so shrinking the liability is one way to make free space - the
+ * cached data will take less space than was budgeted for it;
+ * o GC may turn some dark space into free space (budgeting treats dark space
+ * as not available);
+ * o commit may free some LEB, i.e., turn freeable LEBs into free LEBs.
+ *
+ * So this function tries to do the above. Returns %-EAGAIN if some free space
+ * was presumably made and the caller has to re-try budgeting the operation.
+ * Returns %-ENOSPC if it could not make more free space, and other negative
+ * error codes on failures.
+ */
+static int make_free_space(struct ubifs_info *c, struct retries_info *ri)
+{
+ int err;
+
+ /*
+ * If we have some dirty pages and inodes (liability), try to write
+ * them back unless this was tried too many times without effect
+ * already.
+ */
+ if (ri->shrink_retries < MAX_SHRINK_RETRIES && !ri->try_gc) {
+ long long liability;
+
+ spin_lock(&c->space_lock);
+ liability = c->budg_idx_growth + c->budg_data_growth +
+ c->budg_dd_growth;
+ spin_unlock(&c->space_lock);
+
+ if (ri->prev_liability >= liability) {
+ /* Liability does not shrink, next time try GC then */
+ ri->shrink_retries += 1;
+ if (ri->gc_retries < MAX_GC_RETRIES)
+ ri->try_gc = 1;
+ dbg_budg("liability did not shrink: retries %d of %d",
+ ri->shrink_retries, MAX_SHRINK_RETRIES);
+ }
+
+ dbg_budg("force write-back (count %d)", ri->shrink_cnt);
+ shrink_liability(c, NR_TO_WRITE + ri->shrink_cnt);
+
+ ri->prev_liability = liability;
+ ri->shrink_cnt += 1;
+ return -EAGAIN;
+ }
+
+ /*
+ * Try to run garbage collector unless it was already tried too many
+ * times.
+ */
+ if (ri->gc_retries < MAX_GC_RETRIES) {
+ ri->gc_retries += 1;
+ dbg_budg("run GC, retries %d of %d",
+ ri->gc_retries, MAX_GC_RETRIES);
+
+ ri->try_gc = 0;
+ err = run_gc(c);
+ if (!err)
+ return -EAGAIN;
+
+ if (err == -EAGAIN) {
+ dbg_budg("GC asked to commit");
+ err = ubifs_run_commit(c);
+ if (err)
+ return err;
+ return -EAGAIN;
+ }
+
+ if (err != -ENOSPC)
+ return err;
+
+ /*
+ * GC could not make any progress. If this is the first time,
+ * then it makes sense to try to commit, because it might make
+ * some dirty space.
+ */
+ dbg_budg("GC returned -ENOSPC, retries %d",
+ ri->nospc_retries);
+ if (ri->nospc_retries >= MAX_NOSPC_RETRIES)
+ return err;
+ ri->nospc_retries += 1;
+ }
+
+ /* Neither GC nor write-back helped, try to commit */
+ if (ri->cmt_retries < MAX_CMT_RETRIES) {
+ ri->cmt_retries += 1;
+ dbg_budg("run commit, retries %d of %d",
+ ri->cmt_retries, MAX_CMT_RETRIES);
+ err = ubifs_run_commit(c);
+ if (err)
+ return err;
+ return -EAGAIN;
+ }
+
+ return -ENOSPC;
+}
+
+/**
+ * ubifs_calc_min_idx_lebs - calculate amount of eraseblocks for the index.
+ * @c: UBIFS file-system description object
+ *
+ * This function calculates and returns the number of eraseblocks which should
+ * be kept for index usage.
+ */
+int ubifs_calc_min_idx_lebs(struct ubifs_info *c)
+{
+ int rem;
+ long long idx_size;
+
+ idx_size = c->old_idx_sz + c->budg_idx_growth + c->budg_uncommitted_idx;
+
+ /* And make sure we have twice the index size of space reserved */
+ idx_size <<= 1;
+
+ /*
+ * We do not maintain 'old_idx_size' as 'old_idx_lebs'/'old_idx_bytes'
+ * pair, nor similarly the two variables for the new index size, so we
+ * have to do this costly 64-bit division on fast-path.
+ */
+ rem = do_div(idx_size, c->leb_size - c->max_idx_node_sz);
+ return idx_size + !!rem;
+}
+
+/**
+ * ubifs_calc_available - calculate available FS space.
+ * @c: UBIFS file-system description object
+ *
+ * This function calculates and returns amount of FS space available for use.
+ */
+long long ubifs_calc_available(const struct ubifs_info *c)
+{
+ long long available, subtract_lebs;
+
+ available = c->main_bytes - c->lst.total_used;
+
+ /*
+ * Now 'available' contains theoretically available flash space
+ * assuming there is no index, so we have to subtract the space which
+ * is reserved for the index.
+ */
+ subtract_lebs = c->min_idx_lebs;
+
+ /* Take into account that GC reserves one LEB for its own needs */
+ subtract_lebs += 1;
+
+ /*
+ * The GC journal head LEB is not really accessible. And since
+ * different write types go to different heads, we may count only on
+ * one head's space.
+ */
+ subtract_lebs += c->jhead_cnt - 1;
+
+ /* We also reserve one LEB for deletions, which bypass budgeting */
+ subtract_lebs += 1;
+
+ available -= subtract_lebs * c->leb_size;
+
+ /* Subtract the dead space which is not available for use */
+ available -= c->lst.total_dead;
+
+ /*
+ * Subtract dark space, which might or might not be usable - it depends
+ * on the data which we have on the media and which will be written. If
+ * this is a lot of uncompressed or not-compressible data, the dark
+ * space cannot be used.
+ */
+ available -= c->lst.total_dark;
+
+ return available;
+}
+
+/**
+ * rp_can_write - check whether the user is allowed to write.
+ * @c: UBIFS file-system description object
+ * @avail: available space on FS
+ *
+ * UBIFS has a so-called "reserved pool", which is flash space reserved for
+ * the superuser and for users whose UID/GID is recorded in the UBIFS
+ * superblock. This function checks whether the current user is allowed to
+ * write to the file-system - it returns %1 if there is plenty of space or the
+ * user is eligible to use the reserved pool, and %0 otherwise.
+ */
+static int rp_can_write(struct ubifs_info *c, long long avail)
+{
+ if (avail > c->rp_size || current->fsuid == c->rp_uid ||
+ capable(CAP_SYS_RESOURCE) ||
+ (c->rp_gid != 0 && in_group_p(c->rp_gid)))
+ return 1;
+
+ return 0;
+}
+
+/**
+ * do_budget_space - reserve flash space for index and data growth.
+ * @c: UBIFS file-system description object
+ *
+ * This function makes sure UBIFS has enough free eraseblocks for index growth
+ * and data.
+ *
+ * When budgeting index space, UBIFS reserves twice as many LEBs as the index
+ * would take if it were consolidated and written to the flash. This guarantees
+ * that the "in-the-gaps" commit method always succeeds and UBIFS will always
+ * be able to commit dirty index. So this function basically adds amount of
+ * budgeted index space to the size of the current index, multiplies this by 2,
+ * and makes sure this does not exceed the amount of free eraseblocks.
+ *
+ * Notes about @c->min_idx_lebs and @c->lst.idx_lebs variables:
+ * o @c->lst.idx_lebs is the number of LEBs the index currently uses. It might
+ * be large, because UBIFS does not do any index consolidation as long as
+ * there is free space. IOW, the index may take a lot of LEBs, but the LEBs
+ * will contain a lot of dirt.
+ * o @c->min_idx_lebs is the number of LEBs the index presumably takes. IOW,
+ * the index may be consolidated to take up to @c->min_idx_lebs LEBs.
+ *
+ * This function returns zero in case of success, and %-ENOSPC in case of
+ * failure.
+ */
+static int do_budget_space(struct ubifs_info *c)
+{
+ long long outstanding, available;
+ int lebs, rsvd_idx_lebs, min_idx_lebs;
+
+ /* First budget index space */
+ min_idx_lebs = ubifs_calc_min_idx_lebs(c);
+
+ /* Now 'min_idx_lebs' contains number of LEBs to reserve */
+ if (min_idx_lebs > c->lst.idx_lebs)
+ rsvd_idx_lebs = min_idx_lebs - c->lst.idx_lebs;
+ else
+ rsvd_idx_lebs = 0;
+
+ /*
+ * The number of LEBs that are available to be used by the index is:
+ *
+ * c->lst.empty_lebs + c->freeable_cnt + c->idx_gc_cnt -
+ * c->lst.taken_empty_lebs
+ *
+ * empty_lebs are available because they are empty. freeable_cnt are
+ * available because they contain only free and dirty space and the
+ * index allocation always occurs after wbufs are synch'ed.
+ * idx_gc_cnt are available because they are index LEBs that have been
+ * garbage collected (including trivial GC) and are awaiting the commit
+ * before they can be unmapped - note that the in-the-gaps method will
+ * grab these if it needs them. taken_empty_lebs are empty_lebs that
+ * have already been allocated for some purpose (also includes those
+ * LEBs on the idx_gc list).
+ */
+ lebs = c->lst.empty_lebs + c->freeable_cnt + c->idx_gc_cnt -
+ c->lst.taken_empty_lebs;
+ ubifs_assert(lebs + c->lst.idx_lebs >= c->min_idx_lebs);
+ if (unlikely(rsvd_idx_lebs > lebs)) {
+ dbg_budg("out of indexing space: min_idx_lebs %d (old %d), "
+ "rsvd_idx_lebs %d", min_idx_lebs, c->min_idx_lebs,
+ rsvd_idx_lebs);
+ return -ENOSPC;
+ }
+
+ available = ubifs_calc_available(c);
+ outstanding = c->budg_data_growth + c->budg_dd_growth;
+
+ if (unlikely(available < outstanding)) {
+ dbg_budg("out of data space: available %lld, outstanding %lld",
+ available, outstanding);
+ return -ENOSPC;
+ }
+
+ if (!rp_can_write(c, available - outstanding))
+ return -ENOSPC;
+
+ c->min_idx_lebs = min_idx_lebs;
+ return 0;
+}
+
+/**
+ * calc_idx_growth - calculate approximate index growth from budgeting request.
+ * @c: UBIFS file-system description object
+ * @req: budgeting request
+ *
+ * For now we assume each new node adds one znode. This is a rather poor
+ * approximation, though.
+ */
+static int calc_idx_growth(const struct ubifs_info *c,
+ const struct ubifs_budget_req *req)
+{
+ int znodes;
+
+ znodes = req->new_ino + req->new_page + req->new_dent;
+ return znodes * c->max_idx_node_sz;
+}
+
+/**
+ * calc_data_growth - calculate approximate amount of new data from budgeting
+ * request.
+ * @c: UBIFS file-system description object
+ * @req: budgeting request
+ */
+static int calc_data_growth(const struct ubifs_info *c,
+ const struct ubifs_budget_req *req)
+{
+ int data_growth;
+
+ data_growth = req->new_ino ? c->inode_budget : 0;
+ if (req->new_page)
+ data_growth += c->page_budget;
+ if (req->new_dent)
+ data_growth += c->dent_budget;
+ data_growth += req->new_ino_d;
+
+ return data_growth;
+}
+
+/**
+ * calc_dd_growth - calculate approximate amount of data which makes other data
+ * dirty from budgeting request.
+ * @c: UBIFS file-system description object
+ * @req: budgeting request
+ */
+static int calc_dd_growth(const struct ubifs_info *c,
+ const struct ubifs_budget_req *req)
+{
+ int dd_growth;
+
+ dd_growth = req->dirtied_page ? c->page_budget : 0;
+
+ if (req->dirtied_ino)
+ dd_growth += c->inode_budget << (req->dirtied_ino - 1);
+ if (req->mod_dent)
+ dd_growth += c->dent_budget;
+ dd_growth += req->dirtied_ino_d;
+
+ return dd_growth;
+}
+
+/**
+ * ubifs_budget_space - ensure there is enough space to complete an operation.
+ * @c: UBIFS file-system description object
+ * @req: budget request
+ *
+ * This function allocates budget for an operation. It uses pessimistic
+ * approximation of how much flash space the operation needs. The goal of this
+ * function is to make sure UBIFS always has flash space to flush all dirty
+ * pages, dirty inodes, and dirty znodes (liability). This function may force
+ * commit, garbage-collection or write-back. Returns zero in case of success,
+ * %-ENOSPC if there is no free space and other negative error codes in case of
+ * failures.
+ */
+int ubifs_budget_space(struct ubifs_info *c, struct ubifs_budget_req *req)
+{
+ int uninitialized_var(cmt_retries), uninitialized_var(wb_retries);
+ int err, idx_growth, data_growth, dd_growth;
+ struct retries_info ri;
+
+ memset(&ri, 0, sizeof(struct retries_info));
+ idx_growth = calc_idx_growth(c, req);
+ data_growth = calc_data_growth(c, req);
+ dd_growth = calc_dd_growth(c, req);
+
+again:
+ spin_lock(&c->space_lock);
+ ubifs_assert(c->budg_idx_growth >= 0);
+ ubifs_assert(c->budg_data_growth >= 0);
+ ubifs_assert(c->budg_dd_growth >= 0);
+
+ c->budg_idx_growth += idx_growth;
+ c->budg_data_growth += data_growth;
+ c->budg_dd_growth += dd_growth;
+
+ err = do_budget_space(c);
+ if (unlikely(err)) {
+ /* Restore the old values */
+ c->budg_idx_growth -= idx_growth;
+ c->budg_data_growth -= data_growth;
+ c->budg_dd_growth -= dd_growth;
+ spin_unlock(&c->space_lock);
+
+ goto make_space;
+ }
+
+ req->idx_growth = idx_growth;
+ req->data_growth = data_growth;
+ req->dd_growth = dd_growth;
+ spin_unlock(&c->space_lock);
+
+ return 0;
+
+make_space:
+ err = make_free_space(c, &ri);
+ if (err == -EAGAIN) {
+ dbg_budg("try again");
+ cond_resched();
+ goto again;
+ } else if (err == -ENOSPC)
+ dbg_budg("FS is full, -ENOSPC");
+ else
+ ubifs_err("cannot budget space, error %d", err);
+
+ return err;
+}
+
+/**
+ * ubifs_release_budget - release budgeted free space.
+ * @c: UBIFS file-system description object
+ * @req: budget request
+ *
+ * This function releases the space budgeted by 'ubifs_budget_space()'. Note,
+ * since the index changes (which were budgeted for in @req->idx_growth) will
+ * only be written to the media on commit, this function moves the index budget
+ * from @c->budg_idx_growth to @c->budg_uncommitted_idx. The latter will be
+ * zeroed by the commit operation.
+ */
+void ubifs_release_budget(struct ubifs_info *c, struct ubifs_budget_req *req)
+{
+ if (req->data_growth + req->dd_growth == 0)
+ return;
+
+ if (req->idx_growth == -1)
+ req->idx_growth = calc_idx_growth(c, req);
+
+ spin_lock(&c->space_lock);
+ c->budg_idx_growth -= req->idx_growth;
+ c->budg_uncommitted_idx += req->idx_growth;
+ c->budg_data_growth -= req->data_growth;
+ c->budg_dd_growth -= req->dd_growth;
+ c->min_idx_lebs = ubifs_calc_min_idx_lebs(c);
+
+ ubifs_assert(c->budg_idx_growth >= 0);
+ ubifs_assert(c->budg_data_growth >= 0);
+ ubifs_assert(c->min_idx_lebs < c->main_lebs);
+ spin_unlock(&c->space_lock);
+}
+
+/**
+ * ubifs_convert_page_budget - convert budget of a new page.
+ * @c: UBIFS file-system description object
+ *
+ * This function converts budget which was allocated for a new page of data to
+ * the budget of changing an existing page of data. The latter is not larger
+ * than the former, so this function only does simple re-calculation and does
+ * not involve any write-back.
+ */
+void ubifs_convert_page_budget(struct ubifs_info *c)
+{
+ spin_lock(&c->space_lock);
+ /* Release the index growth reservation */
+ c->budg_idx_growth -= c->max_idx_node_sz;
+ /* Release the data growth reservation */
+ c->budg_data_growth -= c->page_budget;
+ /* Increase the dirty data growth reservation instead */
+ c->budg_dd_growth += c->page_budget;
+ /* And re-calculate the indexing space reservation */
+ c->min_idx_lebs = ubifs_calc_min_idx_lebs(c);
+ spin_unlock(&c->space_lock);
+}
+
+/**
+ * ubifs_budget_inode_op - budget an operation on inode.
+ * @c: UBIFS file-system description object
+ * @inode: VFS inode which will be made dirty by the operation
+ * @req: budget request of the operation
+ *
+ * This function is called to get budget for an operation which changes an
+ * inode. The inode may be in dirty or clean state. The former means there is
+ * no need to allocate the budget as it has already been allocated before. The
+ * latter means that the inode change budget has to be allocated.
+ *
+ * The caller has to pass the inode which is going to be changed. This function
+ * acquires the budget described in @req plus the budget for making the
+ * inode dirty, if needed. Returns zero in case of success, %-ENOSPC if
+ * there is no more flash space, and other negative error codes in case of
+ * failure.
+ *
+ * Note, upon exit, this function leaves the inode locked, and the
+ * 'ubifs_release_ino_dirty()' or 'ubifs_release_ino_clean()' function has to
+ * be called to unlock it.
+ */
+int ubifs_budget_inode_op(struct ubifs_info *c, struct inode *inode,
+ struct ubifs_budget_req *req)
+{
+ struct ubifs_inode *ui = ubifs_inode(inode);
+ int err, old = req->dirtied_ino;
+
+ ubifs_assert(req->dirtied_ino <= 3);
+ ubifs_assert(req->dirtied_ino_d <= UBIFS_MAX_INO_DATA * 3);
+
+again:
+ /*
+ * If the inode is clean, it will be dirtied by this operation and we
+ * have to budget for this.
+ */
+ req->dirtied_ino += !ui->dirty;
+ if (req->dirtied_ino > old)
+ req->dirtied_ino_d = ui->data_len;
+
+ err = ubifs_budget_space(c, req);
+ if (unlikely(err))
+ return err;
+
+ mutex_lock(&ui->budg_mutex);
+
+ if (req->dirtied_ino != old + !ui->dirty) {
+ /* The inode has probably been written back meanwhile */
+ ubifs_release_budget(c, req);
+ mutex_unlock(&ui->budg_mutex);
+ req->dirtied_ino = old;
+ req->dirtied_ino_d -= ui->data_len;
+ goto again;
+ }
+
+ UBIFS_DBG(ui->budgeted = 1);
+ return 0;
+}
+
+/**
+ * ubifs_release_ino_dirty - release budget of a "dirtying" operation.
+ * @c: UBIFS file-system description object
+ * @inode: VFS inode the operation worked on
+ * @req: budget to release
+ *
+ * This function has to be called at the end of VFS operations which acquired
+ * budget via 'ubifs_budget_inode_op()'. It assumes that the inode has been
+ * marked as dirty and will be synchronized later by write-back, so it does not
+ * release the budget of the inode.
+ *
+ * Note, this function also avoids releasing page budgets which are released
+ * separately.
+ */
+void ubifs_release_ino_dirty(struct ubifs_info *c, struct inode *inode,
+ struct ubifs_budget_req *req)
+{
+ ubifs_assert(req->dirtied_ino <= 4);
+ ubifs_assert(req->dirtied_ino_d <= UBIFS_MAX_INO_DATA * 4);
+ ubifs_assert(req->idx_growth >= 0);
+ ubifs_assert(req->data_growth >= 0);
+ ubifs_assert(req->dd_growth >= 0);
+
+ if (req->dirtied_ino) {
+ req->dd_growth -= c->inode_budget;
+ req->dd_growth -= req->dirtied_ino_d;
+ }
+
+ if (req->dirtied_page) {
+ req->dd_growth -= c->page_budget;
+ ubifs_assert(req->new_page == 0);
+ } else if (req->new_page) {
+ req->idx_growth -= c->max_idx_node_sz;
+ req->data_growth -= c->page_budget;
+ ubifs_assert(req->dirtied_page == 0);
+ }
+
+ ubifs_assert(req->dd_growth >= 0);
+ ubifs_release_budget(c, req);
+ mutex_unlock(&ubifs_inode(inode)->budg_mutex);
+}
+
+/**
+ * ubifs_cancel_ino_op - cancel budget of an operation on inode.
+ * @c: UBIFS file-system description object
+ * @inode: VFS inode the operation worked on
+ * @req: budget to release
+ *
+ * This function has to be called if the operation failed and the whole budget
+ * has to be released, including the budget for the inode which would have been
+ * dirtied. It is important not to mark the inode dirty before calling this
+ * function.
+ */
+void ubifs_cancel_ino_op(struct ubifs_info *c, struct inode *inode,
+ struct ubifs_budget_req *req)
+{
+ ubifs_assert(req->dirtied_ino <= 4);
+ ubifs_assert(req->dirtied_ino_d <= UBIFS_MAX_INO_DATA * 4);
+ ubifs_assert(req->idx_growth >= 0);
+ ubifs_assert(req->data_growth >= 0);
+ ubifs_assert(req->dd_growth >= 0);
+
+ ubifs_release_budget(c, req);
+ mutex_unlock(&ubifs_inode(inode)->budg_mutex);
+}
+
+/**
+ * ubifs_release_ino_clean - release budget of a "cleaning" operation.
+ * @c: UBIFS file-system description object
+ * @inode: VFS inode the operation worked on
+ * @req: budget to release
+ *
+ * This function has to be called at the end of VFS operations which acquired
+ * budget via 'ubifs_budget_inode_op()'. It assumes the operation synchronized
+ * the inode, so it marks it clean, unlocks it and releases the whole budget.
+ *
+ * Note, this function also avoids releasing page budgets which are released
+ * separately.
+ */
+void ubifs_release_ino_clean(struct ubifs_info *c, struct inode *inode,
+ struct ubifs_budget_req *req)
+{
+ struct ubifs_inode *ui = ubifs_inode(inode);
+
+ ubifs_assert(req->dirtied_ino <= 4);
+ ubifs_assert(req->dirtied_ino_d <= UBIFS_MAX_INO_DATA * 4);
+ ubifs_assert(req->idx_growth >= 0);
+ ubifs_assert(req->data_growth >= 0);
+ ubifs_assert(req->dd_growth >= 0);
+
+ ubifs_assert(!req->dirtied_page);
+ ubifs_assert(!req->new_page);
+ UBIFS_DBG(ui->budgeted = 0);
+
+ ubifs_release_budget(c, req);
+ if (ui->dirty) {
+ ui->dirty = 0;
+ /*
+ * Note, VFS still treats the inode as dirty and
+ * 'ubifs_write_inode()' will be called, but it'll do nothing
+ * because @ui->dirty is %0.
+ */
+ atomic_long_dec(&c->dirty_ino_cnt);
+ }
+ mutex_unlock(&ubifs_inode(inode)->budg_mutex);
+}
+
+/**
+ * ubifs_release_new_page_budget - release budget of a new page.
+ * @c: UBIFS file-system description object
+ *
+ * This is a helper function which releases budget corresponding to the budget
+ * of one new page of data.
+ */
+void ubifs_release_new_page_budget(struct ubifs_info *c)
+{
+ struct ubifs_budget_req req = { .new_page = 1,
+ .idx_growth = -1,
+ .data_growth = c->page_budget};
+
+ ubifs_release_budget(c, &req);
+}
+
+/**
+ * ubifs_budg_get_free_space - return amount of free space.
+ * @c: UBIFS file-system description object
+ *
+ * This function returns amount of free space on the file-system.
+ */
+long long ubifs_budg_get_free_space(struct ubifs_info *c)
+{
+ int min_idx_lebs, rsvd_idx_lebs;
+ long long available, outstanding, free;
+
+ /* Do exactly the same calculations as in 'do_budget_space()' */
+ spin_lock(&c->space_lock);
+ min_idx_lebs = ubifs_calc_min_idx_lebs(c);
+
+ if (min_idx_lebs > c->lst.idx_lebs)
+ rsvd_idx_lebs = min_idx_lebs - c->lst.idx_lebs;
+ else
+ rsvd_idx_lebs = 0;
+
+ if (rsvd_idx_lebs > c->lst.empty_lebs + c->freeable_cnt + c->idx_gc_cnt
+ - c->lst.taken_empty_lebs) {
+ spin_unlock(&c->space_lock);
+ return 0;
+ }
+
+ available = ubifs_calc_available(c);
+ outstanding = c->budg_data_growth + c->budg_dd_growth;
+ spin_unlock(&c->space_lock);
+
+ if (available > outstanding) {
+ int divisor, factor;
+
+ free = available - outstanding;
+ /*
+ * Assume free space is made up of uncompressed data nodes and
+ * full index nodes (one per data node, doubled because we
+ * always allow enough space to write the index twice).
+ */
+ divisor = UBIFS_MAX_DATA_NODE_SZ + (c->max_idx_node_sz << 1);
+ factor = UBIFS_MAX_DATA_NODE_SZ - UBIFS_DATA_NODE_SZ;
+ do_div(free, divisor);
+ free *= factor;
+ } else
+ free = 0;
+ return free;
+}
--
1.5.4.1

2008-03-27 13:09:13

by Artem Bityutskiy

Subject: [RFC PATCH 10/26] UBIFS: add the journal

All the new data first goes to the journal and sits there until it
gets committed. The journal contents do not have corresponding
on-flash indexing information, so the journal is like a small JFFS2
file-system. Once the journal is committed, the indexing information
is written to the flash media.

Signed-off-by: Artem Bityutskiy <[email protected]>
Signed-off-by: Adrian Hunter <[email protected]>
---
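
Note for reviewers: 'ubifs_jrn_update()' in the diff writes three nodes to the
base head as one group, each starting on an 8-byte boundary. The following is
only a sketch of that offset arithmetic, not code from the patch. DENT_NODE_SZ
and INO_NODE_SZ are placeholder values picked for illustration, and
'struct group_layout', 'layout_update()' and ALIGN_UP() are names made up here.

```c
#include <assert.h>

/* Placeholder sizes (assumptions); the real ones come from the UBIFS headers */
#define DENT_NODE_SZ 56
#define INO_NODE_SZ  160

/* Round x up to a multiple of a (a power of two), like the kernel's ALIGN() */
#define ALIGN_UP(x, a) (((x) + ((a) - 1)) & ~((a) - 1))

/* Hypothetical helper type: offsets of the three nodes inside the group */
struct group_layout {
	int dent_offs;	/* directory entry (or xattr entry) node */
	int ino_offs;	/* inode node of the inode being updated */
	int dir_offs;	/* inode node of the parent directory, written last */
	int len;	/* total number of bytes to reserve */
};

static struct group_layout layout_update(int name_len, int ino_data_len,
					 int last_reference)
{
	struct group_layout g;
	int dlen, ilen;

	assert(name_len >= 0 && ino_data_len >= 0);

	dlen = DENT_NODE_SZ + name_len + 1;	/* name plus trailing NUL */
	ilen = INO_NODE_SZ;
	if (!last_reference)
		ilen += ino_data_len;	/* attached data is dropped on deletion */

	g.dent_offs = 0;
	g.ino_offs = ALIGN_UP(dlen, 8);
	g.dir_offs = g.ino_offs + ALIGN_UP(ilen, 8);
	g.len = g.dir_offs + INO_NODE_SZ;	/* parent inode carries no data */
	return g;
}
```

With these placeholder sizes, layout_update(5, 0, 0) puts the directory entry
at offset 0, the inode at 64 and the parent inode at 224, 384 bytes in total.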
fs/ubifs/journal.c | 1230 ++++++++++++++++++++++++++++++++++++++++++++++++++++
fs/ubifs/log.c | 769 ++++++++++++++++++++++++++++++++
2 files changed, 1999 insertions(+), 0 deletions(-)
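
Reviewer note on 'ubifs_jrn_truncate()': it removes from the TNC every data key
from the first block lying wholly past the new size up to the last block that
held data at the old size. A minimal sketch of that block arithmetic follows.
BLOCK_SIZE is assumed to be 4096 here (the patch only refers to
UBIFS_BLOCK_SIZE), and 'struct blk_range' and 'trunc_range()' are hypothetical
names used only for this illustration.

```c
#include <assert.h>

#define BLOCK_SIZE 4096	/* assumed value, must be a power of two */

/* Hypothetical helper type: inclusive range of block numbers to drop */
struct blk_range {
	unsigned int first;
	unsigned int last;
};

static struct blk_range trunc_range(long long old_size, long long new_size)
{
	struct blk_range r;
	int bit;

	assert(new_size >= 0 && old_size > new_size);

	/* A block cut in the middle by new_size is kept, so start after it */
	bit = new_size & (BLOCK_SIZE - 1);
	r.first = new_size / BLOCK_SIZE + (bit ? 1 : 0);

	/* If old_size ends exactly on a boundary, the last data block is the previous one */
	bit = old_size & (BLOCK_SIZE - 1);
	r.last = old_size / BLOCK_SIZE - (bit ? 0 : 1);
	return r;
}
```

For example, truncating from 10000 to 5000 bytes gives the range 2..2: block 1
is cut in the middle, so it is kept (and may be re-written) rather than
removed. When first > last, as for 5000 to 4500, no whole block is affected.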

diff --git a/fs/ubifs/journal.c b/fs/ubifs/journal.c
new file mode 100644
index 0000000..e7c7aac
--- /dev/null
+++ b/fs/ubifs/journal.c
@@ -0,0 +1,1230 @@
+/*
+ * This file is part of UBIFS.
+ *
+ * Copyright (C) 2006-2008 Nokia Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 51
+ * Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Authors: Artem Bityutskiy (Битюцкий Артём)
+ * Adrian Hunter
+ */
+
+/*
+ * This file implements the UBIFS journal.
+ *
+ * The journal consists of 2 parts - the log and bud LEBs. The log has fixed
+ * length and position, while a bud logical eraseblock is any LEB in the main
+ * area. Buds contain file system data - data nodes, inode nodes, etc. The log
+ * contains only references to buds and some other stuff like commit
+ * start node. The idea is that when we commit the journal, we do
+ * not copy the data, the buds just become indexed. Since after the commit the
+ * nodes in bud eraseblocks become leaf nodes of the file system index tree, we
+ * use the term "bud". The analogy is obvious: bud eraseblocks contain nodes
+ * which will become leaves in the future.
+ *
+ * The journal is multi-headed because we want to write data to the journal as
+ * optimally as possible. It is nice to have nodes belonging to the same inode
+ * in one LEB, so we may write data owned by different inodes to different
+ * journal heads, although at present only one data head is used.
+ *
+ * For recovery reasons, the base head contains all inode nodes, all directory
+ * entry nodes and all truncate nodes. This means that the other heads contain
+ * only data nodes.
+ *
+ * Bud LEBs may be half-indexed. For example, if the bud was not full at the
+ * time of commit, the bud is retained to continue to be used in the journal,
+ * even though the "front" of the LEB is now indexed. In that case, the log
+ * reference contains the offset where the bud starts for the purposes of the
+ * journal.
+ *
+ * The journal size has to be limited, because the larger the journal is, the
+ * longer it takes to mount UBIFS (scanning the journal) and the more memory it
+ * takes (indexing in the TNC).
+ *
+ * Note, all the journal write operations like 'ubifs_jrn_update()' here, which
+ * write multiple UBIFS nodes to the journal at one go, are atomic with respect
+ * to unclean reboots. Should an unclean reboot happen, the recovery code drops
+ * all the nodes of the interrupted write.
+ */
+
+#include "ubifs.h"
+
+/**
+ * reserve_space - reserve space in the journal.
+ * @c: UBIFS file-system description object
+ * @jhead: journal head number
+ * @len: node length
+ *
+ * This function reserves space in journal head @jhead. If the reservation
+ * succeeded, the journal head stays locked and later has to be unlocked using
+ * 'release_head()'. 'write_node()' and 'write_head()' write to the locked head
+ * and do not unlock it. Returns zero in case of success, %-EAGAIN if commit has
+ * to be done, and other negative error codes in case of other failures.
+ */
+static int reserve_space(struct ubifs_info *c, int jhead, int len)
+{
+ int err = 0, err1, retries = 0, avail, lnum, offs, free, squeeze;
+ struct ubifs_wbuf *wbuf = &c->jheads[jhead].wbuf;
+
+ /*
+ * Typically, the base head has smaller nodes written to it, so it is
+ * better to try to allocate space at the ends of eraseblocks. This is
+ * what the squeeze parameter does.
+ */
+ squeeze = (jhead == BASEHD);
+again:
+ mutex_lock_nested(&wbuf->io_mutex, wbuf->jhead);
+ avail = c->leb_size - wbuf->offs - wbuf->used;
+
+ if (wbuf->lnum != -1 && avail >= len)
+ return 0;
+
+ /*
+ * Write buffer wasn't seek'ed or there is not enough space - look for an
+ * LEB with some empty space.
+ */
+ lnum = ubifs_find_free_space(c, len, &free, squeeze);
+ if (lnum >= 0) {
+ /* Found an LEB, add it to the journal head */
+ offs = c->leb_size - free;
+ err = ubifs_add_bud_to_log(c, jhead, lnum, offs);
+ if (err)
+ goto out_return;
+ /* A new bud was successfully allocated and added to the log */
+ goto out;
+ }
+
+ err = lnum;
+ if (err != -ENOSPC)
+ goto out_unlock;
+
+ /*
+ * No free space, we have to run garbage collector to make
+ * some. But the write-buffer mutex has to be unlocked because
+ * GC has to sync write buffers, which may lead to a deadlock.
+ */
+ dbg_jrn("no free space jhead %d, run GC", jhead);
+ mutex_unlock(&wbuf->io_mutex);
+
+ lnum = ubifs_garbage_collect(c, 0);
+ if (lnum < 0) {
+ err = lnum;
+ if (err != -ENOSPC)
+ return err;
+
+ /*
+ * GC could not make a free LEB. But someone else may
+ * have allocated new bud for this journal head,
+ * because we dropped the 'io_mutex', so try once
+ * again.
+ */
+ dbg_jrn("GC couldn't make a free LEB for jhead %d", jhead);
+ if (retries++ < 2) {
+ dbg_jrn("retry (%d)", retries);
+ goto again;
+ }
+
+ dbg_jrn("return -ENOSPC");
+ return err;
+ }
+
+ mutex_lock_nested(&wbuf->io_mutex, wbuf->jhead);
+ dbg_jrn("got LEB %d for jhead %d", lnum, jhead);
+ avail = c->leb_size - wbuf->offs - wbuf->used;
+
+ if (wbuf->lnum != -1 && avail >= len) {
+ /*
+ * Someone else has switched the journal head and we have
+ * enough space now. This happens when more than one process is
+ * trying to write to the same journal head at the same time.
+ */
+ dbg_jrn("return LEB %d back, already have LEB %d:%d",
+ lnum, wbuf->lnum, wbuf->offs + wbuf->used);
+ err = ubifs_return_leb(c, lnum);
+ if (err)
+ goto out_unlock;
+ return 0;
+ }
+
+ err = ubifs_add_bud_to_log(c, jhead, lnum, 0);
+ if (err)
+ goto out_return;
+ offs = 0;
+
+out:
+ err = ubifs_wbuf_seek_nolock(wbuf, lnum, offs, UBI_SHORTTERM);
+ if (err)
+ goto out_unlock;
+
+ return 0;
+
+out_unlock:
+ mutex_unlock(&wbuf->io_mutex);
+ return err;
+
+out_return:
+ /* An error occurred and the LEB has to be returned to lprops */
+ ubifs_assert(err < 0);
+ err1 = ubifs_return_leb(c, lnum);
+ if (err1 && err == -EAGAIN)
+ /*
+ * Return original error code 'err' only if it is not
+ * '-EAGAIN', which is not really an error. Otherwise, return
+ * the error code of 'ubifs_return_leb()'.
+ */
+ err = err1;
+ mutex_unlock(&wbuf->io_mutex);
+ return err;
+}
+
+/**
+ * write_node - write node to a journal head.
+ * @c: UBIFS file-system description object
+ * @jhead: journal head
+ * @node: node to write
+ * @len: node length
+ * @lnum: LEB number written is returned here
+ * @offs: offset written is returned here
+ *
+ * This function writes a node to reserved space of journal head @jhead.
+ * Returns zero in case of success and a negative error code in case of
+ * failure.
+ */
+static int write_node(struct ubifs_info *c, int jhead, void *node, int len,
+ int *lnum, int *offs)
+{
+ struct ubifs_wbuf *wbuf = &c->jheads[jhead].wbuf;
+
+ ubifs_assert(jhead != GCHD);
+
+ *lnum = c->jheads[jhead].wbuf.lnum;
+ *offs = c->jheads[jhead].wbuf.offs + c->jheads[jhead].wbuf.used;
+
+ dbg_jrn("jhead %d, LEB %d:%d, len %d", jhead, *lnum, *offs, len);
+ ubifs_prepare_node(c, node, len, 0);
+
+ return ubifs_wbuf_write_nolock(wbuf, node, len);
+}
+
+/**
+ * write_head - write data to a journal head.
+ * @c: UBIFS file-system description object
+ * @jhead: journal head
+ * @buf: buffer to write
+ * @len: length to write
+ * @lnum: LEB number written is returned here
+ * @offs: offset written is returned here
+ * @sync: non-zero if the write-buffer has to be synchronized
+ *
+ * This function is the same as 'write_node()' but it does not assume the
+ * buffer it is writing is a node, so it does not prepare it (which means
+ * initializing common header and calculating CRC).
+ */
+static int write_head(struct ubifs_info *c, int jhead, void *buf, int len,
+ int *lnum, int *offs, int sync)
+{
+ int err;
+ struct ubifs_wbuf *wbuf = &c->jheads[jhead].wbuf;
+
+ ubifs_assert(jhead != GCHD);
+
+ *lnum = c->jheads[jhead].wbuf.lnum;
+ *offs = c->jheads[jhead].wbuf.offs + c->jheads[jhead].wbuf.used;
+ dbg_jrn("jhead %d, LEB %d:%d, len %d", jhead, *lnum, *offs, len);
+
+ err = ubifs_wbuf_write_nolock(wbuf, buf, len);
+ if (err)
+ return err;
+ if (sync)
+ err = ubifs_wbuf_sync_nolock(wbuf);
+ return err;
+}
+
+/**
+ * make_reservation - reserve journal space.
+ * @c: UBIFS file-system description object
+ * @jhead: journal head
+ * @len: how many bytes to reserve
+ *
+ * This function makes space reservation in journal head @jhead. The function
+ * takes the commit lock and locks the journal head, and the caller has to
+ * unlock the head and finish the reservation with 'finish_reservation()'.
+ * Returns zero in case of success and a negative error code in case of
+ * failure.
+ *
+ * Note, the journal head may be unlocked as soon as the data is written, while
+ * the commit lock has to be released after the data has been added to the
+ * TNC.
+ */
+static int make_reservation(struct ubifs_info *c, int jhead, int len)
+{
+ int err, cmt_retries = 0, nospc_retries = 0;
+
+ ubifs_assert(len <= c->dark_wm);
+
+again:
+ down_read(&c->commit_sem);
+ err = reserve_space(c, jhead, len);
+ if (!err)
+ return 0;
+ up_read(&c->commit_sem);
+
+ if (err == -ENOSPC) {
+ /*
+ * GC could not make any progress. We should try to commit
+ * once because it could make some dirty space and GC would
+ * make progress, so make the error -EAGAIN so that the below
+ * will commit and re-try.
+ */
+ if (nospc_retries++ < 2) {
+ dbg_jrn("no space, retry");
+ err = -EAGAIN;
+ }
+
+ /*
+ * This means that the budgeting is incorrect. We always have
+ * to be able to write to the media, because all operations are
+ * budgeted. Deletions are not budgeted, though, but we reserve
+ * an extra LEB for them.
+ */
+ }
+
+ if (err != -EAGAIN)
+ goto out;
+
+ /*
+ * -EAGAIN means that the journal is full or too large, or the above
+ * code wants to do one commit. Do this and re-try.
+ */
+ if (cmt_retries > 128) {
+ /*
+ * This should not happen unless the journal size limitations
+ * are too tough.
+ */
+ ubifs_err("stuck in space allocation");
+ err = -ENOSPC;
+ goto out;
+ } else if (cmt_retries > 32)
+ ubifs_warn("too many space allocation re-tries (%d)",
+ cmt_retries);
+
+ dbg_jrn("-EAGAIN, commit and retry (retried %d times)",
+ cmt_retries);
+ cmt_retries += 1;
+
+ err = ubifs_run_commit(c);
+ if (err)
+ return err;
+ goto again;
+
+out:
+ ubifs_err("cannot reserve %d bytes in jhead %d, error %d",
+ len, jhead, err);
+ if (err == -ENOSPC) {
+ /* These are some budgeting problems, print useful information */
+ down_write(&c->commit_sem);
+ spin_lock(&c->space_lock);
+ dbg_dump_stack();
+ dbg_dump_budg(c);
+ spin_unlock(&c->space_lock);
+ dbg_dump_lprops(c);
+ cmt_retries = dbg_check_lprops(c);
+ up_write(&c->commit_sem);
+ }
+
+ return err;
+}
+
+/**
+ * release_head - release a journal head.
+ * @c: UBIFS file-system description object
+ * @jhead: journal head
+ *
+ * This function releases journal head @jhead which was locked by
+ * the 'make_reservation()' function. It has to be called after each successful
+ * 'make_reservation()' invocation.
+ */
+static inline void release_head(struct ubifs_info *c, int jhead)
+{
+ mutex_unlock(&c->jheads[jhead].wbuf.io_mutex);
+}
+
+/**
+ * finish_reservation - finish a reservation.
+ * @c: UBIFS file-system description object
+ *
+ * This function finishes journal space reservation. It must be called after
+ * 'make_reservation()'.
+ */
+static void finish_reservation(struct ubifs_info *c)
+{
+ up_read(&c->commit_sem);
+}
+
+/**
+ * get_dent_type - translate VFS inode mode to UBIFS directory entry type.
+ * @mode: inode mode
+ */
+static int get_dent_type(int mode)
+{
+ switch (mode & S_IFMT) {
+ case S_IFREG:
+ return UBIFS_ITYPE_REG;
+ case S_IFDIR:
+ return UBIFS_ITYPE_DIR;
+ case S_IFLNK:
+ return UBIFS_ITYPE_LNK;
+ case S_IFBLK:
+ return UBIFS_ITYPE_BLK;
+ case S_IFCHR:
+ return UBIFS_ITYPE_CHR;
+ case S_IFIFO:
+ return UBIFS_ITYPE_FIFO;
+ case S_IFSOCK:
+ return UBIFS_ITYPE_SOCK;
+ default:
+ BUG();
+ }
+ return 0;
+}
+
+/**
+ * pack_inode - pack an inode node.
+ * @c: UBIFS file-system description object
+ * @ino: buffer in which to pack inode node
+ * @inode: inode to pack
+ * @last: indicates the last node of the group
+ * @last_reference: non-zero if this is a deletion inode
+ */
+static void pack_inode(struct ubifs_info *c, struct ubifs_ino_node *ino,
+ const struct inode *inode, int last, int last_reference)
+{
+ int data_len = 0;
+ struct ubifs_inode *ui = ubifs_inode(inode);
+
+ ino->ch.node_type = UBIFS_INO_NODE;
+ ino_key_init_flash(c, &ino->key, inode->i_ino);
+ ino->creat_sqnum = cpu_to_le64(ui->creat_sqnum);
+ ino->size = cpu_to_le64(i_size_read(inode));
+ ino->nlink = cpu_to_le32(inode->i_nlink);
+ ino->atime = cpu_to_le32(inode->i_atime.tv_sec);
+ ino->ctime = cpu_to_le32(inode->i_ctime.tv_sec);
+ ino->mtime = cpu_to_le32(inode->i_mtime.tv_sec);
+ ino->uid = cpu_to_le32(inode->i_uid);
+ ino->gid = cpu_to_le32(inode->i_gid);
+ ino->mode = cpu_to_le32(inode->i_mode);
+ ino->flags = cpu_to_le32(ui->flags);
+ ino->compr_type = cpu_to_le16(ui->compr_type);
+ ino->xattr_cnt = cpu_to_le32(ui->xattr_cnt);
+ ino->xattr_size = cpu_to_le64(ui->xattr_size);
+ ino->xattr_msize = cpu_to_le64(ui->xattr_msize);
+ ino->xattr_names = cpu_to_le32(ui->xattr_names);
+ ino->data_len = cpu_to_le32(ui->data_len);
+
+ /*
+ * Drop the attached data if this is a deletion inode, the data is not
+ * needed anymore.
+ */
+ if (!last_reference) {
+ memcpy(ino->data, ui->data, ui->data_len);
+ data_len = ui->data_len;
+ }
+
+ ubifs_prep_grp_node(c, ino, UBIFS_INO_NODE_SZ + data_len, last);
+}
+
+/**
+ * ubifs_jrn_update - update inode.
+ * @c: UBIFS file-system description object
+ * @dir: parent inode or host inode in case of extended attributes
+ * @nm: directory entry name
+ * @inode: inode
+ * @deletion: indicates a directory entry deletion, i.e. unlink or rmdir
+ * @sync: non-zero if the write-buffer has to be synchronized
+ * @xent: non-zero if the directory entry is an extended attribute entry
+ *
+ * This function updates an inode by writing a directory entry (or extended
+ * attribute entry), the inode itself, and the parent directory inode (or the
+ * host inode) to the journal.
+ *
+ * The function writes the host inode @dir last, which is important in case of
+ * extended attributes. Indeed, then we guarantee that if the host inode gets
+ * synchronized, and the write-buffer it sits in gets flushed, the extended
+ * attribute inode gets flushed too. And this is exactly what the user expects -
+ * synchronizing the host inode synchronizes its extended attributes.
+ * Similarly, this guarantees that if @dir is synchronized, its directory entry
+ * corresponding to @nm gets synchronized too.
+ *
+ * This function returns %0 on success and a negative error code on failure.
+ */
+int ubifs_jrn_update(struct ubifs_info *c, const struct inode *dir,
+ const struct qstr *nm, const struct inode *inode,
+ int deletion, int sync, int xent)
+{
+ int err, dlen, ilen, len, lnum, ino_offs, dent_offs;
+ int aligned_dlen, aligned_ilen;
+ int last_reference = !!(deletion && inode->i_nlink == 0);
+ struct ubifs_dent_node *dent;
+ struct ubifs_ino_node *ino;
+ union ubifs_key dent_key, ino_key;
+
+ dbg_jrn("ino %lu, dent '%.*s', data len %d in dir ino %lu",
+ inode->i_ino, nm->len, nm->name, ubifs_inode(inode)->data_len,
+ dir->i_ino);
+ ubifs_assert(ubifs_inode(dir)->data_len == 0);
+
+ dlen = UBIFS_DENT_NODE_SZ + nm->len + 1;
+ ilen = UBIFS_INO_NODE_SZ;
+
+ /*
+ * If the last reference to the inode is being deleted, then there is no
+ * need to attach and write inode data, it is being deleted anyway.
+ */
+ if (!last_reference)
+ ilen += ubifs_inode(inode)->data_len;
+
+ aligned_dlen = ALIGN(dlen, 8);
+ aligned_ilen = ALIGN(ilen, 8);
+
+ len = aligned_dlen + aligned_ilen + UBIFS_INO_NODE_SZ;
+
+ dent = kmalloc(len, GFP_KERNEL);
+ if (!dent)
+ return -ENOMEM;
+
+ if (!xent) {
+ dent->ch.node_type = UBIFS_DENT_NODE;
+ dent_key_init(c, &dent_key, dir->i_ino, nm);
+ } else {
+ dent->ch.node_type = UBIFS_XENT_NODE;
+ xent_key_init(c, &dent_key, dir->i_ino, nm);
+ }
+
+ key_write(c, &dent_key, dent->key);
+ dent->inum = deletion ? 0 : cpu_to_le64(inode->i_ino);
+ dent->padding = 0;
+ dent->type = get_dent_type(inode->i_mode);
+ dent->nlen = cpu_to_le16(nm->len);
+ memcpy(dent->name, nm->name, nm->len);
+ dent->name[nm->len] = '\0';
+ ubifs_prep_grp_node(c, dent, dlen, 0);
+
+ ino = (void *)dent + aligned_dlen;
+ pack_inode(c, ino, inode, 0, last_reference);
+
+ ino = (void *)ino + aligned_ilen;
+ pack_inode(c, ino, dir, 1, 0);
+
+ err = make_reservation(c, BASEHD, len);
+ if (err)
+ goto out_free;
+
+ if (last_reference) {
+ err = ubifs_add_orphan(c, inode->i_ino);
+ if (err) {
+ release_head(c, BASEHD);
+ goto out_finish;
+ }
+ }
+
+ err = write_head(c, BASEHD, dent, len, &lnum, &dent_offs, sync);
+ if (!sync && !err) {
+ struct ubifs_wbuf *wbuf = &c->jheads[BASEHD].wbuf;
+
+ ubifs_wbuf_add_ino_nolock(wbuf, inode->i_ino);
+ ubifs_wbuf_add_ino_nolock(wbuf, dir->i_ino);
+ }
+ release_head(c, BASEHD);
+ kfree(dent);
+ if (err)
+ goto out_ro;
+
+ if (deletion) {
+ err = ubifs_tnc_remove_nm(c, &dent_key, nm);
+ if (err)
+ goto out_ro;
+ err = ubifs_add_dirt(c, lnum, dlen);
+ } else
+ err = ubifs_tnc_add_nm(c, &dent_key, lnum, dent_offs, dlen, nm);
+ if (err)
+ goto out_ro;
+
+ /*
+ * Note, we do not remove the inode from TNC even if the last reference
+ * to it has just been deleted, because the inode may still be opened.
+ * Instead, the inode has been added to orphan lists and the orphan
+ * subsystem will take further care about it.
+ */
+ ino_key_init(c, &ino_key, inode->i_ino);
+ ino_offs = dent_offs + aligned_dlen;
+ err = ubifs_tnc_add(c, &ino_key, lnum, ino_offs, ilen);
+ if (err)
+ goto out_ro;
+
+ ino_key_init(c, &ino_key, dir->i_ino);
+ ino_offs += aligned_ilen;
+ err = ubifs_tnc_add(c, &ino_key, lnum, ino_offs, UBIFS_INO_NODE_SZ);
+ if (err)
+ goto out_ro;
+
+ finish_reservation(c);
+ return 0;
+
+out_finish:
+ finish_reservation(c);
+out_free:
+ kfree(dent);
+ return err;
+
+out_ro:
+ ubifs_ro_mode(c);
+ if (last_reference)
+ ubifs_delete_orphan(c, inode->i_ino);
+ finish_reservation(c);
+ return err;
+}
+
+/**
+ * ubifs_jrn_write_data - write a data node to the journal.
+ * @c: UBIFS file-system description object
+ * @inode: inode the data node belongs to
+ * @key: node key
+ * @buf: buffer to write
+ * @len: data length (must not exceed %UBIFS_BLOCK_SIZE)
+ *
+ * This function writes a data node to the journal. Returns %0 if the data node
+ * was successfully written, and a negative error code in case of failure.
+ */
+int ubifs_jrn_write_data(struct ubifs_info *c, const struct inode *inode,
+ const union ubifs_key *key, const void *buf, int len)
+{
+ int err, lnum, offs, compr_type, out_len;
+ int dlen = UBIFS_DATA_NODE_SZ + len * WORST_COMPR_FACTOR;
+ const struct ubifs_inode *ui = ubifs_inode(inode);
+ struct ubifs_data_node *data;
+
+ dbg_jrn_key(c, key, "ino %lu, blk %u, len %d, key ",
+ key_ino(c, key), key_block(c, key), len);
+ ubifs_assert(len <= UBIFS_BLOCK_SIZE);
+
+ data = kmalloc(dlen, GFP_NOFS);
+ if (!data)
+ return -ENOMEM;
+
+ data->ch.node_type = UBIFS_DATA_NODE;
+ key_write(c, key, &data->key);
+ data->size = cpu_to_le32(len);
+
+ if (!(ui->flags & UBIFS_COMPR_FL))
+ /* Compression is disabled for this inode */
+ compr_type = UBIFS_COMPR_NONE;
+ else
+ compr_type = ui->compr_type;
+
+ out_len = dlen - UBIFS_DATA_NODE_SZ;
+ ubifs_compress(buf, len, &data->data, &out_len, &compr_type);
+ ubifs_assert(out_len <= UBIFS_BLOCK_SIZE);
+
+ dlen = UBIFS_DATA_NODE_SZ + out_len;
+ data->compr_type = cpu_to_le16(compr_type);
+
+ err = make_reservation(c, DATAHD, dlen);
+ if (err)
+ goto out_free;
+
+ err = write_node(c, DATAHD, data, dlen, &lnum, &offs);
+ if (!err)
+ ubifs_wbuf_add_ino_nolock(&c->jheads[DATAHD].wbuf,
+ key_ino(c, key));
+ release_head(c, DATAHD);
+ if (err)
+ goto out_ro;
+
+ err = ubifs_tnc_add(c, key, lnum, offs, dlen);
+ if (err)
+ goto out_ro;
+
+ finish_reservation(c);
+ kfree(data);
+ return 0;
+
+out_ro:
+ ubifs_ro_mode(c);
+ finish_reservation(c);
+out_free:
+ kfree(data);
+ return err;
+}
+
+/**
+ * ubifs_jrn_write_inode - flush inode to the journal.
+ * @c: UBIFS file-system description object
+ * @inode: inode to flush
+ * @last_reference: inode has been deleted
+ * @sync: non-zero if the write-buffer has to be synchronized
+ *
+ * This function writes inode @inode to the journal (to the base head). Returns
+ * zero in case of success and a negative error code in case of failure.
+ */
+int ubifs_jrn_write_inode(struct ubifs_info *c, const struct inode *inode,
+ int last_reference, int sync)
+{
+ int err, len, lnum, offs;
+ struct ubifs_ino_node *ino;
+ struct ubifs_inode *ui = ubifs_inode(inode);
+
+ dbg_jrn("ino %lu%s", inode->i_ino,
+ last_reference ? " (last reference)" : "");
+ if (last_reference)
+ ubifs_assert(inode->i_nlink == 0);
+
+ /* If the inode is deleted, do not write the attached data */
+ len = UBIFS_INO_NODE_SZ;
+ if (!last_reference)
+ len += ui->data_len;
+ ino = kmalloc(len, GFP_NOFS);
+ if (!ino)
+ return -ENOMEM;
+ pack_inode(c, ino, inode, 1, last_reference);
+
+ err = make_reservation(c, BASEHD, len);
+ if (err)
+ goto out_free;
+
+ err = write_head(c, BASEHD, ino, len, &lnum, &offs, sync);
+ if (!sync && !err)
+ ubifs_wbuf_add_ino_nolock(&c->jheads[BASEHD].wbuf,
+ inode->i_ino);
+ release_head(c, BASEHD);
+ if (err)
+ goto out_ro;
+
+ if (last_reference) {
+ err = ubifs_tnc_remove_ino(c, inode->i_ino);
+ if (err)
+ goto out_ro;
+ ubifs_delete_orphan(c, inode->i_ino);
+ err = ubifs_add_dirt(c, lnum, len);
+ } else {
+ union ubifs_key key;
+
+ ino_key_init(c, &key, inode->i_ino);
+ err = ubifs_tnc_add(c, &key, lnum, offs, len);
+ }
+ if (err)
+ goto out_ro;
+
+ finish_reservation(c);
+ kfree(ino);
+ return 0;
+
+out_ro:
+ ubifs_ro_mode(c);
+ finish_reservation(c);
+out_free:
+ kfree(ino);
+ return err;
+}
+
+/**
+ * ubifs_jrn_rename - rename a directory entry.
+ * @c: UBIFS file-system description object
+ * @old_dir: parent inode of directory entry to rename
+ * @old_dentry: directory entry to rename
+ * @new_dir: parent inode of the new directory entry
+ * @new_dentry: new directory entry (or directory entry to replace)
+ * @sync: non-zero if the write-buffer has to be synchronized
+ *
+ * Returns zero in case of success and a negative error code in case of failure.
+ */
+int ubifs_jrn_rename(struct ubifs_info *c, const struct inode *old_dir,
+ const struct dentry *old_dentry,
+ const struct inode *new_dir,
+ const struct dentry *new_dentry, int sync)
+{
+ const struct inode *old_inode = old_dentry->d_inode;
+ const struct inode *new_inode = new_dentry->d_inode;
+ int err, dlen1, dlen2, ilen, lnum, offs, len;
+ int aligned_dlen1, aligned_dlen2, plen = UBIFS_INO_NODE_SZ;
+ int last_reference = !!(new_inode && new_inode->i_nlink == 0);
+ struct ubifs_dent_node *dent, *dent2;
+ void *p;
+ union ubifs_key key;
+
+ dbg_jrn("dent '%.*s' in dir ino %lu to dent '%.*s' in dir ino %lu",
+ old_dentry->d_name.len, old_dentry->d_name.name,
+ old_dir->i_ino, new_dentry->d_name.len,
+ new_dentry->d_name.name, new_dir->i_ino);
+
+ ubifs_assert(ubifs_inode(old_dir)->data_len == 0);
+ ubifs_assert(ubifs_inode(new_dir)->data_len == 0);
+
+ dlen1 = UBIFS_DENT_NODE_SZ + new_dentry->d_name.len + 1;
+ dlen2 = UBIFS_DENT_NODE_SZ + old_dentry->d_name.len + 1;
+ if (new_inode) {
+ ilen = UBIFS_INO_NODE_SZ;
+ if (!last_reference)
+ ilen += ubifs_inode(new_inode)->data_len;
+ } else
+ ilen = 0;
+
+ aligned_dlen1 = ALIGN(dlen1, 8);
+ aligned_dlen2 = ALIGN(dlen2, 8);
+
+ len = aligned_dlen1 + aligned_dlen2 + ALIGN(ilen, 8) + ALIGN(plen, 8);
+ if (old_dir != new_dir)
+ len += plen;
+
+ dent = kmalloc(len, GFP_NOFS);
+ if (!dent)
+ return -ENOMEM;
+
+ /* Make new dent */
+ dent->ch.node_type = UBIFS_DENT_NODE;
+ dent_key_init_flash(c, &dent->key, new_dir->i_ino, &new_dentry->d_name);
+ dent->inum = cpu_to_le64(old_inode->i_ino);
+ dent->padding = 0;
+ dent->type = get_dent_type(old_inode->i_mode);
+ dent->nlen = cpu_to_le16(new_dentry->d_name.len);
+ memcpy(dent->name, new_dentry->d_name.name, new_dentry->d_name.len);
+ dent->name[new_dentry->d_name.len] = '\0';
+ ubifs_prep_grp_node(c, dent, dlen1, 0);
+
+ dent2 = (void *)dent + aligned_dlen1;
+
+ /* Make deletion dent */
+ dent2->ch.node_type = UBIFS_DENT_NODE;
+ dent_key_init_flash(c, &dent2->key, old_dir->i_ino,
+ &old_dentry->d_name);
+ dent2->inum = cpu_to_le64(0);
+ dent2->padding = 0;
+ dent2->type = DT_UNKNOWN;
+ dent2->nlen = cpu_to_le16(old_dentry->d_name.len);
+ memcpy(dent2->name, old_dentry->d_name.name, old_dentry->d_name.len);
+ dent2->name[old_dentry->d_name.len] = '\0';
+ ubifs_prep_grp_node(c, dent2, dlen2, 0);
+
+ p = (void *)dent2 + aligned_dlen2;
+ if (new_inode) {
+ pack_inode(c, p, new_inode, 0, last_reference);
+ p += ALIGN(ilen, 8);
+ }
+
+ if (old_dir == new_dir)
+ pack_inode(c, p, old_dir, 1, 0);
+ else {
+ pack_inode(c, p, old_dir, 0, 0);
+ p += ALIGN(plen, 8);
+ pack_inode(c, p, new_dir, 1, 0);
+ }
+
+ err = make_reservation(c, BASEHD, len);
+ if (err)
+ goto out_free;
+
+ if (last_reference) {
+ err = ubifs_add_orphan(c, new_inode->i_ino);
+ if (err) {
+ release_head(c, BASEHD);
+ goto out_finish;
+ }
+ }
+
+ err = write_head(c, BASEHD, dent, len, &lnum, &offs, sync);
+ if (!sync && !err) {
+ struct ubifs_wbuf *wbuf = &c->jheads[BASEHD].wbuf;
+
+ ubifs_wbuf_add_ino_nolock(wbuf, new_dir->i_ino);
+ ubifs_wbuf_add_ino_nolock(wbuf, old_dir->i_ino);
+ /* Must be done while the journal head is still locked */
+ if (new_inode)
+ ubifs_wbuf_add_ino_nolock(wbuf, new_inode->i_ino);
+ }
+ release_head(c, BASEHD);
+ if (err)
+ goto out_ro;
+
+ dent_key_init(c, &key, new_dir->i_ino, &new_dentry->d_name);
+ err = ubifs_tnc_add_nm(c, &key, lnum, offs, dlen1, &new_dentry->d_name);
+ if (err)
+ goto out_ro;
+
+ err = ubifs_add_dirt(c, lnum, dlen2);
+ if (err)
+ goto out_ro;
+
+ dent_key_init(c, &key, old_dir->i_ino, &old_dentry->d_name);
+ err = ubifs_tnc_remove_nm(c, &key, &old_dentry->d_name);
+ if (err)
+ goto out_ro;
+
+ offs += aligned_dlen1 + aligned_dlen2;
+ if (new_inode) {
+ ino_key_init(c, &key, new_inode->i_ino);
+ err = ubifs_tnc_add(c, &key, lnum, offs, ilen);
+ if (err)
+ goto out_ro;
+ offs += ALIGN(ilen, 8);
+ }
+
+ ino_key_init(c, &key, old_dir->i_ino);
+ err = ubifs_tnc_add(c, &key, lnum, offs, plen);
+ if (err)
+ goto out_ro;
+
+ if (old_dir != new_dir) {
+ offs += ALIGN(plen, 8);
+ ino_key_init(c, &key, new_dir->i_ino);
+ err = ubifs_tnc_add(c, &key, lnum, offs, plen);
+ if (err)
+ goto out_ro;
+ }
+
+ finish_reservation(c);
+ kfree(dent);
+ return 0;
+
+out_ro:
+ ubifs_ro_mode(c);
+ if (last_reference)
+ ubifs_delete_orphan(c, new_inode->i_ino);
+out_finish:
+ finish_reservation(c);
+out_free:
+ kfree(dent);
+ return err;
+}
+
+/**
+ * recomp_data_node - re-compress a truncated data node.
+ * @dn: data node to re-compress
+ * @new_len: new length
+ *
+ * This function is used when an inode is truncated and the last data node of
+ * the inode has to be re-compressed and re-written.
+ */
+static int recomp_data_node(struct ubifs_data_node *dn, int *new_len)
+{
+ void *buf;
+ int err, len, compr_type, out_len;
+
+ out_len = le32_to_cpu(dn->size);
+ buf = kmalloc(out_len * WORST_COMPR_FACTOR, GFP_NOFS);
+ if (!buf)
+ return -ENOMEM;
+
+ len = le32_to_cpu(dn->ch.len) - UBIFS_DATA_NODE_SZ;
+ compr_type = le16_to_cpu(dn->compr_type);
+ err = ubifs_decompress(&dn->data, len, buf, &out_len, compr_type);
+ if (err)
+ goto out;
+
+ ubifs_compress(buf, *new_len, &dn->data, &out_len, &compr_type);
+ ubifs_assert(out_len <= UBIFS_BLOCK_SIZE);
+ dn->compr_type = cpu_to_le16(compr_type);
+ dn->size = cpu_to_le32(*new_len);
+ *new_len = UBIFS_DATA_NODE_SZ + out_len;
+out:
+ kfree(buf);
+ return err;
+}
+
+/**
+ * ubifs_jrn_truncate - update the journal for a truncation.
+ * @c: UBIFS file-system description object
+ * @inum: inode number of inode being truncated
+ * @old_size: old size
+ * @new_size: new size
+ *
+ * When the size of a file decreases due to truncation, a truncation node is
+ * written, the journal tree is updated, and the last data block is re-written
+ * if it has been affected.
+ *
+ * This function returns %0 in the case of success, and a negative error code in
+ * case of failure.
+ */
+int ubifs_jrn_truncate(struct ubifs_info *c, ino_t inum,
+ loff_t old_size, loff_t new_size)
+{
+ union ubifs_key key, to_key;
+ struct ubifs_trun_node *trun;
+ struct ubifs_data_node *dn;
+ int err, dlen, len, lnum, offs, bit, sz;
+ unsigned int blk;
+
+ dbg_jrn("ino %lu, size %lld -> %lld", inum, old_size, new_size);
+
+ sz = UBIFS_TRUN_NODE_SZ + UBIFS_MAX_DATA_NODE_SZ * WORST_COMPR_FACTOR;
+ trun = kmalloc(sz, GFP_NOFS);
+ if (!trun)
+ return -ENOMEM;
+
+ trun->ch.node_type = UBIFS_TRUN_NODE;
+ trun_key_init_flash(c, &trun->key, inum);
+ trun->old_size = cpu_to_le64(old_size);
+ trun->new_size = cpu_to_le64(new_size);
+ ubifs_prepare_node(c, trun, UBIFS_TRUN_NODE_SZ, 0);
+
+ dlen = new_size & (UBIFS_BLOCK_SIZE - 1);
+
+ if (dlen) {
+ /* Get last data block so it can be truncated */
+ dn = (void *)trun + ALIGN(UBIFS_TRUN_NODE_SZ, 8);
+ blk = new_size / UBIFS_BLOCK_SIZE;
+ data_key_init(c, &key, inum, blk);
+ dbg_jrn_key(c, &key, "key");
+ err = ubifs_tnc_lookup(c, &key, dn);
+ if (err == -ENOENT)
+ dlen = 0; /* Not found (so it is a hole) */
+ else if (err)
+ goto out_free;
+ else {
+ if (le32_to_cpu(dn->size) <= dlen)
+ dlen = 0; /* Nothing to do */
+ else {
+ int compr_type = le16_to_cpu(dn->compr_type);
+
+ if (compr_type != UBIFS_COMPR_NONE) {
+ err = recomp_data_node(dn, &dlen);
+ if (err)
+ goto out_free;
+ } else {
+ dn->size = cpu_to_le32(dlen);
+ dlen += UBIFS_DATA_NODE_SZ;
+ }
+ ubifs_prepare_node(c, dn, dlen, 0);
+ }
+ }
+ }
+
+ if (dlen)
+ len = ALIGN(UBIFS_TRUN_NODE_SZ, 8) + dlen;
+ else
+ len = UBIFS_TRUN_NODE_SZ;
+
+ err = make_reservation(c, BASEHD, len);
+ if (err)
+ goto out_free;
+
+ err = write_head(c, BASEHD, trun, len, &lnum, &offs, 0);
+ if (!err)
+ ubifs_wbuf_add_ino_nolock(&c->jheads[BASEHD].wbuf, inum);
+ release_head(c, BASEHD);
+ if (err)
+ goto out_ro;
+
+ if (dlen) {
+ offs += ALIGN(UBIFS_TRUN_NODE_SZ, 8);
+ err = ubifs_tnc_add(c, &key, lnum, offs, dlen);
+ if (err)
+ goto out_ro;
+ }
+
+ err = ubifs_add_dirt(c, lnum, UBIFS_TRUN_NODE_SZ);
+ if (err)
+ goto out_ro;
+
+ bit = new_size & (UBIFS_BLOCK_SIZE - 1);
+
+ blk = new_size / UBIFS_BLOCK_SIZE + (bit ? 1 : 0);
+ data_key_init(c, &key, inum, blk);
+
+ bit = old_size & (UBIFS_BLOCK_SIZE - 1);
+
+ blk = old_size / UBIFS_BLOCK_SIZE - (bit ? 0 : 1);
+ data_key_init(c, &to_key, inum, blk);
+
+ err = ubifs_tnc_remove_range(c, &key, &to_key);
+ if (err)
+ goto out_ro;
+
+ finish_reservation(c);
+ kfree(trun);
+ return 0;
+
+out_ro:
+ ubifs_ro_mode(c);
+ finish_reservation(c);
+out_free:
+ kfree(trun);
+ return err;
+}
+
+#ifdef CONFIG_UBIFS_FS_XATTR
+
+int ubifs_jrn_delete_xattr(struct ubifs_info *c, const struct inode *host,
+ const struct inode *inode, const struct qstr *nm,
+ int sync)
+{
+ int err, xlen, hlen, len, lnum, xent_offs, aligned_xlen;
+ struct ubifs_dent_node *xent;
+ struct ubifs_ino_node *ino;
+ union ubifs_key xent_key, key1, key2;
+
+ dbg_jrn("host %lu, xattr ino %lu, name '%s', data len %d",
+ host->i_ino, inode->i_ino, nm->name,
+ ubifs_inode(inode)->data_len);
+ ubifs_assert(inode->i_nlink == 0);
+
+ /*
+ * Since we are deleting the inode, we do not bother to attach any data
+ * to it and assume its length is %UBIFS_INO_NODE_SZ.
+ */
+ xlen = UBIFS_DENT_NODE_SZ + nm->len + 1;
+ aligned_xlen = ALIGN(xlen, 8);
+ hlen = ubifs_inode(host)->data_len + UBIFS_INO_NODE_SZ;
+ len = aligned_xlen + UBIFS_INO_NODE_SZ + ALIGN(hlen, 8);
+
+ xent = kmalloc(len, GFP_NOFS);
+ if (!xent)
+ return -ENOMEM;
+
+ xent->ch.node_type = UBIFS_XENT_NODE;
+ xent_key_init(c, &xent_key, host->i_ino, nm);
+ key_write(c, &xent_key, xent->key);
+ xent->inum = 0;
+ xent->padding = 0;
+ xent->type = get_dent_type(inode->i_mode);
+ xent->nlen = cpu_to_le16(nm->len);
+ memcpy(xent->name, nm->name, nm->len);
+ xent->name[nm->len] = '\0';
+ ubifs_prep_grp_node(c, xent, xlen, 0);
+
+ ino = (void *)xent + aligned_xlen;
+ pack_inode(c, ino, inode, 0, 1);
+
+ ino = (void *)ino + UBIFS_INO_NODE_SZ;
+ pack_inode(c, ino, host, 1, 0);
+
+ err = make_reservation(c, BASEHD, len);
+ if (err) {
+ kfree(xent);
+ return err;
+ }
+
+ err = write_head(c, BASEHD, xent, len, &lnum, &xent_offs, sync);
+ if (!sync && !err)
+ ubifs_wbuf_add_ino_nolock(&c->jheads[BASEHD].wbuf, host->i_ino);
+ release_head(c, BASEHD);
+ kfree(xent);
+ if (err)
+ goto out_ro;
+
+ /* Remove the extended attribute entry from TNC */
+ err = ubifs_tnc_remove_nm(c, &xent_key, nm);
+ if (err)
+ goto out_ro;
+ err = ubifs_add_dirt(c, lnum, xlen);
+ if (err)
+ goto out_ro;
+
+ /*
+ * Remove all nodes belonging to the extended attribute inode from TNC.
+ * Well, there actually must be only one node - the inode itself.
+ */
+ lowest_ino_key(c, &key1, inode->i_ino);
+ highest_ino_key(c, &key2, inode->i_ino);
+ err = ubifs_tnc_remove_range(c, &key1, &key2);
+ if (err)
+ goto out_ro;
+ err = ubifs_add_dirt(c, lnum, UBIFS_INO_NODE_SZ);
+ if (err)
+ goto out_ro;
+
+ /* And update TNC with the new host inode position */
+ ino_key_init(c, &key1, host->i_ino);
+ err = ubifs_tnc_add(c, &key1, lnum, xent_offs + len - hlen, hlen);
+ if (err)
+ goto out_ro;
+
+ finish_reservation(c);
+ return 0;
+
+out_ro:
+ ubifs_ro_mode(c);
+ finish_reservation(c);
+ return err;
+}
+
+/**
+ * ubifs_jrn_write_2_inodes - write 2 inodes to the journal.
+ * @c: UBIFS file-system description object
+ * @inode1: first inode to write
+ * @inode2: second inode to write
+ * @sync: non-zero if the write-buffer has to be synchronized
+ *
+ * This function writes 2 inodes @inode1 and @inode2 to the journal (to the
+ * base head - first @inode1, then @inode2). Returns zero in case of success
+ * and a negative error code in case of failure.
+ */
+int ubifs_jrn_write_2_inodes(struct ubifs_info *c, const struct inode *inode1,
+ const struct inode *inode2, int sync)
+{
+ int err, len1, len2, aligned_len, aligned_len1, lnum, offs;
+ struct ubifs_ino_node *ino;
+ union ubifs_key key;
+
+ dbg_jrn("ino %lu, ino %lu", inode1->i_ino, inode2->i_ino);
+ ubifs_assert(inode1->i_nlink > 0);
+ ubifs_assert(inode2->i_nlink > 0);
+
+ len1 = UBIFS_INO_NODE_SZ + ubifs_inode(inode1)->data_len;
+ len2 = UBIFS_INO_NODE_SZ + ubifs_inode(inode2)->data_len;
+ aligned_len1 = ALIGN(len1, 8);
+ aligned_len = aligned_len1 + ALIGN(len2, 8);
+
+ ino = kmalloc(aligned_len, GFP_NOFS);
+ if (!ino)
+ return -ENOMEM;
+ pack_inode(c, ino, inode1, 0, 0);
+ pack_inode(c, (void *)ino + aligned_len1, inode2, 1, 0);
+
+ err = make_reservation(c, BASEHD, aligned_len);
+ if (err)
+ goto out_free;
+
+ err = write_head(c, BASEHD, ino, aligned_len, &lnum, &offs, 0);
+ if (!sync && !err) {
+ struct ubifs_wbuf *wbuf = &c->jheads[BASEHD].wbuf;
+
+ ubifs_wbuf_add_ino_nolock(wbuf, inode1->i_ino);
+ ubifs_wbuf_add_ino_nolock(wbuf, inode2->i_ino);
+ }
+ release_head(c, BASEHD);
+ if (err)
+ goto out_ro;
+
+ ino_key_init(c, &key, inode1->i_ino);
+ err = ubifs_tnc_add(c, &key, lnum, offs, len1);
+ if (err)
+ goto out_ro;
+
+ ino_key_init(c, &key, inode2->i_ino);
+ err = ubifs_tnc_add(c, &key, lnum, offs + aligned_len1, len2);
+ if (err)
+ goto out_ro;
+
+ finish_reservation(c);
+ kfree(ino);
+ return 0;
+
+out_ro:
+ ubifs_ro_mode(c);
+ finish_reservation(c);
+out_free:
+ kfree(ino);
+ return err;
+}
+
+#endif /* CONFIG_UBIFS_FS_XATTR */
diff --git a/fs/ubifs/log.c b/fs/ubifs/log.c
new file mode 100644
index 0000000..f55e7c1
--- /dev/null
+++ b/fs/ubifs/log.c
@@ -0,0 +1,769 @@
+/*
+ * This file is part of UBIFS.
+ *
+ * Copyright (C) 2006-2008 Nokia Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 51
+ * Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Authors: Artem Bityutskiy (Битюцкий Артём)
+ * Adrian Hunter
+ */
+
+/*
+ * This file is a part of the UBIFS journal implementation and contains
+ * various functions which manipulate the log. The log is a fixed area on the
+ * flash which does not contain any data but refers to buds. The log is a part
+ * of the journal.
+ */
+
+#include "ubifs.h"
+
+#ifdef CONFIG_UBIFS_FS_DEBUG_CHK_OTHER
+static int dbg_check_bud_bytes(struct ubifs_info *c);
+#else
+#define dbg_check_bud_bytes(c) 0
+#endif
+
+/**
+ * ubifs_search_bud - search bud LEB.
+ * @c: UBIFS file-system description object
+ * @lnum: logical eraseblock number to search
+ *
+ * This function searches bud LEB @lnum. Returns bud description object in case
+ * of success and %NULL if there is no bud with this LEB number.
+ */
+struct ubifs_bud *ubifs_search_bud(struct ubifs_info *c, int lnum)
+{
+ struct rb_node *p;
+ struct ubifs_bud *bud;
+
+ spin_lock(&c->buds_lock);
+ p = c->buds.rb_node;
+ while (p) {
+ bud = rb_entry(p, struct ubifs_bud, rb);
+ if (lnum < bud->lnum)
+ p = p->rb_left;
+ else if (lnum > bud->lnum)
+ p = p->rb_right;
+ else {
+ spin_unlock(&c->buds_lock);
+ return bud;
+ }
+ }
+ spin_unlock(&c->buds_lock);
+ return NULL;
+}
+
+/**
+ * next_log_lnum - switch to the next log LEB.
+ * @c: UBIFS file-system description object
+ * @lnum: current log LEB
+ */
+static inline int next_log_lnum(const struct ubifs_info *c, int lnum)
+{
+ lnum += 1;
+ if (lnum > c->log_last)
+ lnum = UBIFS_LOG_LNUM;
+
+ return lnum;
+}
+
+/**
+ * empty_log_bytes - calculate amount of empty space in the log.
+ * @c: UBIFS file-system description object
+ */
+static inline long long empty_log_bytes(const struct ubifs_info *c)
+{
+ long long h, t;
+
+ h = c->lhead_lnum * c->leb_size + c->lhead_offs;
+ t = c->ltail_lnum * c->leb_size;
+
+ if (h >= t)
+ return c->log_bytes - h + t;
+ else
+ return t - h;
+}
+
+/**
+ * ubifs_add_bud - add bud LEB to the tree of buds and its journal head list.
+ * @c: UBIFS file-system description object
+ * @bud: the bud to add
+ */
+void ubifs_add_bud(struct ubifs_info *c, struct ubifs_bud *bud)
+{
+ struct rb_node **p, *parent = NULL;
+ struct ubifs_bud *b;
+ struct ubifs_jhead *jhead;
+
+ spin_lock(&c->buds_lock);
+ p = &c->buds.rb_node;
+ while (*p) {
+ parent = *p;
+ b = rb_entry(parent, struct ubifs_bud, rb);
+ ubifs_assert(bud->lnum != b->lnum);
+ if (bud->lnum < b->lnum)
+ p = &(*p)->rb_left;
+ else
+ p = &(*p)->rb_right;
+ }
+
+ rb_link_node(&bud->rb, parent, p);
+ rb_insert_color(&bud->rb, &c->buds);
+ if (c->jheads) {
+ jhead = &c->jheads[bud->jhead];
+ list_add_tail(&bud->list, &jhead->buds_list);
+ } else
+ ubifs_assert(c->replaying && (c->vfs_sb->s_flags & MS_RDONLY));
+
+ /*
+ * Note, although this is a new bud, we account for its space now,
+ * before any data has been written to it. This is needed to guarantee
+ * a fixed mount time, because this bud will be read and scanned
+ * anyway.
+ */
+ c->bud_bytes += c->leb_size - bud->start;
+
+ dbg_log("LEB %d:%d, jhead %d, bud_bytes %lld", bud->lnum,
+ bud->start, bud->jhead, c->bud_bytes);
+ spin_unlock(&c->buds_lock);
+}
+
+/**
+ * ubifs_create_buds_lists - create journal head buds lists for remount rw.
+ * @c: UBIFS file-system description object
+ */
+void ubifs_create_buds_lists(struct ubifs_info *c)
+{
+ struct rb_node *p;
+
+ spin_lock(&c->buds_lock);
+ p = rb_first(&c->buds);
+ while (p) {
+ struct ubifs_bud *bud = rb_entry(p, struct ubifs_bud, rb);
+ struct ubifs_jhead *jhead = &c->jheads[bud->jhead];
+
+ list_add_tail(&bud->list, &jhead->buds_list);
+ p = rb_next(p);
+ }
+ spin_unlock(&c->buds_lock);
+}
+
+/**
+ * ubifs_add_bud_to_log - add a new bud to the log.
+ * @c: UBIFS file-system description object
+ * @jhead: journal head the bud belongs to
+ * @lnum: LEB number of the bud
+ * @offs: starting offset of the bud
+ *
+ * This function writes a reference node for the new bud LEB @lnum to the log,
+ * and adds the bud to the buds tree. It also makes sure that the log size does
+ * not exceed the 'c->max_bud_bytes' limit. Returns zero in case of success,
+ * %-EAGAIN if commit is required, and a negative error code in case of
+ * failure.
+ */
+int ubifs_add_bud_to_log(struct ubifs_info *c, int jhead, int lnum, int offs)
+{
+ int err;
+ struct ubifs_bud *bud;
+ struct ubifs_ref_node *ref;
+
+ ubifs_assert(lnum > 0);
+ ubifs_assert(offs >= 0 && offs < c->leb_size);
+ ubifs_assert(jhead >= 0 && jhead < c->jhead_cnt);
+
+ bud = kmalloc(sizeof(struct ubifs_bud), GFP_NOFS);
+ if (!bud)
+ return -ENOMEM;
+ ref = kmalloc(c->ref_node_alsz, GFP_NOFS);
+ if (!ref) {
+ kfree(bud);
+ return -ENOMEM;
+ }
+
+ mutex_lock(&c->log_mutex);
+ /* Make sure we have enough space in the log */
+ if (empty_log_bytes(c) - c->ref_node_alsz < c->min_log_bytes) {
+ dbg_log("not enough log space - %lld, required %d",
+ empty_log_bytes(c), c->min_log_bytes);
+ ubifs_commit_required(c);
+ err = -EAGAIN;
+ goto out_unlock;
+ }
+
+ /*
+ * Make sure the amount of space in buds will not exceed the
+ * 'c->max_bud_bytes' limit, because we want to guarantee mount time
+ * limits.
+ */
+ spin_lock(&c->buds_lock);
+ if (c->bud_bytes + c->leb_size - offs > c->max_bud_bytes) {
+ dbg_log("bud bytes %lld (%lld max), require commit",
+ c->bud_bytes, c->max_bud_bytes);
+ spin_unlock(&c->buds_lock);
+ ubifs_commit_required(c);
+ err = -EAGAIN;
+ goto out_unlock;
+ }
+ spin_unlock(&c->buds_lock);
+
+ /*
+ * If the journal is full enough - start background commit. Note, it is
+ * OK to read 'c->cmt_state' without spinlock because integer reads
+ * are atomic in the kernel.
+ */
+ if (c->bud_bytes >= c->bg_bud_bytes &&
+ c->cmt_state == COMMIT_RESTING) {
+ dbg_log("bud bytes %lld (%lld max), initiate BG commit",
+ c->bud_bytes, c->max_bud_bytes);
+ ubifs_request_bg_commit(c);
+ }
+
+ bud->lnum = lnum;
+ bud->start = offs;
+ bud->jhead = jhead;
+
+ ref->ch.node_type = UBIFS_REF_NODE;
+ ref->lnum = cpu_to_le32(bud->lnum);
+ ref->offs = cpu_to_le32(bud->start);
+ ref->jhead = cpu_to_le32(jhead);
+
+ if (c->lhead_offs > c->leb_size - c->ref_node_alsz) {
+ c->lhead_lnum = next_log_lnum(c, c->lhead_lnum);
+ c->lhead_offs = 0;
+ }
+
+ if (c->lhead_offs == 0) {
+ /* Must ensure next log LEB has been unmapped */
+ err = ubifs_leb_unmap(c, c->lhead_lnum);
+ if (err)
+ goto out_unlock;
+ }
+
+ if (bud->start == 0) {
+ /*
+ * Before writing to the log a reference node which refers to an
+ * empty LEB, we have to make sure the LEB is mapped, because
+ * otherwise we would risk referring to an LEB with garbage in
+ * case of an unclean reboot: the target LEB might have been
+ * unmapped, but not yet physically erased.
+ */
+ err = ubi_leb_map(c->ubi, bud->lnum, UBI_SHORTTERM);
+ if (err)
+ goto out_unlock;
+ }
+
+ dbg_log("write ref LEB %d:%d",
+ c->lhead_lnum, c->lhead_offs);
+ err = ubifs_write_node(c, ref, UBIFS_REF_NODE_SZ, c->lhead_lnum,
+ c->lhead_offs, UBI_SHORTTERM);
+ c->lhead_offs += c->ref_node_alsz;
+ if (err)
+ goto out_unlock;
+ mutex_unlock(&c->log_mutex);
+
+ kfree(ref);
+ ubifs_add_bud(c, bud);
+
+ return 0;
+
+out_unlock:
+ mutex_unlock(&c->log_mutex);
+ kfree(ref);
+ kfree(bud);
+ return err;
+}
+
+/**
+ * remove_buds - remove used buds.
+ * @c: UBIFS file-system description object
+ *
+ * This function removes used buds from the buds tree. It does not remove the
+ * buds which are pointed to by journal heads. Returns zero in case of success
+ * and a negative error code in case of failure.
+ */
+static int remove_buds(struct ubifs_info *c)
+{
+ struct rb_node *p;
+ struct ubifs_bud *bud;
+ int err = 0;
+
+ ubifs_assert(list_empty(&c->old_buds));
+ c->cmt_bud_bytes = 0;
+ spin_lock(&c->buds_lock);
+ p = rb_first(&c->buds);
+ while (p) {
+ struct rb_node *p1 = p;
+
+ p = rb_next(p);
+ bud = rb_entry(p1, struct ubifs_bud, rb);
+
+ /*
+ * Do not remove buds which are pointed to by journal heads
+ * (non-closed buds).
+ */
+ if (c->jheads[bud->jhead].wbuf.lnum == bud->lnum) {
+ dbg_log("preserve LEB %d:%d (jhead %d)",
+ bud->lnum, bud->start, bud->jhead);
+ continue;
+ }
+
+ rb_erase(p1, &c->buds);
+ list_del(&bud->list);
+
+ /*
+ * If the commit does not finish, the recovery will need to
+ * replay the journal, in which case the old buds must be
+ * intact. Do not release them until post commit.
+ */
+ list_add(&bud->list, &c->old_buds);
+
+ /*
+ * We've removed this bud, save its size in 'c->cmt_bud_bytes'
+ * - this value will be subtracted from 'c->bud_bytes' when
+ * commit is done.
+ */
+ c->cmt_bud_bytes += c->leb_size - bud->start;
+ dbg_log("LEB %d:%d, jhead %d, cmt_bud_bytes %lld",
+ bud->lnum, bud->start, bud->jhead, c->cmt_bud_bytes);
+ }
+ spin_unlock(&c->buds_lock);
+
+ return err;
+}
+
+/**
+ * ubifs_log_start_commit - start commit.
+ * @c: UBIFS file-system description object
+ * @ltail_lnum: return new log tail LEB number
+ *
+ * The commit operation starts with writing "commit start" node to the log and
+ * reference nodes for all journal heads which will define new journal after
+ * the commit has been finished. The commit start and reference nodes are
+ * written in one go to the nearest empty log LEB (hence, when commit is
+ * finished UBIFS may safely unmap all the previous log LEBs). This function
+ * returns zero in case of success and a negative error code in case of
+ * failure.
+ */
+int ubifs_log_start_commit(struct ubifs_info *c, int *ltail_lnum)
+{
+ void *buf;
+ struct ubifs_cs_node *cs;
+ struct ubifs_ref_node *ref;
+ int err, i, max_len, len;
+
+ err = dbg_check_bud_bytes(c);
+ if (err)
+ return err;
+
+ max_len = UBIFS_CS_NODE_SZ + c->jhead_cnt * UBIFS_REF_NODE_SZ;
+ max_len = ALIGN(max_len, c->min_io_size);
+ buf = cs = kmalloc(max_len, GFP_NOFS);
+ if (!buf)
+ return -ENOMEM;
+
+ cs->ch.node_type = UBIFS_CS_NODE;
+ cs->cmt_no = cpu_to_le64(c->cmt_no + 1);
+ ubifs_prepare_node(c, cs, UBIFS_CS_NODE_SZ, 0);
+
+ /*
+ * Note, we do not lock 'c->log_mutex' because this is the commit start
+ * phase and we are exclusively using the log. And we do not lock
+ * write-buffer because nobody can write to the file-system at this
+ * phase.
+ */
+
+ len = UBIFS_CS_NODE_SZ;
+ for (i = 0; i < c->jhead_cnt; i++) {
+ int lnum = c->jheads[i].wbuf.lnum;
+ int offs = c->jheads[i].wbuf.offs;
+
+ ubifs_assert(offs <= c->leb_size);
+ if (lnum == -1 || offs == c->leb_size)
+ continue;
+
+ dbg_log("add ref to LEB %d:%d for jhead %d", lnum, offs, i);
+ ref = buf + len;
+ ref->ch.node_type = UBIFS_REF_NODE;
+ ref->lnum = cpu_to_le32(lnum);
+ ref->offs = cpu_to_le32(offs);
+ ref->jhead = cpu_to_le32(i);
+
+ ubifs_prepare_node(c, ref, UBIFS_REF_NODE_SZ, 0);
+ len += UBIFS_REF_NODE_SZ;
+ }
+
+ ubifs_assert(len <= c->leb_size);
+ ubifs_pad(c, buf + len, ALIGN(len, c->min_io_size) - len);
+
+ /* Switch to the next log LEB */
+ if (c->lhead_offs) {
+ c->lhead_lnum = next_log_lnum(c, c->lhead_lnum);
+ c->lhead_offs = 0;
+ }
+
+ if (c->lhead_offs == 0) {
+ /* Must ensure next LEB has been unmapped */
+ err = ubifs_leb_unmap(c, c->lhead_lnum);
+ if (err)
+ goto out;
+ }
+
+ len = ALIGN(len, c->min_io_size);
+ dbg_log("writing commit start at LEB %d:0, len %d", c->lhead_lnum, len);
+ err = ubifs_leb_write(c, c->lhead_lnum, cs, 0, len, UBI_SHORTTERM);
+ if (err)
+ goto out;
+
+ *ltail_lnum = c->lhead_lnum;
+
+ c->lhead_offs += len;
+ if (c->lhead_offs == c->leb_size) {
+ c->lhead_lnum = next_log_lnum(c, c->lhead_lnum);
+ c->lhead_offs = 0;
+ }
+
+ err = remove_buds(c);
+
+ /*
+ * We have started the commit and now users may use the rest of the log
+ * for new writes.
+ */
+ c->min_log_bytes = 0;
+
+out:
+ kfree(buf);
+ return err;
+}
+
+/**
+ * ubifs_log_end_commit - end commit.
+ * @c: UBIFS file-system description object
+ * @ltail_lnum: new log tail LEB number
+ *
+ * This function is called when the commit operation has finished. It moves
+ * the log tail to the new position and unmaps LEBs which contain obsolete
+ * data. Returns zero in case of success and a negative error code in case of
+ * failure.
+ */
+int ubifs_log_end_commit(struct ubifs_info *c, int ltail_lnum)
+{
+ int err;
+
+ /*
+ * At this phase we have to lock 'c->log_mutex' because UBIFS allows FS
+ * writes during commit. It is only during the short "commit start"
+ * phase that writers are blocked.
+ */
+ mutex_lock(&c->log_mutex);
+
+ dbg_log("old tail was LEB %d:0, new tail is LEB %d:0",
+ c->ltail_lnum, ltail_lnum);
+
+ c->ltail_lnum = ltail_lnum;
+ /*
+ * The commit is finished and from now on it must be guaranteed that
+ * there is always enough space for the next commit.
+ */
+ c->min_log_bytes = c->leb_size;
+
+ spin_lock(&c->buds_lock);
+ c->bud_bytes -= c->cmt_bud_bytes;
+ spin_unlock(&c->buds_lock);
+
+ err = dbg_check_bud_bytes(c);
+
+ mutex_unlock(&c->log_mutex);
+ return err;
+}
+
+/**
+ * ubifs_log_post_commit - things to do after commit is completed.
+ * @c: UBIFS file-system description object
+ * @old_ltail_lnum: old log tail LEB number
+ *
+ * Release buds only after commit is completed, because they must be unchanged
+ * if recovery is needed.
+ *
+ * Unmap log LEBs only after commit is completed, because they may be needed for
+ * recovery.
+ *
+ * This function returns %0 on success and a negative error code on failure.
+ */
+int ubifs_log_post_commit(struct ubifs_info *c, int old_ltail_lnum)
+{
+ int lnum, err = 0;
+
+ while (!list_empty(&c->old_buds)) {
+ struct ubifs_bud *bud;
+
+ bud = list_entry(c->old_buds.next, struct ubifs_bud, list);
+ err = ubifs_return_leb(c, bud->lnum);
+ if (err)
+ return err;
+ list_del(&bud->list);
+ kfree(bud);
+ }
+ mutex_lock(&c->log_mutex);
+ for (lnum = old_ltail_lnum; lnum != c->ltail_lnum;
+ lnum = next_log_lnum(c, lnum)) {
+ dbg_log("unmap log LEB %d", lnum);
+ err = ubifs_leb_unmap(c, lnum);
+ if (err)
+ goto out;
+ }
+out:
+ mutex_unlock(&c->log_mutex);
+ return err;
+}
+
+/**
+ * struct done_ref - references that have been done.
+ * @rb: rb-tree node
+ * @lnum: LEB number
+ */
+struct done_ref {
+ struct rb_node rb;
+ int lnum;
+};
+
+/**
+ * done_already - determine if a reference has been done already.
+ * @done_tree: rb-tree to store references that have been done
+ * @lnum: LEB number of reference
+ *
+ * This function returns %1 if the reference has been done, %0 if not, otherwise
+ * a negative error code is returned.
+ */
+static int done_already(struct rb_root *done_tree, int lnum)
+{
+ struct rb_node **p = &done_tree->rb_node, *parent = NULL;
+ struct done_ref *dr;
+
+ while (*p) {
+ parent = *p;
+ dr = rb_entry(parent, struct done_ref, rb);
+ if (lnum < dr->lnum)
+ p = &(*p)->rb_left;
+ else if (lnum > dr->lnum)
+ p = &(*p)->rb_right;
+ else
+ return 1;
+ }
+
+ dr = kzalloc(sizeof(struct done_ref), GFP_NOFS);
+ if (!dr)
+ return -ENOMEM;
+
+ dr->lnum = lnum;
+
+ rb_link_node(&dr->rb, parent, p);
+ rb_insert_color(&dr->rb, done_tree);
+
+ return 0;
+}
+
+/**
+ * destroy_done_tree - destroy the done tree.
+ * @done_tree: done tree to destroy
+ */
+static void destroy_done_tree(struct rb_root *done_tree)
+{
+ struct rb_node *this = done_tree->rb_node;
+ struct done_ref *dr;
+
+ while (this) {
+ if (this->rb_left) {
+ this = this->rb_left;
+ continue;
+ } else if (this->rb_right) {
+ this = this->rb_right;
+ continue;
+ }
+ dr = rb_entry(this, struct done_ref, rb);
+ this = rb_parent(this);
+ if (this) {
+ if (this->rb_left == &dr->rb)
+ this->rb_left = NULL;
+ else
+ this->rb_right = NULL;
+ }
+ kfree(dr);
+ }
+}
+
+/**
+ * add_node - add a node to the consolidated log.
+ * @c: UBIFS file-system description object
+ * @buf: buffer to which to add
+ * @lnum: LEB number to which to write is passed and returned here
+ * @offs: offset to where to write is passed and returned here
+ * @node: node to add
+ *
+ * This function returns %0 on success and a negative error code on failure.
+ */
+static int add_node(struct ubifs_info *c, void *buf, int *lnum, int *offs,
+ void *node)
+{
+ struct ubifs_ch *ch = node;
+ int len = le32_to_cpu(ch->len), remains = c->leb_size - *offs;
+
+ if (len > remains) {
+ int sz = ALIGN(*offs, c->min_io_size), err;
+
+ ubifs_pad(c, buf + *offs, sz - *offs);
+ err = ubi_leb_change(c->ubi, *lnum, buf, sz, UBI_SHORTTERM);
+ if (err)
+ return err;
+ *lnum = next_log_lnum(c, *lnum);
+ *offs = 0;
+ }
+ memcpy(buf + *offs, node, len);
+ *offs += ALIGN(len, 8);
+ return 0;
+}
+
+/**
+ * ubifs_consolidate_log - consolidate the log.
+ * @c: UBIFS file-system description object
+ *
+ * Repeated failed commits could cause the log to be full, but at least 1 LEB is
+ * needed for commit. This function rewrites the reference nodes in the log
+ * omitting duplicates, and failed CS nodes, and leaving no gaps.
+ *
+ * This function returns %0 on success and a negative error code on failure.
+ */
+int ubifs_consolidate_log(struct ubifs_info *c)
+{
+ struct ubifs_scan_leb *sleb;
+ struct ubifs_scan_node *snod;
+ struct rb_root done_tree = RB_ROOT;
+ int lnum, err, first = 1, write_lnum, offs = 0;
+ void *buf;
+
+ dbg_mnt("log tail LEB %d, log head LEB %d", c->ltail_lnum,
+ c->lhead_lnum);
+ buf = vmalloc(c->leb_size);
+ if (!buf)
+ return -ENOMEM;
+ lnum = c->ltail_lnum;
+ write_lnum = lnum;
+ while (1) {
+ sleb = ubifs_scan(c, lnum, 0, c->sbuf);
+ if (IS_ERR(sleb)) {
+ err = PTR_ERR(sleb);
+ goto out_free;
+ }
+ list_for_each_entry(snod, &sleb->nodes, list) {
+ switch (snod->type) {
+ case UBIFS_REF_NODE: {
+ struct ubifs_ref_node *ref = snod->node;
+ int ref_lnum = le32_to_cpu(ref->lnum);
+
+ err = done_already(&done_tree, ref_lnum);
+ if (err < 0)
+ goto out_scan;
+ if (err != 1) {
+ err = add_node(c, buf, &write_lnum,
+ &offs, snod->node);
+ if (err)
+ goto out_scan;
+ }
+ break;
+ }
+ case UBIFS_CS_NODE:
+ if (!first)
+ break;
+ err = add_node(c, buf, &write_lnum, &offs,
+ snod->node);
+ if (err)
+ goto out_scan;
+ first = 0;
+ break;
+ }
+ }
+ ubifs_scan_destroy(sleb);
+ if (lnum == c->lhead_lnum)
+ break;
+ lnum = next_log_lnum(c, lnum);
+ }
+ if (offs) {
+ int sz = ALIGN(offs, c->min_io_size);
+
+ ubifs_pad(c, buf + offs, sz - offs);
+ err = ubi_leb_change(c->ubi, write_lnum, buf, sz,
+ UBI_SHORTTERM);
+ if (err)
+ goto out_free;
+ offs = ALIGN(offs, c->min_io_size);
+ }
+ destroy_done_tree(&done_tree);
+ vfree(buf);
+ if (write_lnum == c->lhead_lnum) {
+ ubifs_err("log is too full");
+ return -EINVAL;
+ }
+ /* Unmap remaining LEBs */
+ lnum = write_lnum;
+ do {
+ lnum = next_log_lnum(c, lnum);
+ err = ubifs_leb_unmap(c, lnum);
+ if (err)
+ return err;
+ } while (lnum != c->lhead_lnum);
+ c->lhead_lnum = write_lnum;
+ c->lhead_offs = offs;
+ dbg_mnt("new log head at %d:%d", c->lhead_lnum, c->lhead_offs);
+ return 0;
+
+out_scan:
+ ubifs_scan_destroy(sleb);
+out_free:
+ destroy_done_tree(&done_tree);
+ vfree(buf);
+ return err;
+}
+
+#ifdef CONFIG_UBIFS_FS_DEBUG_CHK_OTHER
+
+/**
+ * dbg_check_bud_bytes - make sure the bud bytes calculation is all right.
+ * @c: UBIFS file-system description object
+ *
+ * This function makes sure the amount of flash space used by closed buds
+ * ('c->bud_bytes') is correct. Returns zero in case of success and %-EINVAL in
+ * case of failure.
+ */
+static int dbg_check_bud_bytes(struct ubifs_info *c)
+{
+ int i, err = 0;
+ struct ubifs_bud *bud;
+ long long bud_bytes = 0;
+
+ spin_lock(&c->buds_lock);
+ for (i = 0; i < c->jhead_cnt; i++)
+ list_for_each_entry(bud, &c->jheads[i].buds_list, list)
+ bud_bytes += c->leb_size - bud->start;
+
+ if (c->bud_bytes != bud_bytes) {
+ ubifs_err("bad bud_bytes %lld, calculated %lld",
+ c->bud_bytes, bud_bytes);
+ err = -EINVAL;
+ }
+ spin_unlock(&c->buds_lock);
+
+ return err;
+}
+
+#endif /* CONFIG_UBIFS_FS_DEBUG_CHK_OTHER */
--
1.5.4.1
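
For readers following fs/ubifs/log.c in the patch above: the log is treated as a ring of LEBs, with a head that advances as reference nodes are written and a tail that moves forward at commit. The following is only a minimal user-space sketch of that arithmetic, not the kernel code. 'struct log_geom' is a hypothetical stand-in for the handful of 'struct ubifs_info' fields involved, and the total log size is derived from 'log_first'/'log_last' here, whereas the patch reads it from 'c->log_bytes'.

```c
#include <assert.h>

/* Hypothetical stand-in for the 'struct ubifs_info' fields used by the log. */
struct log_geom {
	int log_first;  /* first log LEB (UBIFS_LOG_LNUM in the patch) */
	int log_last;   /* last log LEB */
	int leb_size;   /* LEB size in bytes */
	int lhead_lnum; /* LEB number of the log head */
	int lhead_offs; /* offset of the log head within its LEB */
	int ltail_lnum; /* LEB number of the log tail (offset is always 0) */
};

/* Step to the next log LEB, wrapping from the last LEB back to the first. */
static int next_log_lnum(const struct log_geom *g, int lnum)
{
	assert(lnum >= g->log_first && lnum <= g->log_last);
	return lnum == g->log_last ? g->log_first : lnum + 1;
}

/* Number of free bytes between the head and the tail of the ring. */
static long long empty_log_bytes(const struct log_geom *g)
{
	/* Assumption: the log spans log_first..log_last inclusive. */
	long long log_bytes =
		(long long)(g->log_last - g->log_first + 1) * g->leb_size;
	long long h = (long long)g->lhead_lnum * g->leb_size + g->lhead_offs;
	long long t = (long long)g->ltail_lnum * g->leb_size;

	if (h >= t)
		/* Head is at or after the tail: free space wraps around. */
		return log_bytes - h + t;
	/* Head has wrapped and is behind the tail. */
	return t - h;
}
```

With a 4-LEB log of 1000-byte LEBs, a tail at LEB 1 and a head at LEB 2 offset 100, 1100 bytes are in use and the remaining 2900 are free. Once the head has wrapped around behind the tail, the free space is simply the distance from the head up to the tail.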

2008-03-27 13:09:37

by Artem Bityutskiy

Subject: [RFC PATCH 12/26] UBIFS: add TNC implementation

TNC (tree node cache) is the central UBIFS entity. It is basically an
in-RAM cache of the on-flash indexing B-tree. But TNC also indexes
the journal, so the two are not always equivalent.
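
To make the notion of an index node a bit more concrete, here is a minimal user-space sketch of a lookup inside a single node whose branches are sorted by key. It follows the return convention documented for search_zbranch() in this patch (1 for an exact match, 0 otherwise with the closest slot on the left). Plain integers stand in for UBIFS keys and 'search_slots' is a hypothetical name; the real code compares 'union ubifs_key' values with keys_cmp().

```c
#include <assert.h>

/*
 * Binary-search a sorted array of keys in the style of search_zbranch().
 * Returns 1 on an exact match and stores the slot in *n. Returns 0 if there
 * is no exact match and stores the closest slot on the left in *n, or -1 if
 * the key is smaller than every key in the node.
 */
static int search_slots(const int *keys, int cnt, int key, int *n)
{
	int beg = 0, end = cnt;

	assert(cnt > 0);
	while (end > beg) {
		int mid = beg + (end - beg) / 2;

		if (key > keys[mid])
			beg = mid + 1;
		else if (key < keys[mid])
			end = mid;
		else {
			*n = mid;
			return 1;
		}
	}

	/* No exact match: a new key would be inserted right after slot *n. */
	*n = end - 1;
	return 0;
}
```

Returning the left neighbour rather than just "not found" lets a caller use the same helper both to descend the tree (follow branch *n) and to find the insertion point for a new key.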

Signed-off-by: Artem Bityutskiy <[email protected]>
Signed-off-by: Adrian Hunter <[email protected]>
---
fs/ubifs/tnc.c | 3483 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 3483 insertions(+), 0 deletions(-)

diff --git a/fs/ubifs/tnc.c b/fs/ubifs/tnc.c
new file mode 100644
index 0000000..27e2b60
--- /dev/null
+++ b/fs/ubifs/tnc.c
@@ -0,0 +1,3483 @@
+/*
+ * This file is part of UBIFS.
+ *
+ * Copyright (C) 2006-2008 Nokia Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 51
+ * Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Authors: Adrian Hunter
+ * Artem Bityutskiy (Битюцкий Артём)
+ */
+
+/*
+ * This file implements TNC (Tree Node Cache) which caches indexing nodes of
+ * the UBIFS B-tree.
+ *
+ * At the moment the locking rules of the TNC tree are quite simple and
+ * straightforward. We just have a mutex and lock it when we traverse the
+ * tree. If a znode is not in memory, we read it from flash while still having
+ * the mutex locked.
+ */
+
+#include <linux/crc32.h>
+#include "ubifs.h"
+
+/**
+ * insert_old_idx - record an index node obsoleted since the last commit start.
+ * @c: UBIFS file-system description object
+ * @lnum: LEB number of obsoleted index node
+ * @offs: offset of obsoleted index node
+ *
+ * Returns %0 on success, and a negative error code on failure.
+ */
+static int insert_old_idx(struct ubifs_info *c, int lnum, int offs)
+{
+ struct ubifs_old_idx *old_idx, *o;
+ struct rb_node **p, *parent = NULL;
+
+ ubifs_assert(lnum >= c->main_first && lnum < c->leb_cnt);
+ ubifs_assert(offs >= 0 && offs < c->leb_size);
+
+ old_idx = kmalloc(sizeof(struct ubifs_old_idx), GFP_NOFS);
+ if (!old_idx)
+ return -ENOMEM;
+ old_idx->lnum = lnum;
+ old_idx->offs = offs;
+
+ p = &c->old_idx.rb_node;
+ while (*p) {
+ parent = *p;
+ o = rb_entry(parent, struct ubifs_old_idx, rb);
+ if (lnum < o->lnum)
+ p = &(*p)->rb_left;
+ else if (lnum > o->lnum)
+ p = &(*p)->rb_right;
+ else if (offs < o->offs)
+ p = &(*p)->rb_left;
+ else if (offs > o->offs)
+ p = &(*p)->rb_right;
+ else {
+ ubifs_err("old idx added twice!");
+ kfree(old_idx);
+ return 0;
+ }
+ }
+ rb_link_node(&old_idx->rb, parent, p);
+ rb_insert_color(&old_idx->rb, &c->old_idx);
+ return 0;
+}
+
+/**
+ * insert_old_idx_znode - record a znode obsoleted since last commit start.
+ * @c: UBIFS file-system description object
+ * @znode: znode of obsoleted index node
+ *
+ * Returns %0 on success, and a negative error code on failure.
+ */
+int insert_old_idx_znode(struct ubifs_info *c, struct ubifs_znode *znode)
+{
+ if (znode->parent) {
+ struct ubifs_zbranch *zbr;
+
+ zbr = &znode->parent->zbranch[znode->iip];
+ if (zbr->len)
+ return insert_old_idx(c, zbr->lnum, zbr->offs);
+ } else
+ if (c->zroot.len)
+ return insert_old_idx(c, c->zroot.lnum,
+ c->zroot.offs);
+ return 0;
+}
+
+/**
+ * ins_clr_old_idx_znode - record a znode obsoleted since last commit start.
+ * @c: UBIFS file-system description object
+ * @znode: znode of obsoleted index node
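+ *
+ * Unlike 'insert_old_idx_znode()', this function also zeroes the on-flash
+ * position (LEB number, offset and length) recorded for the znode. Returns %0
+ * on success, and a negative error code on failure.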
+ */
+static int ins_clr_old_idx_znode(struct ubifs_info *c,
+ struct ubifs_znode *znode)
+{
+ int err;
+
+ if (znode->parent) {
+ struct ubifs_zbranch *zbr;
+
+ zbr = &znode->parent->zbranch[znode->iip];
+ if (zbr->len) {
+ err = insert_old_idx(c, zbr->lnum, zbr->offs);
+ if (err)
+ return err;
+ zbr->lnum = 0;
+ zbr->offs = 0;
+ zbr->len = 0;
+ }
+ } else
+ if (c->zroot.len) {
+ err = insert_old_idx(c, c->zroot.lnum, c->zroot.offs);
+ if (err)
+ return err;
+ c->zroot.lnum = 0;
+ c->zroot.offs = 0;
+ c->zroot.len = 0;
+ }
+ return 0;
+}
+
+/**
+ * destroy_old_idx - destroy the old_idx RB-tree.
+ * @c: UBIFS file-system description object
+ *
+ * During start commit, the old_idx RB-tree is used to avoid overwriting index
+ * nodes that were in the index last commit but have since been deleted. This
+ * is necessary for recovery i.e. the old index must be kept intact until the
+ * new index is successfully written. The old_idx RB-tree is used for the
+ * in-the-gaps method of writing index nodes and is destroyed every commit.
+ */
+void destroy_old_idx(struct ubifs_info *c)
+{
+ struct rb_node *this = c->old_idx.rb_node;
+ struct ubifs_old_idx *old_idx;
+
+ while (this) {
+ if (this->rb_left) {
+ this = this->rb_left;
+ continue;
+ } else if (this->rb_right) {
+ this = this->rb_right;
+ continue;
+ }
+ old_idx = rb_entry(this, struct ubifs_old_idx, rb);
+ this = rb_parent(this);
+ if (this) {
+ if (this->rb_left == &old_idx->rb)
+ this->rb_left = NULL;
+ else
+ this->rb_right = NULL;
+ }
+ kfree(old_idx);
+ }
+ c->old_idx = RB_ROOT;
+}
+
+/**
+ * search_zbranch - search znode branch.
+ * @c: UBIFS file-system description object
+ * @znode: znode to search in
+ * @key: key to search for
+ * @n: znode branch slot number is returned here
+ *
+ * This is a helper function which searches for a branch with key @key in
+ * @znode using binary search. The result of the search may be:
+ * o exact match, then %1 is returned, and the slot number of the branch is
+ * stored in @n;
+ * o no exact match, then %0 is returned and the slot number of the left
+ * closest branch is returned in @n.
+ */
+static int search_zbranch(const struct ubifs_info *c,
+ const struct ubifs_znode *znode,
+ const union ubifs_key *key, int *n)
+{
+ int beg = 0, end = znode->child_cnt, uninitialized_var(mid);
+ int uninitialized_var(cmp);
+ const struct ubifs_zbranch *zbr = &znode->zbranch[0];
+
+ ubifs_assert(end > beg);
+
+ while (end > beg) {
+ mid = (beg + end) >> 1;
+ cmp = keys_cmp(c, key, &zbr[mid].key);
+ if (cmp > 0)
+ beg = mid + 1;
+ else if (cmp < 0)
+ end = mid;
+ else {
+ *n = mid;
+ return 1;
+ }
+ }
+
+ *n = end - 1;
+
+ /* The insert point is after *n */
+ ubifs_assert(*n >= -1 && *n < znode->child_cnt);
+ if (*n == -1)
+ ubifs_assert(keys_cmp(c, key, &zbr[0].key) < 0);
+ else
+ ubifs_assert(keys_cmp(c, key, &zbr[*n].key) > 0);
+ if (*n + 1 < znode->child_cnt)
+ ubifs_assert(keys_cmp(c, key, &zbr[*n + 1].key) < 0);
+
+ return 0;
+}
+
+/**
+ * read_znode - read an indexing node from flash and fill znode.
+ * @c: UBIFS file-system description object
+ * @lnum: LEB of the indexing node to read
+ * @offs: node offset
+ * @len: node length
+ * @znode: znode to read to
+ *
+ * This function reads an indexing node from the flash media and fills znode
+ * with the read data. Returns zero in case of success and a negative error
+ * code in case of failure. The read indexing node is validated and if anything
+ * is wrong with it, this function prints complaint messages and returns
+ * %-EINVAL.
+ */
+static int read_znode(struct ubifs_info *c, int lnum, int offs, int len,
+ struct ubifs_znode *znode)
+{
+ int i, err, type, cmp;
+ struct ubifs_idx_node *idx;
+
+ idx = kmalloc(c->max_idx_node_sz, GFP_KERNEL);
+ if (!idx)
+ return -ENOMEM;
+
+ err = ubifs_read_node(c, idx, UBIFS_IDX_NODE, len, lnum, offs);
+ if (err < 0)
+ goto out;
+
+ znode->child_cnt = le16_to_cpu(idx->child_cnt);
+ znode->level = le16_to_cpu(idx->level);
+
+ dbg_tnc("LEB %d:%d, level %d, %d branch",
+ lnum, offs, znode->level, znode->child_cnt);
+
+ if (znode->child_cnt > c->fanout || znode->level > UBIFS_MAX_LEVELS) {
+ dbg_err("current fanout %d, branch count %d",
+ c->fanout, znode->child_cnt);
+ dbg_err("max levels %d, znode level %d",
+ UBIFS_MAX_LEVELS, znode->level);
+ goto out_dump;
+ }
+
+ for (i = 0; i < znode->child_cnt; i++) {
+ const struct ubifs_branch *br = ubifs_idx_branch(c, idx, i);
+ struct ubifs_zbranch *zbr = &znode->zbranch[i];
+
+ key_read(c, &br->key, &zbr->key);
+ zbr->lnum = le32_to_cpu(br->lnum);
+ zbr->offs = le32_to_cpu(br->offs);
+ zbr->len = le32_to_cpu(br->len);
+ zbr->znode = NULL;
+
+ /* Validate branch */
+
+ if (unlikely(zbr->lnum < c->main_first ||
+ zbr->lnum >= c->leb_cnt || zbr->offs < 0 ||
+ zbr->offs + zbr->len > c->leb_size ||
+ zbr->offs & 7)) {
+ dbg_err("bad branch %d", i);
+ goto out_dump;
+ }
+
+ switch (key_type(c, &zbr->key)) {
+ case UBIFS_INO_KEY:
+ case UBIFS_DATA_KEY:
+ case UBIFS_DENT_KEY:
+ case UBIFS_XENT_KEY:
+ break;
+ default:
+ dbg_key(c, &zbr->key, "bad key type at slot %d: ", i);
+ goto out_dump;
+ }
+
+ if (znode->level)
+ continue;
+
+ type = key_type(c, &zbr->key);
+ if (c->ranges[type].max_len == 0) {
+ if (unlikely(zbr->len != c->ranges[type].len)) {
+ dbg_err("bad target node (type %d) length (%d)",
+ type, zbr->len);
+ dbg_err("have to be %d", c->ranges[type].len);
+ goto out_dump;
+ }
+ } else if (unlikely(zbr->len < c->ranges[type].min_len ||
+ zbr->len > c->ranges[type].max_len)) {
+ dbg_err("bad target node (type %d) length (%d)",
+ type, zbr->len);
+ dbg_err("have to be in range of %d-%d",
+ c->ranges[type].min_len,
+ c->ranges[type].max_len);
+ goto out_dump;
+ }
+ }
+
+ /*
+ * Ensure that the next key is greater than or equivalent to the
+ * previous one.
+ */
+ for (i = 0; i < znode->child_cnt - 1; i++) {
+ const union ubifs_key *key1, *key2;
+
+ key1 = &znode->zbranch[i].key;
+ key2 = &znode->zbranch[i + 1].key;
+
+ cmp = keys_cmp(c, key1, key2);
+ if (cmp > 0) {
+ dbg_err("bad key order (keys %d and %d)", i, i + 1);
+ goto out_dump;
+ } else if (cmp == 0 && !is_hash_key(c, key1)) {
+ /* These can only be keys with colliding hash */
+ dbg_err("keys %d and %d are not hashed but equivalent",
+ i, i + 1);
+ goto out_dump;
+ }
+ }
+
+ kfree(idx);
+ return 0;
+
+out:
+ kfree(idx);
+ return err;
+
+out_dump:
+ ubifs_err("bad indexing node at LEB %d:%d", lnum, offs);
+ dbg_dump_node(c, idx);
+ kfree(idx);
+ return -EINVAL;
+}
+
+/**
+ * load_znode - load znode to TNC cache.
+ * @c: UBIFS file-system description object
+ * @zbr: znode branch
+ * @parent: znode's parent
+ * @iip: index in parent
+ *
+ * This function loads the znode pointed to by @zbr into the TNC cache and
+ * returns a pointer to it in case of success and a negative error code in
+ * case of failure.
+ */
+static struct ubifs_znode *load_znode(struct ubifs_info *c,
+ struct ubifs_zbranch *zbr,
+ struct ubifs_znode *parent, int iip)
+{
+ int err;
+ struct ubifs_znode *znode;
+
+ ubifs_assert(!zbr->znode);
+ /*
+ * A slab cache is not presently used for znodes because the znode size
+ * depends on the fanout which is stored in the superblock.
+ */
+ znode = kzalloc(c->max_znode_sz, GFP_NOFS);
+ if (!znode)
+ return ERR_PTR(-ENOMEM);
+
+ err = read_znode(c, zbr->lnum, zbr->offs, zbr->len, znode);
+ if (err)
+ goto out;
+
+ atomic_long_inc(&c->clean_zn_cnt);
+
+ /*
+ * Increment the global clean znode counter as well. It is OK that
+ * global and per-FS clean znode counters may be inconsistent for some
+ * short time (because we might be preempted at this point), the global
+ * one is only used in shrinker.
+ */
+ atomic_long_inc(&ubifs_clean_zn_cnt);
+
+ zbr->znode = znode;
+ znode->parent = parent;
+ znode->time = get_seconds();
+ znode->iip = iip;
+
+ return znode;
+
+out:
+ kfree(znode);
+ return ERR_PTR(err);
+}
+
+/**
+ * copy_znode - copy a dirty znode.
+ * @c: UBIFS file-system description object
+ * @znode: znode to copy
+ *
+ * A dirty znode being committed may not be changed, so it is copied.
+ */
+static struct ubifs_znode *copy_znode(struct ubifs_info *c,
+ struct ubifs_znode *znode)
+{
+ struct ubifs_znode *zn;
+
+ zn = kzalloc(c->max_znode_sz, GFP_NOFS);
+ if (!zn)
+ return ERR_PTR(-ENOMEM);
+
+ memcpy(zn, znode, c->max_znode_sz);
+
+ ubifs_assert(!test_bit(OBSOLETE_ZNODE, &znode->flags));
+ set_bit(OBSOLETE_ZNODE, &znode->flags);
+
+ if (znode->level != 0) {
+ int i;
+ const int n = zn->child_cnt;
+
+ /* The children now have new parent */
+ for (i = 0; i < n; i++) {
+ struct ubifs_zbranch *zbr = &zn->zbranch[i];
+
+ if (zbr->znode)
+ zbr->znode->parent = zn;
+ }
+ }
+
+ zn->cnext = NULL;
+ set_bit(DIRTY_ZNODE, &zn->flags);
+ clear_bit(COW_ZNODE, &zn->flags);
+ atomic_long_inc(&c->dirty_zn_cnt);
+
+ return zn;
+}
+
+/**
+ * add_idx_dirt - add dirt due to a dirty znode.
+ * @c: UBIFS file-system description object
+ * @lnum: LEB number of index node
+ * @dirt: size of index node
+ *
+ * This function updates lprops dirty space and the new size of the index.
+ */
+static int add_idx_dirt(struct ubifs_info *c, int lnum, int dirt)
+{
+ c->calc_idx_sz -= ALIGN(dirt, 8);
+ return ubifs_add_dirt(c, lnum, dirt);
+}
+
+/**
+ * dirty_cow_znode - ensure a znode is not being committed.
+ * @c: UBIFS file-system description object
+ * @zbr: branch of znode to check
+ *
+ * Returns dirtied znode on success or negative error code on failure.
+ */
+static struct ubifs_znode *dirty_cow_znode(struct ubifs_info *c,
+ struct ubifs_zbranch *zbr)
+{
+ struct ubifs_znode *znode = zbr->znode;
+ struct ubifs_znode *zn;
+ int err;
+
+ if (!test_bit(COW_ZNODE, &znode->flags)) {
+ /* znode is not being committed */
+ if (!test_and_set_bit(DIRTY_ZNODE, &znode->flags)) {
+ atomic_long_inc(&c->dirty_zn_cnt);
+ atomic_long_dec(&c->clean_zn_cnt);
+ atomic_long_dec(&ubifs_clean_zn_cnt);
+ err = add_idx_dirt(c, zbr->lnum, zbr->len);
+ if (err)
+ return ERR_PTR(err);
+ }
+ return znode;
+ }
+
+ zn = copy_znode(c, znode);
+ if (IS_ERR(zn))
+ return zn;
+
+ if (zbr->len) {
+ err = insert_old_idx(c, zbr->lnum, zbr->offs);
+ if (err)
+ return ERR_PTR(err);
+
+ err = add_idx_dirt(c, zbr->lnum, zbr->len);
+ } else
+ err = 0;
+
+ zbr->znode = zn;
+ zbr->lnum = 0;
+ zbr->offs = 0;
+ zbr->len = 0;
+
+ if (err)
+ return ERR_PTR(err);
+
+ return zn;
+}
+
+/**
+ * lookup_level0 - search for zero-level znode
+ * @c: UBIFS file-system description object
+ * @key: key to lookup
+ * @zn: znode is returned here
+ * @n: znode branch slot number is returned here
+ *
+ * This function looks up the TNC tree and searches for the zero-level znode
+ * which refers to key @key. The found zero-level znode is returned in @zn.
+ * There are 3 cases:
+ * o exact match, i.e. the found zero-level znode contains key @key, then %1
+ * is returned and slot number of the matched branch is stored in @n;
+ * o not exact match, which means that zero-level znode does not contain @key,
+ * then %0 is returned and slot number of the closest branch is stored in
+ * @n;
+ * o @key is so small that it is even less than the lowest key of the
+ * leftmost zero-level node, then %0 is returned and %0 is stored in @n.
+ *
+ * Note, when the TNC tree is traversed, some znodes may be absent, then this
+ * function reads corresponding indexing nodes and inserts them to TNC. In
+ * case of failure, a negative error code is returned.
+ */
+static int lookup_level0(struct ubifs_info *c, const union ubifs_key *key,
+ struct ubifs_znode **zn, int *n)
+{
+ int exact;
+ struct ubifs_znode *znode;
+ unsigned long time = get_seconds();
+
+ dbg_tnc_key(c, key, "search key");
+
+ znode = c->zroot.znode;
+ if (unlikely(!znode)) {
+ znode = load_znode(c, &c->zroot, NULL, 0);
+ if (IS_ERR(znode))
+ return PTR_ERR(znode);
+ }
+
+ znode->time = time;
+
+ while (1) {
+ struct ubifs_zbranch *zbr;
+
+ /*
+ * The below is a debugging hack to make UBIFS eat RAM and
+ * cause fake memory pressure. It is compiled out if it is not
+ * enabled in kernel configuration.
+ */
+ dbg_eat_memory();
+
+ exact = search_zbranch(c, znode, key, n);
+
+ if (znode->level == 0)
+ break;
+
+ if (*n < 0)
+ *n = 0;
+ zbr = &znode->zbranch[*n];
+
+ dbg_tnc_key(c, &zbr->key, "at lvl %d, next zbr %d, key",
+ znode->level, *n);
+
+ if (zbr->znode) {
+ znode->time = time;
+ znode = zbr->znode;
+ continue;
+ }
+
+ /* znode is not in TNC cache, load it from the media */
+ znode = load_znode(c, zbr, znode, *n);
+ if (IS_ERR(znode))
+ return PTR_ERR(znode);
+ }
+
+ *zn = znode;
+ ubifs_assert(exact >= 0 && exact < c->fanout);
+ return exact;
+}
+
+/**
+ * lookup_level0_dirty - search for zero-level znode dirtying
+ * @c: UBIFS file-system description object
+ * @key: key to lookup
+ * @zn: znode is returned here
+ * @n: znode branch slot number is returned here
+ *
+ * This function looks up the TNC tree and searches for the zero-level znode
+ * which refers to key @key. The found zero-level znode is returned in @zn.
+ * There are 3 cases:
+ * o exact match, i.e. the found zero-level znode contains key @key, then %1
+ * is returned and slot number of the matched branch is stored in @n;
+ * o not exact match, which means that zero-level znode does not contain @key,
+ * then %0 is returned and slot number of the closest branch is stored in
+ * @n;
+ * o @key is so small that it is even less than the lowest key of the
+ * leftmost zero-level node, then %0 is returned and %0 is stored in @n.
+ *
+ * Additionally all znodes in the path from the root to the located zero-level
+ * znode are marked as dirty.
+ *
+ * Note, when the TNC tree is traversed, some znodes may be absent, then this
+ * function reads corresponding indexing nodes and inserts them to TNC. In
+ * case of failure, a negative error code is returned.
+ */
+static int lookup_level0_dirty(struct ubifs_info *c, const union ubifs_key *key,
+ struct ubifs_znode **zn, int *n)
+{
+ int exact;
+ struct ubifs_znode *znode;
+ unsigned long time = get_seconds();
+
+ dbg_tnc_key(c, key, "search and dirty key");
+
+ znode = c->zroot.znode;
+ if (unlikely(!znode)) {
+ znode = load_znode(c, &c->zroot, NULL, 0);
+ if (IS_ERR(znode))
+ return PTR_ERR(znode);
+ }
+
+ znode = dirty_cow_znode(c, &c->zroot);
+ if (IS_ERR(znode))
+ return PTR_ERR(znode);
+
+ znode->time = time;
+
+ while (1) {
+ struct ubifs_zbranch *zbr;
+
+ /*
+ * The below is a debugging hack to make UBIFS eat RAM and
+ * cause fake memory pressure. It is compiled out if it is not
+ * enabled in kernel configuration.
+ */
+ dbg_eat_memory();
+
+ exact = search_zbranch(c, znode, key, n);
+
+ if (znode->level == 0)
+ break;
+
+ if (*n < 0)
+ *n = 0;
+ zbr = &znode->zbranch[*n];
+
+ dbg_tnc_key(c, &zbr->key, "at lvl %d, next zbr %d, key",
+ znode->level, *n);
+
+ if (zbr->znode) {
+ znode->time = time;
+ znode = dirty_cow_znode(c, zbr);
+ if (IS_ERR(znode))
+ return PTR_ERR(znode);
+ continue;
+ }
+
+ /* znode is not in TNC cache, load it from the media */
+ znode = load_znode(c, zbr, znode, *n);
+ if (IS_ERR(znode))
+ return PTR_ERR(znode);
+ znode = dirty_cow_znode(c, zbr);
+ if (IS_ERR(znode))
+ return PTR_ERR(znode);
+ }
+
+ *zn = znode;
+ ubifs_assert(exact >= 0 && exact < c->fanout);
+ return exact;
+}
+
+/**
+ * lnc_lookup - lookup the leaf-node-cache.
+ * @c: UBIFS file-system description object
+ * @zbr: zbranch of leaf node
+ * @node: leaf node
+ *
+ * Leaf nodes are non-index nodes like dent (directory entry) nodes or data
+ * nodes. The purpose of the leaf-node-cache is to save re-reading the same
+ * leaf node over and over again. Most things are cached by VFS, however the
+ * file system must cache directory entries for readdir and for resolving hash
+ * collisions. The present implementation of the leaf-node-cache is extremely
+ * simple, and allows for error returns that are not used but that may be needed
+ * if a more complex implementation is created.
+ *
+ * This function returns %1 if the leaf node is in the cache, %0 if it is not,
+ * and a negative error code otherwise.
+ */
+static int lnc_lookup(struct ubifs_info *c, struct ubifs_zbranch *zbr,
+ void *node)
+{
+ if (zbr->leaf == NULL)
+ return 0;
+ ubifs_assert(zbr->len != 0);
+ memcpy(node, zbr->leaf, zbr->len);
+ return 1;
+}
+
+/**
+ * ubifs_validate_entry - validate directory or extended attribute entry node.
+ * @c: UBIFS file-system description object
+ * @dent: the node to validate
+ *
+ * This function validates directory or extended attribute entry node @dent.
+ * Returns zero if the node is all right and %-EINVAL if not.
+ */
+int ubifs_validate_entry(struct ubifs_info *c,
+ const struct ubifs_dent_node *dent)
+{
+ int key_type, nlen = le16_to_cpu(dent->nlen);
+
+ if (le32_to_cpu(dent->ch.len) != nlen + UBIFS_DENT_NODE_SZ + 1 ||
+ dent->type >= UBIFS_ITYPES_CNT ||
+ nlen > UBIFS_MAX_NLEN || dent->name[nlen] != 0 ||
+ strnlen(dent->name, nlen) != nlen ||
+ le64_to_cpu(dent->inum) > MAX_INUM) {
+ const char *node_type;
+
+ if (key_type_flash(c, dent->key) == UBIFS_DENT_KEY)
+ node_type = "directory entry";
+ else
+ node_type = "extended attribute entry";
+
+ ubifs_err("bad %s node", node_type);
+ return -EINVAL;
+ }
+
+ key_type = key_type_flash(c, dent->key);
+ if (key_type != UBIFS_DENT_KEY && key_type != UBIFS_XENT_KEY) {
+ ubifs_err("bad key type %d", key_type);
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+/**
+ * lnc_add - add a leaf node to the leaf-node-cache.
+ * @c: UBIFS file-system description object
+ * @zbr: zbranch of leaf node
+ * @node: leaf node
+ *
+ * This function returns %0 to indicate success and a negative error code
+ * otherwise.
+ */
+static int lnc_add(struct ubifs_info *c, struct ubifs_zbranch *zbr,
+ const void *node)
+{
+ int err;
+ void *lnc_node;
+ const struct ubifs_dent_node *dent = node;
+
+ ubifs_assert(zbr->leaf == NULL);
+ ubifs_assert(zbr->len != 0);
+
+ /* Add all dents, but nothing else */
+ if (key_type(c, &zbr->key) != UBIFS_DENT_KEY)
+ return 0;
+
+ err = ubifs_validate_entry(c, dent);
+ if (err) {
+ dbg_dump_node(c, dent);
+ return err;
+ }
+
+ lnc_node = kmalloc(zbr->len, GFP_NOFS);
+ if (!lnc_node)
+ return 0; /* We don't have to have the cache, so no error */
+
+ memcpy(lnc_node, node, zbr->len);
+ zbr->leaf = lnc_node;
+ return 0;
+}
+
+/**
+ * lnc_free - remove a leaf node from the leaf-node-cache.
+ * @zbr: zbranch of leaf node
+ *
+ * This function frees the cached copy of the leaf node, if there is one, and
+ * clears the cache pointer in @zbr.
+ */
+static void lnc_free(struct ubifs_zbranch *zbr)
+{
+ if (zbr->leaf == NULL)
+ return;
+ kfree(zbr->leaf);
+ zbr->leaf = NULL;
+}
+
+/**
+ * tnc_read_node - read a leaf node.
+ * @c: UBIFS file-system description object
+ * @zbr: key and position of node
+ * @node: node returned
+ *
+ * This function reads a node or returns a negative error code.
+ */
+static int tnc_read_node(struct ubifs_info *c, struct ubifs_zbranch *zbr,
+ void *node)
+{
+ union ubifs_key key1, *key = &zbr->key;
+ int err, type = key_type(c, key);
+ const struct ubifs_bud *bud;
+
+ dbg_tnc_key(c, key, "LEB %d:%d, len %d, key",
+ zbr->lnum, zbr->offs, zbr->len);
+
+ if (lnc_lookup(c, zbr, node))
+ return 0; /* Read from the leaf-node-cache */
+ /*
+ * 'zbr' has to point to on-flash node. The node may sit in a bud and
+ * may even be in a write buffer, so we have to take care about this.
+ */
+ if (c->jheads)
+ bud = ubifs_search_bud(c, zbr->lnum);
+ else
+ bud = NULL;
+ if (bud)
+ /* The bud can't go because we are under @c->commit_sem */
+ err = ubifs_read_node_wbuf(&c->jheads[bud->jhead].wbuf,
+ node, type, zbr->len, zbr->lnum,
+ zbr->offs);
+ else
+ err = ubifs_read_node(c, node, type, zbr->len, zbr->lnum,
+ zbr->offs);
+
+ if (err) {
+ dbg_tnc_key(c, key, "key");
+ return err;
+ }
+
+ /* Make sure the key of the read node is correct */
+ key_read(c, key, &key1);
+ if (memcmp(node + UBIFS_CH_SZ, &key1, c->key_len)) {
+ ubifs_err("bad key in node at LEB %d:%d",
+ zbr->lnum, zbr->offs);
+ dbg_tnc_key(c, key, "looked for key");
+ dbg_tnc_key(c, &key1, "found node's key");
+ dbg_dump_node(c, node);
+ return -EINVAL;
+ }
+
+ /* Consider adding the node to the leaf node cache */
+ err = lnc_add(c, zbr, node);
+ return err;
+}
+
+/**
+ * try_read_node - read a node if it is a node.
+ * @c: UBIFS file-system description object
+ * @buf: buffer to read to
+ * @type: node type
+ * @len: node length (not aligned)
+ * @lnum: LEB number of node to read
+ * @offs: offset of node to read
+ *
+ * This function tries to read a node of known type and length, checks it and
+ * stores it in @buf. This function returns %1 if a node is present and %0 if
+ * a node is not present. A negative error code is returned for I/O errors.
+ * This function performs the same function as 'ubifs_read_node()' except that
+ * it does not require that there is actually a node present and instead
+ * the return code indicates if a node was read.
+ */
+static int try_read_node(const struct ubifs_info *c, void *buf, int type,
+ int len, int lnum, int offs)
+{
+ int err, node_len;
+ struct ubifs_ch *ch = buf;
+ uint32_t crc, node_crc;
+
+ dbg_io("LEB %d:%d, %s, length %d", lnum, offs, dbg_ntype(type), len);
+ ubifs_assert(lnum >= 0 && lnum < c->leb_cnt && offs >= 0);
+ ubifs_assert(len >= UBIFS_CH_SZ && offs + len <= c->leb_size);
+ ubifs_assert(!(offs & 7) && offs < c->leb_size);
+ ubifs_assert(type >= 0 && type < UBIFS_NODE_TYPES_CNT);
+
+ err = ubi_read(c->ubi, lnum, buf, offs, len);
+ if (err) {
+ ubifs_err("cannot read node type %d from LEB %d:%d, error %d",
+ type, lnum, offs, err);
+ return err;
+ }
+
+ if (le32_to_cpu(ch->magic) != UBIFS_NODE_MAGIC)
+ return 0;
+
+ if (ch->node_type != type)
+ return 0;
+
+ node_len = le32_to_cpu(ch->len);
+ if (node_len != len)
+ return 0;
+
+ crc = crc32(UBIFS_CRC32_INIT, buf + 8, node_len - 8);
+ node_crc = le32_to_cpu(ch->crc);
+ if (crc != node_crc)
+ return 0;
+
+ return 1;
+}
+
+/**
+ * fallible_read_node - try to read a leaf node.
+ * @c: UBIFS file-system description object
+ * @key: key of node to read
+ * @zbr: position of node
+ * @node: node returned
+ *
+ * This function tries to read a node and returns %1 if the node is read, %0
+ * if the node is not present, and a negative error code in the case of error.
+ */
+static int fallible_read_node(struct ubifs_info *c, const union ubifs_key *key,
+ struct ubifs_zbranch *zbr, void *node)
+{
+ int ret;
+
+ dbg_tnc_key(c, key, "key");
+
+ if (lnc_lookup(c, zbr, node))
+ return 1; /* Read from the leaf-node-cache */
+
+ ret = try_read_node(c, node, key_type(c, key), zbr->len, zbr->lnum,
+ zbr->offs);
+ if (ret == 1) {
+ union ubifs_key node_key;
+
+ /* All nodes have key in the same place */
+ key_read(c, &((struct ubifs_dent_node *)node)->key, &node_key);
+ if (keys_cmp(c, key, &node_key) == 0) {
+ /* Consider adding the node to the leaf node cache */
+ int err = lnc_add(c, zbr, node);
+
+ if (err)
+ return err;
+ } else
+ ret = 0;
+ } else if (ret == 0)
+ dbg_gc_key(c, key, "dangling branch LEB %d:%d len %d, key",
+ zbr->lnum, zbr->offs, zbr->len);
+ return ret;
+}
+
+/**
+ * matches_name - determine if a directory or extended attribute entry matches
+ * a given name.
+ * @c: UBIFS file-system description object
+ * @zt: zbranch of dent
+ * @nm: name to match
+ *
+ * This function returns %1 if the name matches, %0 if the name does not match
+ * and a negative error code otherwise.
+ */
+static int matches_name(struct ubifs_info *c, struct ubifs_zbranch *zt,
+ const struct qstr *nm)
+{
+ struct ubifs_dent_node *dent;
+ int nlen, err;
+
+ /* If possible, match against the dent in the leaf-node-cache */
+ dent = zt->leaf;
+ if (dent) {
+ nlen = le16_to_cpu(dent->nlen);
+
+ if (nlen == nm->len && !memcmp(dent->name, nm->name, nlen))
+ return 1;
+ return 0;
+ }
+
+ dent = kmalloc(zt->len, GFP_NOFS);
+ if (!dent)
+ return -ENOMEM;
+ /*
+ * In this case we end up allocating another dent object in lnc_add(),
+ * although it could have just inserted this dent.
+ */
+ err = tnc_read_node(c, zt, dent);
+ if (!err) {
+ err = ubifs_validate_entry(c, dent);
+ if (err) {
+ dbg_dump_node(c, dent);
+ goto out;
+ }
+
+ nlen = le16_to_cpu(dent->nlen);
+ if (nlen == nm->len && !memcmp(dent->name, nm->name, nlen))
+ err = 1;
+ }
+
+out:
+ kfree(dent);
+ return err;
+}
+
+/**
+ * get_znode - get a TNC znode that may not be loaded yet.
+ * @c: UBIFS file-system description object
+ * @znode: parent znode
+ * @n: znode branch slot number
+ *
+ * This function returns the znode or a negative error code.
+ */
+static struct ubifs_znode *get_znode(struct ubifs_info *c,
+ struct ubifs_znode *znode, int n)
+{
+ struct ubifs_zbranch *zbr;
+
+ zbr = &znode->zbranch[n];
+ if (zbr->znode)
+ znode = zbr->znode;
+ else
+ znode = load_znode(c, zbr, znode, n);
+ return znode;
+}
+
+/**
+ * tnc_next - find next TNC entry.
+ * @c: UBIFS file-system description object
+ * @zn: znode is passed and returned here
+ * @nn: znode branch slot number is passed and returned here
+ *
+ * This function returns %0 if the next TNC entry is found, %-ENOENT if there is
+ * no next entry, or a negative error code otherwise.
+ */
+static int tnc_next(struct ubifs_info *c, struct ubifs_znode **zn, int *nn)
+{
+ struct ubifs_znode *znode = *zn;
+ int n = *nn;
+
+ n += 1;
+ if (n < znode->child_cnt) {
+ *nn = n;
+ return 0;
+ }
+ while (1) {
+ struct ubifs_znode *zp;
+
+ zp = znode->parent;
+ if (!zp)
+ return -ENOENT;
+ n = znode->iip + 1;
+ znode = zp;
+ if (n < znode->child_cnt) {
+ znode = get_znode(c, znode, n);
+ if (IS_ERR(znode))
+ return PTR_ERR(znode);
+ while (znode->level != 0) {
+ znode = get_znode(c, znode, 0);
+ if (IS_ERR(znode))
+ return PTR_ERR(znode);
+ }
+ n = 0;
+ break;
+ }
+ }
+ *zn = znode;
+ *nn = n;
+ return 0;
+}
+
+/**
+ * tnc_prev - find previous TNC entry.
+ * @c: UBIFS file-system description object
+ * @zn: znode is passed and returned here
+ * @nn: znode branch slot number is passed and returned here
+ *
+ * This function returns %0 if the previous TNC entry is found, %-ENOENT if
+ * there is no previous entry, or a negative error code otherwise.
+ */
+static int tnc_prev(struct ubifs_info *c, struct ubifs_znode **zn, int *nn)
+{
+ struct ubifs_znode *znode = *zn;
+ int n = *nn;
+
+ if (n > 0) {
+ *nn = n - 1;
+ return 0;
+ }
+ while (1) {
+ struct ubifs_znode *zp;
+
+ zp = znode->parent;
+ if (!zp)
+ return -ENOENT;
+ n = znode->iip - 1;
+ znode = zp;
+ if (n >= 0) {
+ znode = get_znode(c, znode, n);
+ if (IS_ERR(znode))
+ return PTR_ERR(znode);
+ while (znode->level != 0) {
+ n = znode->child_cnt - 1;
+ znode = get_znode(c, znode, n);
+ if (IS_ERR(znode))
+ return PTR_ERR(znode);
+ }
+ n = znode->child_cnt - 1;
+ break;
+ }
+ }
+ *zn = znode;
+ *nn = n;
+ return 0;
+}
+
+/**
+ * resolve_collision - resolve a collision.
+ * @c: UBIFS file-system description object
+ * @key: key of a directory or extended attribute entry
+ * @zn: znode is returned here
+ * @nn: znode branch slot number is passed and returned here
+ * @nm: name of the entry
+ *
+ * This function returns %1 and sets @zn and @nn if the collision is resolved.
+ * %0 is returned if @nm is not found and @zn and @nn are set to the next
+ * entry. %-ENOENT is returned if there are no following entries for the same
+ * inode. Otherwise a negative error code is returned.
+ */
+static int resolve_collision(struct ubifs_info *c, const union ubifs_key *key,
+ struct ubifs_znode **zn, int *nn,
+ const struct qstr *nm)
+{
+ struct ubifs_znode *znode;
+ union ubifs_key *okey;
+ int n, err;
+
+ dbg_tnc_key(c, key, "key");
+
+ znode = *zn;
+ n = *nn;
+ err = matches_name(c, &znode->zbranch[n], nm);
+ if (err < 0)
+ return err;
+ if (err == 1)
+ return 1;
+
+ /* Look left */
+ while (1) {
+ err = tnc_prev(c, &znode, &n);
+ if (err == -ENOENT)
+ break;
+ if (err)
+ return err;
+ if (keys_cmp(c, &znode->zbranch[n].key, key))
+ break;
+ err = matches_name(c, &znode->zbranch[n], nm);
+ if (err < 0)
+ return err;
+ if (err == 1) {
+ dbg_tnc_key(c, key, "collision resolved");
+ *zn = znode;
+ *nn = n;
+ return 1;
+ }
+ }
+
+ /* Look right */
+ znode = *zn;
+ n = *nn;
+ while (1) {
+ err = tnc_next(c, &znode, &n);
+ if (err)
+ return err;
+ okey = &znode->zbranch[n].key;
+ if (keys_cmp(c, okey, key))
+ return -ENOENT;
+ err = matches_name(c, &znode->zbranch[n], nm);
+ if (err < 0)
+ return err;
+ if (err == 1) {
+ dbg_tnc_key(c, key, "collision resolved");
+ *zn = znode;
+ *nn = n;
+ return 1;
+ }
+ }
+
+ return -EINVAL;
+}
+
+/**
+ * fallible_matches_name - determine if a dent matches a given name.
+ * @c: UBIFS file-system description object
+ * @zt: zbranch of dent
+ * @nm: name to match
+ *
+ * This function returns %1 if the name matches, %0 if the name does not match,
+ * %2 if the node was not present, and a negative error code otherwise.
+ */
+static int fallible_matches_name(struct ubifs_info *c, struct ubifs_zbranch *zt,
+ const struct qstr *nm)
+{
+ struct ubifs_dent_node *dent;
+ int nlen, err;
+
+ /* If possible, match against the dent in the leaf-node-cache */
+ dent = zt->leaf;
+ if (dent) {
+ nlen = le16_to_cpu(dent->nlen);
+
+ if (nlen == nm->len && !memcmp(dent->name, nm->name, nlen))
+ return 1;
+ return 0;
+ }
+
+ dent = kmalloc(zt->len, GFP_NOFS);
+ if (!dent)
+ return -ENOMEM;
+ /*
+ * In this case we end up allocating another dent object in lnc_add(),
+ * although it could have just inserted this dent.
+ */
+ err = fallible_read_node(c, &zt->key, zt, dent);
+ if (err < 0)
+ goto out;
+ if (err == 0) {
+ err = 2; /* The node was not present */
+ goto out;
+ }
+ if (err == 1) {
+ err = ubifs_validate_entry(c, dent);
+ if (err) {
+ dbg_dump_node(c, dent);
+ goto out;
+ }
+
+ nlen = le16_to_cpu(dent->nlen);
+ if (nlen == nm->len && !memcmp(dent->name, nm->name, nlen))
+ err = 1;
+ else
+ err = 0;
+ }
+out:
+ kfree(dent);
+ return err;
+}
+
+/**
+ * fallible_resolve_collision - resolve a collision even if nodes are missing.
+ * @c: UBIFS file-system description object
+ * @key: key of directory entry
+ * @zn: znode is returned here
+ * @nn: znode branch slot number is passed and returned here
+ * @nm: name of directory entry
+ *
+ * This function returns %1 and sets @zn and @nn if the collision is resolved.
+ * %0 is returned if @nm is not found and @zn and @nn are set to the
+ * next directory entry. %-ENOENT is returned if there are no
+ * following directory entries for the same inode. Otherwise a negative error
+ * code is returned.
+ */
+static int fallible_resolve_collision(struct ubifs_info *c,
+ const union ubifs_key *key,
+ struct ubifs_znode **zn, int *nn,
+ const struct qstr *nm)
+{
+ struct ubifs_znode *znode, *o_znode = NULL;
+ union ubifs_key *okey;
+ int n, o_n = 0, err;
+
+ dbg_tnc_key(c, key, "key");
+ znode = *zn;
+ n = *nn;
+ err = fallible_matches_name(c, &znode->zbranch[n], nm);
+ if (err < 0)
+ return err;
+ if (err == 1)
+ return 1;
+ if (err == 2) {
+ o_znode = znode;
+ o_n = n;
+ }
+
+ /* Look left */
+ while (1) {
+ err = tnc_prev(c, &znode, &n);
+ if (err == -ENOENT)
+ break;
+ if (err)
+ return err;
+ if (keys_cmp(c, &znode->zbranch[n].key, key))
+ break;
+ err = fallible_matches_name(c, &znode->zbranch[n], nm);
+ if (err < 0)
+ return err;
+ if (err == 1) {
+ dbg_tnc_key(c, key, "collision resolved");
+ *zn = znode;
+ *nn = n;
+ return 1;
+ }
+ if (err == 2) {
+ o_znode = znode;
+ o_n = n;
+ }
+ }
+ /* Look right */
+ znode = *zn;
+ n = *nn;
+ while (1) {
+ err = tnc_next(c, &znode, &n);
+ if (err == -ENOENT && o_znode) {
+ dbg_tnc_key(c, key, "collision resolved by default");
+ dbg_gc_key(c, key, "dangling match LEB %d:%d len %d ",
+ o_znode->zbranch[o_n].lnum,
+ o_znode->zbranch[o_n].offs,
+ o_znode->zbranch[o_n].len);
+ *zn = o_znode;
+ *nn = o_n;
+ return 1;
+ }
+ if (err)
+ return err;
+ okey = &znode->zbranch[n].key;
+ if (keys_cmp(c, okey, key)) {
+ if (!o_znode)
+ return -ENOENT;
+ dbg_tnc_key(c, key, "collision resolved by default");
+ dbg_gc_key(c, key, "dangling match LEB %d:%d len %d ",
+ o_znode->zbranch[o_n].lnum,
+ o_znode->zbranch[o_n].offs,
+ o_znode->zbranch[o_n].len);
+ *zn = o_znode;
+ *nn = o_n;
+ return 1;
+ }
+ err = fallible_matches_name(c, &znode->zbranch[n], nm);
+ if (err < 0)
+ return err;
+ if (err == 1) {
+ dbg_tnc_key(c, key, "collision resolved");
+ *zn = znode;
+ *nn = n;
+ return 1;
+ }
+ if (err == 2) {
+ o_znode = znode;
+ o_n = n;
+ }
+ }
+ return -EINVAL;
+}
+
+/**
+ * matches_position - determine if a zbranch matches a given position.
+ * @zt: zbranch of dent
+ * @lnum: LEB number of dent to match
+ * @offs: offset of dent to match
+ *
+ * This function returns %1 if @lnum:@offs matches, and %0 otherwise.
+ */
+static int matches_position(struct ubifs_zbranch *zt, int lnum, int offs)
+{
+ if (zt->lnum == lnum && zt->offs == offs)
+ return 1;
+ else
+ return 0;
+}
+
+/**
+ * resolve_collision_directly - resolve a collision directly.
+ * @c: UBIFS file-system description object
+ * @key: key of directory entry
+ * @zn: znode is passed and returned here
+ * @nn: znode branch slot number is passed and returned here
+ * @lnum: LEB number of dent node to match
+ * @offs: offset of dent node to match
+ *
+ * This function returns %1 and sets @zn and @nn if the collision is resolved.
+ * %0 is returned if @lnum:@offs is not found and @zn and @nn are set to the
+ * next directory entry. %-ENOENT is returned if there are no
+ * following directory entries for the same inode. Otherwise a negative error
+ * code is returned.
+ */
+static int resolve_collision_directly(struct ubifs_info *c,
+ const union ubifs_key *key,
+ struct ubifs_znode **zn, int *nn,
+ int lnum, int offs)
+{
+ struct ubifs_znode *znode;
+ union ubifs_key *okey;
+ int n, err;
+
+ dbg_tnc_key(c, key, "key");
+ dbg_mnt_key(c, key, "LEB %d:%d", lnum, offs);
+ znode = *zn;
+ n = *nn;
+ if (matches_position(&znode->zbranch[n], lnum, offs))
+ return 1;
+
+ /* Look left */
+ while (1) {
+ err = tnc_prev(c, &znode, &n);
+ if (err == -ENOENT)
+ break;
+ if (err)
+ return err;
+ if (keys_cmp(c, &znode->zbranch[n].key, key))
+ break;
+ if (matches_position(&znode->zbranch[n], lnum, offs)) {
+ dbg_tnc_key(c, key, "collision resolved");
+ dbg_mnt_key(c, key, "LEB %d:%d collision resolved",
+ lnum, offs);
+ *zn = znode;
+ *nn = n;
+ return 1;
+ }
+ }
+
+ /* Look right */
+ znode = *zn;
+ n = *nn;
+ while (1) {
+ err = tnc_next(c, &znode, &n);
+ if (err)
+ return err;
+ okey = &znode->zbranch[n].key;
+ if (keys_cmp(c, okey, key))
+ return 0;
+ if (matches_position(&znode->zbranch[n], lnum, offs)) {
+ dbg_tnc_key(c, key, "collision resolved");
+ dbg_mnt_key(c, key, "LEB %d:%d collision resolved",
+ lnum, offs);
+ *zn = znode;
+ *nn = n;
+ return 1;
+ }
+ }
+}
+
+/**
+ * ubifs_tnc_lookup - look up a file-system node.
+ * @c: UBIFS file-system description object
+ * @key: node key to lookup
+ * @node: the node is returned here
+ *
+ * This function looks up and reads the node with key @key. The caller has to
+ * make sure the @node buffer is large enough to fit the node. Returns zero in
+ * case of success, %-ENOENT if the node was not found, and a negative error
+ * code in case of failure.
+ */
+int ubifs_tnc_lookup(struct ubifs_info *c, const union ubifs_key *key,
+ void *node)
+{
+ int found, n, err;
+ struct ubifs_znode *znode;
+ struct ubifs_zbranch zbr, *zt;
+
+ mutex_lock(&c->tnc_mutex);
+ found = lookup_level0(c, key, &znode, &n);
+ if (!found) {
+ err = -ENOENT;
+ goto out;
+ } else if (found < 0) {
+ err = found;
+ goto out;
+ }
+ zt = &znode->zbranch[n];
+ if (is_hash_key(c, key)) {
+ /*
+ * In this case the leaf-node-cache gets used, so we pass the
+ * address of the zbranch and keep the mutex locked
+ */
+ err = tnc_read_node(c, zt, node);
+ goto out;
+ }
+ zbr = znode->zbranch[n];
+ mutex_unlock(&c->tnc_mutex);
+
+ err = tnc_read_node(c, &zbr, node);
+ return err;
+
+out:
+ mutex_unlock(&c->tnc_mutex);
+ return err;
+}
+
+/**
+ * ubifs_tnc_locate - look up a file-system node and return it and its location.
+ * @c: UBIFS file-system description object
+ * @key: node key to lookup
+ * @node: the node is returned here
+ * @lnum: LEB number is returned here
+ * @offs: offset is returned here
+ *
+ * This function is the same as 'ubifs_tnc_lookup()' but it also returns the
+ * node location. See 'ubifs_tnc_lookup()'.
+ */
+int ubifs_tnc_locate(struct ubifs_info *c, const union ubifs_key *key,
+ void *node, int *lnum, int *offs)
+{
+ int found, n, err;
+ struct ubifs_znode *znode;
+ struct ubifs_zbranch zbr, *zt;
+
+ mutex_lock(&c->tnc_mutex);
+ found = lookup_level0(c, key, &znode, &n);
+ if (!found) {
+ err = -ENOENT;
+ goto out;
+ } else if (found < 0) {
+ err = found;
+ goto out;
+ }
+ zt = &znode->zbranch[n];
+ if (is_hash_key(c, key)) {
+ /*
+ * In this case the leaf-node-cache gets used, so we pass the
+ * address of the zbranch and keep the mutex locked
+ */
+ *lnum = zt->lnum;
+ *offs = zt->offs;
+ err = tnc_read_node(c, zt, node);
+ goto out;
+ }
+ zbr = znode->zbranch[n];
+ mutex_unlock(&c->tnc_mutex);
+
+ *lnum = zbr.lnum;
+ *offs = zbr.offs;
+
+ err = tnc_read_node(c, &zbr, node);
+ return err;
+
+out:
+ mutex_unlock(&c->tnc_mutex);
+ return err;
+}
+
+/**
+ * do_lookup_nm - look up a "hashed" node, for example a directory entry
+ * file-system node.
+ * @c: UBIFS file-system description object
+ * @key: node key to lookup
+ * @node: the node is returned here
+ * @nm: node name
+ *
+ * This function looks up and reads a node which contains a name hash in the
+ * key. Since the hash may have collisions, there may be many nodes with the
+ * same key, so we have to look at all of them sequentially until the needed
+ * one is found. This function returns zero in case of success, %-ENOENT if the
+ * node was not found, and a negative error code in case of failure.
+ */
+static int do_lookup_nm(struct ubifs_info *c, const union ubifs_key *key,
+ void *node, const struct qstr *nm)
+{
+ int found, n, err;
+ struct ubifs_znode *znode;
+ struct ubifs_zbranch zbr;
+
+ dbg_tnc_key(c, key, "key");
+ mutex_lock(&c->tnc_mutex);
+ found = lookup_level0(c, key, &znode, &n);
+ if (!found) {
+ err = -ENOENT;
+ goto out;
+ } else if (found < 0) {
+ err = found;
+ goto out;
+ }
+
+ ubifs_assert(n >= 0);
+
+ err = resolve_collision(c, key, &znode, &n, nm);
+ if (err < 0)
+ goto out;
+ if (err == 0) {
+ err = -ENOENT;
+ goto out;
+ }
+
+ zbr = znode->zbranch[n];
+ mutex_unlock(&c->tnc_mutex);
+
+ err = tnc_read_node(c, &zbr, node);
+
+ return err;
+
+out:
+ mutex_unlock(&c->tnc_mutex);
+ return err;
+}
+
+/**
+ * ubifs_tnc_lookup_nm - look up a "hashed" node, for example a directory entry
+ * file-system node.
+ * @c: UBIFS file-system description object
+ * @key: node key to lookup
+ * @node: the node is returned here
+ * @nm: node name
+ *
+ * This function looks up and reads a node which contains a name hash in the
+ * key. Since the hash may have collisions, there may be many nodes with the
+ * same key, so we have to look at all of them sequentially until the needed
+ * one is found. This function returns zero in case of success, %-ENOENT if the
+ * node was not found, and a negative error code in case of failure.
+ */
+int ubifs_tnc_lookup_nm(struct ubifs_info *c, const union ubifs_key *key,
+ void *node, const struct qstr *nm)
+{
+ int err, len;
+ const struct ubifs_dent_node *dent = node;
+
+ /*
+ * We assume that in most of the cases there are no name collisions and
+ * 'ubifs_tnc_lookup()' returns us the right direntry.
+ */
+ err = ubifs_tnc_lookup(c, key, node);
+ if (err)
+ return err;
+
+ len = le16_to_cpu(dent->nlen);
+ if (nm->len == len && !memcmp(dent->name, nm->name, len))
+ return 0;
+
+ /*
+ * Unluckily, there are hash collisions and we have to iterate over
+ * them, looking at each direntry with a colliding name hash sequentially.
+ */
+ return do_lookup_nm(c, key, node, nm);
+}
+
+/**
+ * correct_parent_keys - correct parent znodes' keys.
+ * @c: UBIFS file-system description object
+ * @znode: znode to correct parent znodes for
+ *
+ * This is a helper function for 'tnc_insert()'. When the key of the leftmost
+ * zbranch changes, keys of parent znodes have to be corrected. This helper
+ * function is called in such situations and corrects the keys if needed.
+ */
+static void correct_parent_keys(const struct ubifs_info *c,
+ struct ubifs_znode *znode)
+{
+ union ubifs_key *key, *key1;
+
+ ubifs_assert(znode->parent);
+ ubifs_assert(znode->iip == 0);
+
+ key = &znode->zbranch[0].key;
+ key1 = &znode->parent->zbranch[0].key;
+
+ while (keys_cmp(c, key, key1) < 0) {
+ key_copy(c, key, key1);
+ znode = znode->parent;
+ if (!znode->parent || znode->iip)
+ break;
+ key1 = &znode->parent->zbranch[0].key;
+ }
+}
+
+/**
+ * insert_zbranch - insert a zbranch into a znode.
+ * @znode: znode into which to insert
+ * @zbr: zbranch to insert
+ * @n: slot number to insert to
+ *
+ * This is a helper function for 'tnc_insert()'. UBIFS does not allow "gaps" in
+ * znode's array of zbranches and keeps zbranches consolidated, so when a new
+ * zbranch has to be inserted into the @znode->zbranch[] array at the @n-th
+ * slot, zbranches starting from @n have to be moved right.
+ */
+static void insert_zbranch(struct ubifs_znode *znode,
+ const struct ubifs_zbranch *zbr, int n)
+{
+ int i;
+
+ ubifs_assert(ubifs_zn_dirty(znode));
+
+ if (znode->level) {
+ for (i = znode->child_cnt; i > n; i--) {
+ znode->zbranch[i] = znode->zbranch[i - 1];
+ if (znode->zbranch[i].znode)
+ znode->zbranch[i].znode->iip = i;
+ }
+ if (zbr->znode)
+ zbr->znode->iip = n;
+ } else
+ for (i = znode->child_cnt; i > n; i--)
+ znode->zbranch[i] = znode->zbranch[i - 1];
+
+ znode->zbranch[n] = *zbr;
+ znode->child_cnt += 1;
+ /*
+ * After inserting at slot zero, the lower bound of the key range of
+ * this znode may have changed. If this znode is subsequently split
+ * then the upper bound of the key range may change, and furthermore
+ * it could change to be lower than the original lower bound. If that
+ * happens, then it will no longer be possible to find this znode in the
+ * TNC using the key from the index node on flash. That is bad because
+ * if it is not found, we will assume it is obsolete and may overwrite
+ * it. Then if there is an unclean unmount, we will start using the
+ * old index which will be broken.
+ *
+ * So we first mark znodes that have insertions at slot zero, and then
+ * if they are split we add their lnum/offs to the old_idx tree.
+ */
+ if (n == 0)
+ znode->alt = 1;
+}
+
+/**
+ * tnc_insert - insert a node into TNC.
+ * @c: UBIFS file-system description object
+ * @znode: znode to insert into
+ * @zbr: branch to insert
+ * @n: slot number to insert new zbranch to
+ *
+ * This function inserts a new node described by @zbr into znode @znode. If
+ * znode does not have a free slot for new zbranch, it is split. Parent znodes
+ * are split as well if needed. Returns zero in case of success or a negative
+ * error code in case of failure.
+ */
+static int tnc_insert(struct ubifs_info *c, struct ubifs_znode *znode,
+ struct ubifs_zbranch *zbr, int n)
+{
+ struct ubifs_znode *zn, *zi, *zp;
+ int i, keep, move, appending = 0;
+ union ubifs_key *key = &zbr->key;
+
+ ubifs_assert(n >= 0 && n <= c->fanout);
+
+ /* Implement naive insert for now */
+again:
+ zp = znode->parent;
+ if (znode->child_cnt < c->fanout) {
+ ubifs_assert(n != c->fanout);
+ dbg_tnc_key(c, key, "inserted at %d level %d, key ", n,
+ znode->level);
+
+ insert_zbranch(znode, zbr, n);
+
+ /* Ensure parent's key is correct */
+ if (n == 0 && zp && znode->iip == 0)
+ correct_parent_keys(c, znode);
+
+ return 0;
+ }
+
+ /*
+ * Unfortunately, @znode does not have more empty slots and we have to
+ * split it.
+ */
+ dbg_tnc_key(c, key, "splitting level %d, key ", znode->level);
+
+ if (znode->alt)
+ /*
+ * We can no longer be sure of finding this znode by key, so we
+ * record it in the old_idx tree.
+ */
+ ins_clr_old_idx_znode(c, znode);
+
+ zn = kzalloc(c->max_znode_sz, GFP_NOFS);
+ if (!zn)
+ return -ENOMEM;
+ zn->parent = zp;
+ zn->level = znode->level;
+
+ /* Decide where to split */
+ if (znode->level == 0 && n == c->fanout &&
+ key_type(c, key) == UBIFS_DATA_KEY) {
+ union ubifs_key *key1;
+
+ /*
+ * If this is an inode which is being appended - do not split
+ * it because no other zbranches can be inserted between
+ * zbranches of consecutive data nodes anyway.
+ */
+ key1 = &znode->zbranch[n - 1].key;
+ if (key_ino(c, key1) == key_ino(c, key) &&
+ key_type(c, key1) == UBIFS_DATA_KEY &&
+ key_block(c, key1) == key_block(c, key) - 1)
+ appending = 1;
+ }
+
+ if (appending) {
+ keep = c->fanout;
+ move = 0;
+ } else {
+ keep = (c->fanout + 1) / 2;
+ move = c->fanout - keep;
+ }
+
+ /*
+ * Although we don't at present, we could look at the neighbors and see
+ * if we can move some zbranches there.
+ */
+
+ if (n < keep) {
+ /* Insert into existing znode */
+ zi = znode;
+ move += 1;
+ keep -= 1;
+ } else {
+ /* Insert into new znode */
+ zi = zn;
+ n -= keep;
+ /* Re-parent */
+ if (zn->level != 0)
+ zbr->znode->parent = zn;
+ }
+
+ set_bit(DIRTY_ZNODE, &zn->flags);
+ atomic_long_inc(&c->dirty_zn_cnt);
+
+ zn->child_cnt = move;
+ znode->child_cnt = keep;
+
+ dbg_tnc("moving %d, keeping %d", move, keep);
+
+ /* Move zbranch */
+ for (i = 0; i < move; i++) {
+ zn->zbranch[i] = znode->zbranch[keep + i];
+ /* Re-parent */
+ if (zn->level != 0)
+ if (zn->zbranch[i].znode) {
+ zn->zbranch[i].znode->parent = zn;
+ zn->zbranch[i].znode->iip = i;
+ }
+ }
+
+ /* Insert new key and branch */
+ dbg_tnc_key(c, key, "inserting at %d level %d, key ", n,
+ zn->level);
+
+ insert_zbranch(zi, zbr, n);
+
+ /* Insert new znode (produced by splitting) into the parent */
+ if (zp) {
+ i = n;
+ /* Locate insertion point */
+ n = znode->iip + 1;
+ if (appending && n != c->fanout)
+ appending = 0;
+
+ if (i == 0 && zi == znode && znode->iip == 0)
+ correct_parent_keys(c, znode);
+
+ /* Tail recursion */
+ zbr->key = zn->zbranch[0].key;
+ zbr->znode = zn;
+ zbr->lnum = 0;
+ zbr->offs = 0;
+ zbr->len = 0;
+ znode = zp;
+
+ goto again;
+ }
+
+ /* We have to split root znode */
+ dbg_tnc("creating new zroot at level %d", znode->level + 1);
+
+ zi = kzalloc(c->max_znode_sz, GFP_NOFS);
+ if (!zi)
+ return -ENOMEM;
+
+ zi->child_cnt = 2;
+ zi->level = znode->level + 1;
+
+ set_bit(DIRTY_ZNODE, &zi->flags);
+ atomic_long_inc(&c->dirty_zn_cnt);
+
+ zi->zbranch[0].key = znode->zbranch[0].key;
+ zi->zbranch[0].znode = znode;
+ zi->zbranch[0].lnum = c->zroot.lnum;
+ zi->zbranch[0].offs = c->zroot.offs;
+ zi->zbranch[0].len = c->zroot.len;
+ zi->zbranch[1].key = zn->zbranch[0].key;
+ zi->zbranch[1].znode = zn;
+
+ c->zroot.lnum = 0;
+ c->zroot.offs = 0;
+ c->zroot.len = 0;
+ c->zroot.znode = zi;
+
+ zn->parent = zi;
+ zn->iip = 1;
+ znode->parent = zi;
+ znode->iip = 0;
+
+ return 0;
+}
+
+/**
+ * ubifs_tnc_add - add a node to TNC.
+ * @c: UBIFS file-system description object
+ * @key: key to add
+ * @lnum: LEB number of node
+ * @offs: node offset
+ * @len: node length
+ *
+ * This function adds a node with key @key to TNC. The node may be new or it may
+ * obsolete some existing one. Returns %0 on success or negative error code on
+ * failure.
+ */
+int ubifs_tnc_add(struct ubifs_info *c, const union ubifs_key *key, int lnum,
+ int offs, int len)
+{
+ int found, n, err = 0;
+ struct ubifs_znode *znode;
+
+ mutex_lock(&c->tnc_mutex);
+ found = lookup_level0_dirty(c, key, &znode, &n);
+ if (!found) {
+ struct ubifs_zbranch zbr;
+
+ zbr.znode = NULL;
+ zbr.lnum = lnum;
+ zbr.offs = offs;
+ zbr.len = len;
+ zbr.key = *key;
+ err = tnc_insert(c, znode, &zbr, n + 1);
+ } else if (found == 1) {
+ struct ubifs_zbranch *zbr = &znode->zbranch[n];
+
+ lnc_free(zbr);
+ err = ubifs_add_dirt(c, zbr->lnum, zbr->len);
+ zbr->lnum = lnum;
+ zbr->offs = offs;
+ zbr->len = len;
+ } else
+ err = found;
+ if (!err)
+ err = dbg_check_tnc(c, 0);
+ mutex_unlock(&c->tnc_mutex);
+
+ return err;
+}
+
+/**
+ * dirty_cow_bottom_up - dirty a znode and its ancestors.
+ * @c: UBIFS file-system description object
+ * @znode: znode to dirty
+ *
+ * If we do not have a unique key that resides in a znode, then we cannot
+ * dirty that znode from the top down (i.e. by using lookup_level0_dirty).
+ * This function records the path back to the last dirty ancestor, and then
+ * dirties the znodes on that path.
+ */
+static struct ubifs_znode *dirty_cow_bottom_up(struct ubifs_info *c,
+ struct ubifs_znode *znode)
+{
+ struct ubifs_znode *zp;
+ int *path = NULL, h, p = 0;
+
+ ubifs_assert(c->zroot.znode != NULL);
+ ubifs_assert(znode != NULL);
+ h = c->zroot.znode->level;
+ if (h) {
+ path = kmalloc(sizeof(int) * h, GFP_NOFS);
+ if (!path)
+ return ERR_PTR(-ENOMEM);
+ /* Go up until parent is dirty */
+ while (1) {
+ int n;
+
+ zp = znode->parent;
+ if (!zp)
+ break;
+ n = znode->iip;
+ ubifs_assert(p < h);
+ path[p++] = n;
+ if (!zp->cnext && ubifs_zn_dirty(znode))
+ break;
+ znode = zp;
+ }
+ }
+ /* Come back down, dirtying as we go */
+ while (1) {
+ struct ubifs_zbranch *zbr;
+
+ zp = znode->parent;
+ if (zp) {
+ ubifs_assert(path[p - 1] >= 0);
+ ubifs_assert(path[p - 1] < zp->child_cnt);
+ zbr = &zp->zbranch[path[--p]];
+ znode = dirty_cow_znode(c, zbr);
+ } else {
+ ubifs_assert(znode == c->zroot.znode);
+ znode = dirty_cow_znode(c, &c->zroot);
+ }
+ if (IS_ERR(znode) || !p)
+ break;
+ ubifs_assert(path[p - 1] >= 0);
+ ubifs_assert(path[p - 1] < znode->child_cnt);
+ znode = znode->zbranch[path[p - 1]].znode;
+ }
+ kfree(path);
+ return znode;
+}
+
+/**
+ * ubifs_tnc_replace - replace a node in the TNC only if the old node is found.
+ * @c: UBIFS file-system description object
+ * @key: key to add
+ * @old_lnum: LEB number of old node
+ * @old_offs: old node offset
+ * @lnum: LEB number of node
+ * @offs: node offset
+ * @len: node length
+ *
+ * This function replaces a node with key @key in the TNC only if the old node
+ * is found. This function is called by garbage collection when nodes are moved.
+ * Returns %0 on success or negative error code on failure.
+ */
+int ubifs_tnc_replace(struct ubifs_info *c, const union ubifs_key *key,
+ int old_lnum, int old_offs, int lnum, int offs, int len)
+{
+ int found, n, err = 0;
+ struct ubifs_znode *znode;
+
+ mutex_lock(&c->tnc_mutex);
+ found = lookup_level0_dirty(c, key, &znode, &n);
+ if (found < 0) {
+ err = found;
+ goto out;
+ } else if (found == 1) {
+ struct ubifs_zbranch *zbr = &znode->zbranch[n];
+
+ found = 0;
+ if (zbr->lnum == old_lnum && zbr->offs == old_offs) {
+ lnc_free(zbr);
+ err = ubifs_add_dirt(c, zbr->lnum, zbr->len);
+ if (err)
+ goto out;
+ zbr->lnum = lnum;
+ zbr->offs = offs;
+ zbr->len = len;
+ found = 1;
+ } else if (is_hash_key(c, key)) {
+ found = resolve_collision_directly(c, key, &znode, &n,
+ old_lnum, old_offs);
+ if (found == -ENOENT)
+ found = 0;
+ if (found < 0) {
+ err = found;
+ goto out;
+ } else if (found) {
+ /* Ensure the znode is dirtied */
+ if (znode->cnext || !ubifs_zn_dirty(znode)) {
+ znode = dirty_cow_bottom_up(c,
+ znode);
+ if (IS_ERR(znode)) {
+ err = PTR_ERR(znode);
+ goto out;
+ }
+ }
+ zbr = &znode->zbranch[n];
+ lnc_free(zbr);
+ err = ubifs_add_dirt(c, zbr->lnum,
+ zbr->len);
+ if (err)
+ goto out;
+ zbr->lnum = lnum;
+ zbr->offs = offs;
+ zbr->len = len;
+ }
+ }
+ }
+
+ if (found == 0) {
+ err = ubifs_add_dirt(c, lnum, len);
+ if (err)
+ goto out;
+ }
+
+ err = dbg_check_tnc(c, 0);
+
+out:
+ mutex_unlock(&c->tnc_mutex);
+ return err;
+}
+
+/**
+ * ubifs_tnc_add_nm - add a "hashed" node to TNC.
+ * @c: UBIFS file-system description object
+ * @key: key to add
+ * @lnum: LEB number of node
+ * @offs: node offset
+ * @len: node length
+ * @nm: node name
+ *
+ * This is the same as 'ubifs_tnc_add()' but it should be used with keys which
+ * may have collisions, like directory entry keys.
+ */
+int ubifs_tnc_add_nm(struct ubifs_info *c, const union ubifs_key *key,
+ int lnum, int offs, int len, const struct qstr *nm)
+{
+ int found, n, err = 0;
+ struct ubifs_znode *znode;
+
+ mutex_lock(&c->tnc_mutex);
+ found = lookup_level0_dirty(c, key, &znode, &n);
+ if (found < 0) {
+ err = found;
+ goto out;
+ }
+ if (found == 1) {
+ if (c->replaying)
+ found = fallible_resolve_collision(c, key, &znode, &n,
+ nm);
+ else
+ found = resolve_collision(c, key, &znode, &n, nm);
+ if (found < 0 && found != -ENOENT) {
+ err = found;
+ goto out;
+ }
+ /* Ensure the znode is dirtied */
+ if (znode->cnext || !ubifs_zn_dirty(znode)) {
+ znode = dirty_cow_bottom_up(c, znode);
+ if (IS_ERR(znode)) {
+ err = PTR_ERR(znode);
+ goto out;
+ }
+ }
+ if (found == 0)
+ n -= 1;
+ else if (found == -ENOENT)
+ found = 0;
+ else if (found == 1) {
+ struct ubifs_zbranch *zbr = &znode->zbranch[n];
+
+ lnc_free(zbr);
+ err = ubifs_add_dirt(c, zbr->lnum, zbr->len);
+ zbr->lnum = lnum;
+ zbr->offs = offs;
+ zbr->len = len;
+ goto out;
+ }
+ }
+ if (!found) {
+ struct ubifs_zbranch zbr;
+
+ zbr.znode = NULL;
+ zbr.lnum = lnum;
+ zbr.offs = offs;
+ zbr.len = len;
+ zbr.key = *key;
+ err = tnc_insert(c, znode, &zbr, n + 1);
+ }
+
+out:
+ if (!err)
+ err = dbg_check_tnc(c, 0);
+ mutex_unlock(&c->tnc_mutex);
+ return err;
+}
+
+/**
+ * tnc_delete - delete a znode from TNC.
+ * @c: UBIFS file-system description object
+ * @znode: znode to delete from
+ * @n: zbranch slot number to delete
+ *
+ * This function deletes a leaf node from @n-th slot of @znode. Returns zero in
+ * case of success and a negative error code in case of failure.
+ */
+static int tnc_delete(struct ubifs_info *c, struct ubifs_znode *znode, int n)
+{
+ struct ubifs_zbranch *zbr;
+ struct ubifs_znode *zp;
+ int i, err;
+
+ /* Delete without merge for now */
+ ubifs_assert(znode->level == 0);
+ ubifs_assert(n >= 0 && n < c->fanout);
+ dbg_tnc_key(c, &znode->zbranch[n].key, "deleting");
+
+ zbr = &znode->zbranch[n];
+ lnc_free(zbr);
+
+ err = ubifs_add_dirt(c, zbr->lnum, zbr->len);
+ if (err) {
+ dbg_dump_znode(c, znode);
+ return err;
+ }
+
+ /* We do not "gap" zbranch slots */
+ for (i = n; i < znode->child_cnt - 1; i++)
+ znode->zbranch[i] = znode->zbranch[i + 1];
+ znode->child_cnt -= 1;
+
+ if (znode->child_cnt > 0)
+ return 0;
+
+ /*
+ * This was the last zbranch, we have to delete this znode from the
+ * parent.
+ */
+
+ do {
+ ubifs_assert(!test_bit(OBSOLETE_ZNODE, &znode->flags));
+ ubifs_assert(ubifs_zn_dirty(znode));
+
+ zp = znode->parent;
+ n = znode->iip;
+
+ atomic_long_dec(&c->dirty_zn_cnt);
+
+ err = insert_old_idx_znode(c, znode);
+ if (err)
+ return err;
+
+ if (znode->cnext) {
+ set_bit(OBSOLETE_ZNODE, &znode->flags);
+ atomic_long_inc(&c->clean_zn_cnt);
+ atomic_long_inc(&ubifs_clean_zn_cnt);
+ } else
+ kfree(znode);
+ znode = zp;
+ } while (znode->child_cnt == 1); /* while removing last child */
+
+ /* Remove from znode, entry n - 1 */
+ znode->child_cnt -= 1;
+ ubifs_assert(znode->level != 0);
+ for (i = n; i < znode->child_cnt; i++) {
+ znode->zbranch[i] = znode->zbranch[i + 1];
+ if (znode->zbranch[i].znode)
+ znode->zbranch[i].znode->iip = i;
+ }
+
+ /*
+ * If this is the root and it has only 1 child then
+ * collapse the tree.
+ */
+ if (znode->parent == NULL) {
+ while (znode->child_cnt == 1 && znode->level != 0) {
+ zp = znode;
+ zbr = &znode->zbranch[0];
+ znode = get_znode(c, znode, 0);
+ if (IS_ERR(znode))
+ return PTR_ERR(znode);
+ znode = dirty_cow_znode(c, zbr);
+ if (IS_ERR(znode))
+ return PTR_ERR(znode);
+ znode->parent = NULL;
+ znode->iip = 0;
+ if (c->zroot.len) {
+ err = insert_old_idx(c, c->zroot.lnum,
+ c->zroot.offs);
+ if (err)
+ return err;
+ }
+ c->zroot.lnum = zbr->lnum;
+ c->zroot.offs = zbr->offs;
+ c->zroot.len = zbr->len;
+ c->zroot.znode = znode;
+ ubifs_assert(!test_bit(OBSOLETE_ZNODE,
+ &zp->flags));
+ ubifs_assert(test_bit(DIRTY_ZNODE, &zp->flags));
+ atomic_long_dec(&c->dirty_zn_cnt);
+
+ if (zp->cnext) {
+ set_bit(OBSOLETE_ZNODE, &zp->flags);
+ atomic_long_inc(&c->clean_zn_cnt);
+ atomic_long_inc(&ubifs_clean_zn_cnt);
+ } else
+ kfree(zp);
+ }
+ }
+
+ return 0;
+}
+
+/**
+ * ubifs_tnc_remove - remove an index entry of a node.
+ * @c: UBIFS file-system description object
+ * @key: key of node
+ *
+ * Returns %0 on success or negative error code on failure.
+ */
+int ubifs_tnc_remove(struct ubifs_info *c, const union ubifs_key *key)
+{
+ int found, n, err = 0;
+ struct ubifs_znode *znode;
+
+ mutex_lock(&c->tnc_mutex);
+ found = lookup_level0_dirty(c, key, &znode, &n);
+ if (found == 1)
+ err = tnc_delete(c, znode, n);
+ else if (found < 0)
+ err = found;
+ if (!err)
+ err = dbg_check_tnc(c, 0);
+ mutex_unlock(&c->tnc_mutex);
+ return err;
+}
+
+/**
+ * ubifs_tnc_remove_nm - remove an index entry for a "hashed" node.
+ * @c: UBIFS file-system description object
+ * @key: key of node
+ * @nm: directory entry name
+ *
+ * Returns %0 on success or negative error code on failure.
+ */
+int ubifs_tnc_remove_nm(struct ubifs_info *c, const union ubifs_key *key,
+ const struct qstr *nm)
+{
+ int found, n, err = 0;
+ struct ubifs_znode *znode;
+
+ mutex_lock(&c->tnc_mutex);
+ found = lookup_level0_dirty(c, key, &znode, &n);
+ if (found < 0) {
+ err = found;
+ goto out;
+ }
+ if (found) {
+ if (c->replaying)
+ found = fallible_resolve_collision(c, key, &znode, &n,
+ nm);
+ else
+ found = resolve_collision(c, key, &znode, &n, nm);
+ if (found == -ENOENT)
+ found = 0;
+ if (found < 0) {
+ err = found;
+ goto out;
+ }
+ if (found) {
+ /* Ensure the znode is dirtied */
+ if (znode->cnext || !ubifs_zn_dirty(znode)) {
+ znode = dirty_cow_bottom_up(c, znode);
+ if (IS_ERR(znode)) {
+ err = PTR_ERR(znode);
+ goto out;
+ }
+ }
+ err = tnc_delete(c, znode, n);
+ }
+ }
+out:
+ if (!err)
+ err = dbg_check_tnc(c, 0);
+ mutex_unlock(&c->tnc_mutex);
+ return err;
+}
+
+/**
+ * key_in_range - determine if a key falls within a range of keys.
+ * @c: UBIFS file-system description object
+ * @key: key to check
+ * @from_key: lowest key in range
+ * @to_key: highest key in range
+ *
+ * This function returns %1 if the key is in range and %0 otherwise.
+ */
+static int key_in_range(struct ubifs_info *c, union ubifs_key *key,
+ union ubifs_key *from_key, union ubifs_key *to_key)
+{
+ if (keys_cmp(c, key, from_key) < 0)
+ return 0;
+ if (keys_cmp(c, key, to_key) > 0)
+ return 0;
+ return 1;
+}
+
+/**
+ * ubifs_tnc_remove_range - remove index entries in range.
+ * @c: UBIFS file-system description object
+ * @from_key: lowest key to remove
+ * @to_key: highest key to remove
+ *
+ * This function removes index entries starting at @from_key and ending at
+ * @to_key. This function returns zero in case of success and a negative error
+ * code in case of failure.
+ */
+int ubifs_tnc_remove_range(struct ubifs_info *c, union ubifs_key *from_key,
+ union ubifs_key *to_key)
+{
+ int found, i, n, k, err = 0;
+ struct ubifs_znode *znode;
+ union ubifs_key *key;
+
+ mutex_lock(&c->tnc_mutex);
+ while (1) {
+ /* Find first level 0 znode that contains keys to remove */
+ found = lookup_level0(c, from_key, &znode, &n);
+ if (found < 0) {
+ err = found;
+ goto out;
+ }
+ if (found)
+ key = from_key;
+ else {
+ err = tnc_next(c, &znode, &n);
+ if (err == -ENOENT) {
+ err = 0;
+ goto out;
+ }
+ if (err < 0)
+ goto out;
+ key = &znode->zbranch[n].key;
+ if (!key_in_range(c, key, from_key, to_key)) {
+ err = 0;
+ goto out;
+ }
+ }
+ /* Ensure the znode is dirtied */
+ if (znode->cnext || !ubifs_zn_dirty(znode)) {
+ znode = dirty_cow_bottom_up(c, znode);
+ if (IS_ERR(znode)) {
+ err = PTR_ERR(znode);
+ goto out;
+ }
+ }
+ /* Remove all keys in range except the first */
+ for (i = n + 1, k = 0; i < znode->child_cnt; i++, k++) {
+ key = &znode->zbranch[i].key;
+ if (!key_in_range(c, key, from_key, to_key))
+ break;
+ lnc_free(&znode->zbranch[i]);
+ err = ubifs_add_dirt(c, znode->zbranch[i].lnum,
+ znode->zbranch[i].len);
+ if (err) {
+ dbg_dump_znode(c, znode);
+ goto out;
+ }
+ dbg_tnc_key(c, key, "removing");
+ }
+ if (k) {
+ for (i = n + 1 + k; i < znode->child_cnt; i++)
+ znode->zbranch[i - k] = znode->zbranch[i];
+ znode->child_cnt -= k;
+ }
+ /* Now delete the first */
+ err = tnc_delete(c, znode, n);
+ if (err)
+ goto out;
+ }
+out:
+ if (!err)
+ err = dbg_check_tnc(c, 0);
+ mutex_unlock(&c->tnc_mutex);
+ return err;
+}
+
+/**
+ * ubifs_tnc_remove_ino - remove an inode from TNC.
+ * @c: UBIFS file-system description object
+ * @inum: inode number to remove
+ *
+ * This function removes inode @inum and all the extended attributes associated
+ * with the inode from TNC and returns zero in case of success or a negative
+ * error code in case of failure.
+ */
+int ubifs_tnc_remove_ino(struct ubifs_info *c, ino_t inum)
+{
+ union ubifs_key key1, key2;
+ struct ubifs_dent_node *xent, *pxent = NULL;
+ struct qstr nm = { .name = NULL };
+
+ dbg_tnc("ino %lu", inum);
+
+ /*
+ * Walk all extended attribute entries and remove them together with
+ * corresponding extended attribute inodes.
+ */
+ lowest_xent_key(c, &key1, inum);
+ while (1) {
+ ino_t xattr_inum;
+ int err;
+
+ xent = ubifs_tnc_next_ent(c, &key1, &nm);
+ if (IS_ERR(xent)) {
+ err = PTR_ERR(xent);
+ if (err == -ENOENT)
+ break;
+ kfree(pxent);
+ return err;
+ }
+
+ xattr_inum = le64_to_cpu(xent->inum);
+ dbg_tnc("xent '%s', ino %lu", xent->name, xattr_inum);
+
+ nm.name = xent->name;
+ nm.len = le16_to_cpu(xent->nlen);
+ err = ubifs_tnc_remove_nm(c, &key1, &nm);
+ if (err) {
+ kfree(pxent);
+ kfree(xent);
+ return err;
+ }
+
+ lowest_ino_key(c, &key1, xattr_inum);
+ highest_ino_key(c, &key2, xattr_inum);
+ err = ubifs_tnc_remove_range(c, &key1, &key2);
+ if (err) {
+ kfree(pxent);
+ kfree(xent);
+ return err;
+ }
+
+ kfree(pxent);
+ pxent = xent;
+ key_read(c, &xent->key, &key1);
+ }
+
+ kfree(pxent);
+ lowest_ino_key(c, &key1, inum);
+ highest_ino_key(c, &key2, inum);
+
+ return ubifs_tnc_remove_range(c, &key1, &key2);
+}
+
+/**
+ * ubifs_tnc_next_ent - walk directory or extended attribute entries.
+ * @c: UBIFS file-system description object
+ * @key: key of last entry
+ * @nm: name of last entry found or %NULL
+ *
+ * This function finds and reads the next directory or extended attribute entry
+ * after the given key (@key) if there is one. @nm is used to resolve
+ * collisions. If the first entry has to be found, @key has to contain the
+ * lowest possible key value for this inode and @nm has to be %NULL.
+ *
+ * This function returns the found directory or extended attribute entry node
+ * in case of success, %-ENOENT is returned if no entry is found, or a negative
+ * error code in case of failure.
+ */
+struct ubifs_dent_node *ubifs_tnc_next_ent(struct ubifs_info *c,
+ union ubifs_key *key,
+ const struct qstr *nm)
+{
+ int found, n, err, type = key_type(c, key), dlen = 0;
+ struct ubifs_znode *znode;
+ struct ubifs_dent_node *dent = NULL;
+ struct ubifs_zbranch *zbr;
+ union ubifs_key *dkey;
+
+ dbg_tnc_key(c, key, "%s",
+ ((nm && nm->name) ? (char *)nm->name : "(lowest)"));
+ ubifs_assert(type == UBIFS_DENT_KEY || type == UBIFS_XENT_KEY);
+
+ mutex_lock(&c->tnc_mutex);
+ found = lookup_level0(c, key, &znode, &n);
+ if (found < 0) {
+ err = found;
+ goto out;
+ }
+
+ /* Handle collisions */
+ if (found && nm && nm->name) {
+ err = resolve_collision(c, key, &znode, &n, nm);
+ if (err < 0)
+ goto out;
+ if (err == 0)
+ goto name_not_found;
+ }
+
+again:
+ /* Now find next entry */
+ err = tnc_next(c, &znode, &n);
+ if (err)
+ goto out;
+
+name_not_found:
+ dkey = &znode->zbranch[n].key;
+ zbr = &znode->zbranch[n];
+
+ if (key_ino(c, dkey) != key_ino(c, key) ||
+ key_type(c, dkey) != type) {
+ err = -ENOENT;
+ goto out;
+ }
+
+ if (!dent || dlen < zbr->len) {
+ kfree(dent);
+ dlen = zbr->len;
+ dent = kmalloc(dlen, GFP_NOFS);
+ if (!dent) {
+ err = -ENOMEM;
+ goto out;
+ }
+ }
+
+ err = tnc_read_node(c, zbr, dent);
+ if (err)
+ goto out;
+
+ if (dent->inum == 0)
+ goto again;
+
+ mutex_unlock(&c->tnc_mutex);
+ return dent;
+
+out:
+ kfree(dent);
+ mutex_unlock(&c->tnc_mutex);
+ return ERR_PTR(err);
+}
+
+/**
+ * tnc_postorder_first - find first znode to do postorder tree traversal.
+ * @znode: znode to start at (root of the sub-tree to traverse)
+ *
+ * Find the lowest leftmost znode in a subtree of the TNC tree. The LNC is
+ * ignored.
+ */
+static struct ubifs_znode *tnc_postorder_first(struct ubifs_znode *znode)
+{
+ if (unlikely(!znode))
+ return NULL;
+
+ while (znode->level > 0) {
+ struct ubifs_znode *child;
+
+ child = ubifs_tnc_find_child(znode, 0);
+ if (!child)
+ return znode;
+ znode = child;
+ }
+
+ return znode;
+}
+
+/**
+ * tnc_postorder_next - next TNC tree element in postorder traversal.
+ * @znode: previous znode
+ *
+ * This function implements postorder TNC traversal. The LNC is ignored.
+ * Returns the next element or %NULL if @znode is already the last one.
+ */
+static struct ubifs_znode *tnc_postorder_next(struct ubifs_znode *znode)
+{
+ struct ubifs_znode *zn;
+
+ ubifs_assert(znode);
+ if (unlikely(!znode->parent))
+ return NULL;
+
+ /* Switch to the next index in the parent */
+ zn = ubifs_tnc_find_child(znode->parent, znode->iip + 1);
+ if (!zn)
+ /* This is in fact the last child, return parent */
+ return znode->parent;
+
+ /* Go to the first znode in this new subtree */
+ return tnc_postorder_first(zn);
+}
+
+/**
+ * ubifs_destroy_tnc_subtree - destroy all znodes connected to a subtree.
+ * @znode: znode defining subtree to destroy
+ *
+ * This function destroys a subtree of the TNC tree. Returns the number of clean
+ * znodes in the subtree.
+ */
+long ubifs_destroy_tnc_subtree(struct ubifs_znode *znode)
+{
+ struct ubifs_znode *zn = tnc_postorder_first(znode);
+ long clean_freed = 0;
+ int n;
+
+ ubifs_assert(zn);
+ while (1) {
+ for (n = 0; n < zn->child_cnt; n++) {
+ if (!zn->zbranch[n].znode)
+ continue;
+
+ if (zn->level > 0 &&
+ !ubifs_zn_dirty(zn->zbranch[n].znode))
+ clean_freed += 1;
+
+ cond_resched();
+ kfree(zn->zbranch[n].znode);
+ }
+
+ if (zn == znode) {
+ if (!ubifs_zn_dirty(zn))
+ clean_freed += 1;
+ kfree(zn);
+ return clean_freed;
+ }
+
+ zn = tnc_postorder_next(zn);
+ }
+}
+
+/**
+ * tnc_destroy_cnext - destroy left-over obsolete znodes from a failed commit.
+ * @c: UBIFS file-system description object
+ *
+ * Destroy left-over obsolete znodes from a failed commit.
+ */
+static void tnc_destroy_cnext(struct ubifs_info *c)
+{
+ struct ubifs_znode *cnext;
+
+ if (!c->cnext)
+ return;
+ ubifs_assert(c->cmt_state == COMMIT_BROKEN);
+ cnext = c->cnext;
+ do {
+ struct ubifs_znode *znode = cnext;
+
+ cnext = cnext->cnext;
+ if (test_bit(OBSOLETE_ZNODE, &znode->flags))
+ kfree(znode);
+ } while (cnext != NULL && cnext != c->cnext);
+}
+
+/**
+ * ubifs_tnc_close - close TNC subsystem and free all related resources.
+ * @c: UBIFS file-system description object
+ */
+void ubifs_tnc_close(struct ubifs_info *c)
+{
+ long clean_freed;
+
+ tnc_destroy_cnext(c);
+ if (c->zroot.znode) {
+ clean_freed = ubifs_destroy_tnc_subtree(c->zroot.znode);
+ atomic_long_sub(clean_freed, &ubifs_clean_zn_cnt);
+ }
+ kfree(c->cbuf);
+ kfree(c->gap_lebs);
+ kfree(c->ilebs);
+ destroy_old_idx(c);
+}
+
+/**
+ * left_znode - get the znode to the left.
+ * @c: UBIFS file-system description object
+ * @znode: znode
+ *
+ * This function returns a pointer to the znode to the left of @znode or NULL if
+ * there is not one. A negative error code is returned on failure.
+ */
+static struct ubifs_znode *left_znode(struct ubifs_info *c,
+ struct ubifs_znode *znode)
+{
+ int level = znode->level;
+
+ while (1) {
+ int n = znode->iip - 1;
+
+ /* Go up until we can go left */
+ znode = znode->parent;
+ if (!znode)
+ return NULL;
+ if (n >= 0) {
+ /* Now go down the rightmost branch to 'level' */
+ znode = get_znode(c, znode, n);
+ if (IS_ERR(znode))
+ return znode;
+ while (znode->level != level) {
+ n = znode->child_cnt - 1;
+ znode = get_znode(c, znode, n);
+ if (IS_ERR(znode))
+ return znode;
+ }
+ break;
+ }
+ }
+ return znode;
+}
+
+/**
+ * right_znode - get the znode to the right.
+ * @c: UBIFS file-system description object
+ * @znode: znode
+ *
+ * This function returns a pointer to the znode to the right of @znode or NULL
+ * if there is not one. A negative error code is returned on failure.
+ */
+static struct ubifs_znode *right_znode(struct ubifs_info *c,
+ struct ubifs_znode *znode)
+{
+ int level = znode->level;
+
+ while (1) {
+ int n = znode->iip + 1;
+
+ /* Go up until we can go right */
+ znode = znode->parent;
+ if (!znode)
+ return NULL;
+ if (n < znode->child_cnt) {
+ /* Now go down the leftmost branch to 'level' */
+ znode = get_znode(c, znode, n);
+ if (IS_ERR(znode))
+ return znode;
+ while (znode->level != level) {
+ znode = get_znode(c, znode, 0);
+ if (IS_ERR(znode))
+ return znode;
+ }
+ break;
+ }
+ }
+ return znode;
+}
+
+/**
+ * lookup_znode - find a particular znode.
+ * @c: UBIFS file-system description object
+ * @key: index node key
+ * @level: index node level
+ * @lnum: index node LEB number
+ * @offs: index node offset
+ *
+ * This function returns a pointer to the znode found or NULL if it is not
+ * found. A negative error code is returned on failure.
+ */
+static struct ubifs_znode *lookup_znode(struct ubifs_info *c,
+ union ubifs_key *key, int level,
+ int lnum, int offs)
+{
+ struct ubifs_znode *znode, *zn;
+ int n, nn;
+
+ /*
+ * The arguments have probably been read off flash, so don't assume
+ * they are valid.
+ */
+ if (level < 0)
+ return ERR_PTR(-EINVAL);
+
+ /* Get the root znode */
+ znode = c->zroot.znode;
+ if (!znode) {
+ znode = load_znode(c, &c->zroot, NULL, 0);
+ if (IS_ERR(znode))
+ return znode;
+ }
+ /* Check if it is the one we are looking for */
+ if (c->zroot.lnum == lnum && c->zroot.offs == offs)
+ return znode;
+ /* Descend to the parent level i.e. (level + 1) */
+ if (level >= znode->level)
+ return NULL;
+ while (1) {
+ search_zbranch(c, znode, key, &n);
+ if (n < 0)
+ return NULL;
+ if (znode->level == level + 1)
+ break;
+ znode = get_znode(c, znode, n);
+ if (IS_ERR(znode))
+ return znode;
+ }
+ /* Check if the child is the one we are looking for */
+ if (znode->zbranch[n].lnum == lnum && znode->zbranch[n].offs == offs)
+ return get_znode(c, znode, n);
+ /* If the key is unique, there is nowhere else to look */
+ if (!is_hash_key(c, key))
+ return NULL;
+ /*
+ * The key is not unique and so may be also in the znodes to either
+ * side.
+ */
+ zn = znode;
+ nn = n;
+ /* Look left */
+ while (1) {
+ /* Move one branch to the left */
+ if (n)
+ n -= 1;
+ else {
+ znode = left_znode(c, znode);
+ if (znode == NULL)
+ break;
+ if (IS_ERR(znode))
+ return znode;
+ n = znode->child_cnt - 1;
+ }
+ /* Check it */
+ if (znode->zbranch[n].lnum == lnum &&
+ znode->zbranch[n].offs == offs)
+ return get_znode(c, znode, n);
+ /* Stop if the key is less than the one we are looking for */
+ if (keys_cmp(c, &znode->zbranch[n].key, key) < 0)
+ break;
+ }
+ /* Back to the middle */
+ znode = zn;
+ n = nn;
+ /* Look right */
+ while (1) {
+ /* Move one branch to the right */
+ if (++n >= znode->child_cnt) {
+ znode = right_znode(c, znode);
+ if (znode == NULL)
+ break;
+ if (IS_ERR(znode))
+ return znode;
+ n = 0;
+ }
+ /* Check it */
+ if (znode->zbranch[n].lnum == lnum &&
+ znode->zbranch[n].offs == offs)
+ return get_znode(c, znode, n);
+ /* Stop if the key is greater than the one we are looking for */
+ if (keys_cmp(c, &znode->zbranch[n].key, key) > 0)
+ break;
+ }
+ return NULL;
+}
+
+/**
+ * is_idx_node_in_tnc - determine if an index node is in the TNC.
+ * @c: UBIFS file-system description object
+ * @key: key of index node
+ * @level: index node level
+ * @lnum: LEB number of index node
+ * @offs: offset of index node
+ *
+ * This function returns %0 if the index node is not referred to in the TNC.
+ * This function returns %1 if the index node is referred to in the TNC and the
+ * corresponding znode is dirty.
+ * This function returns %2 if an index node is referred to in the TNC and the
+ * corresponding znode is clean.
+ * Otherwise, this function returns a negative error code.
+ *
+ * For index nodes, the key is the key of the first child.
+ *
+ * This function relies on the fact that 0:0 is never a valid LEB number and
+ * offset for a main-area node.
+ */
+int is_idx_node_in_tnc(struct ubifs_info *c, union ubifs_key *key, int level,
+ int lnum, int offs)
+{
+ struct ubifs_znode *znode;
+
+ znode = lookup_znode(c, key, level, lnum, offs);
+ if (znode == NULL)
+ return 0;
+ if (IS_ERR(znode))
+ return PTR_ERR(znode);
+ if (ubifs_zn_dirty(znode))
+ return 1;
+ else
+ return 2;
+}
+
+/**
+ * is_node_clean - determine if a node is clean.
+ * @c: UBIFS file-system description object
+ * @key: node key
+ * @lnum: node LEB number
+ * @offs: node offset
+ *
+ * This function returns %1 if a node is referred to in the TNC and %0
+ * if it is not. Otherwise a negative error code is returned.
+ *
+ * This function relies on the fact that 0:0 is never a valid LEB number and
+ * offset for a main-area node.
+ */
+static int is_node_clean(struct ubifs_info *c, union ubifs_key *key,
+ int lnum, int offs)
+{
+ struct ubifs_zbranch *zbr;
+ struct ubifs_znode *znode, *zn;
+ int n, found, err, nn;
+ const int unique = !is_hash_key(c, key);
+
+ found = lookup_level0(c, key, &znode, &n);
+ if (found < 0)
+ return found; /* Error code */
+ if (!found)
+ return 0;
+ zbr = &znode->zbranch[n];
+ if (lnum == zbr->lnum && offs == zbr->offs)
+ return 1; /* Found it */
+ if (unique)
+ return 0;
+ /*
+ * Because the key is not unique, we have to look left
+ * and right as well
+ */
+ zn = znode;
+ nn = n;
+ /* Look left */
+ while (1) {
+ err = tnc_prev(c, &znode, &n);
+ if (err == -ENOENT)
+ break;
+ if (err)
+ return err;
+ if (keys_cmp(c, key, &znode->zbranch[n].key))
+ break;
+ zbr = &znode->zbranch[n];
+ if (lnum == zbr->lnum && offs == zbr->offs)
+ return 1; /* Found it */
+ }
+ /* Look right */
+ znode = zn;
+ n = nn;
+ while (1) {
+ err = tnc_next(c, &znode, &n);
+ if (err) {
+ if (err == -ENOENT)
+ return 0;
+ return err;
+ }
+ if (keys_cmp(c, key, &znode->zbranch[n].key))
+ break;
+ zbr = &znode->zbranch[n];
+ if (lnum == zbr->lnum && offs == zbr->offs)
+ return 1; /* Found it */
+ }
+ return 0;
+}
+
+/**
+ * ubifs_tnc_has_node - determine whether a node is in the TNC.
+ * @c: UBIFS file-system description object
+ * @key: node key
+ * @level: index node level (if it is an index node)
+ * @lnum: node LEB number
+ * @offs: node offset
+ * @is_idx: non-zero if the node is an index node
+ *
+ * This function returns %1 if a node is in the TNC and %0 if it is not.
+ * Otherwise a negative error code is returned.
+ * For index nodes, the key is the key of the first child.
+ * An index node is considered to be in the TNC only if the corresponding znode
+ * is clean or has not been loaded.
+ */
+int ubifs_tnc_has_node(struct ubifs_info *c, union ubifs_key *key, int level,
+ int lnum, int offs, int is_idx)
+{
+ int ret;
+
+ mutex_lock(&c->tnc_mutex);
+ if (is_idx) {
+ ret = is_idx_node_in_tnc(c, key, level, lnum, offs);
+ if (ret < 0)
+ goto out; /* Error code */
+ if (ret == 1)
+ /* The index node was found but it was dirty */
+ ret = 0;
+ else if (ret == 2)
+ /* The index node was found and it was clean */
+ ret = 1;
+ else if (ret != 0)
+ BUG();
+ } else
+ ret = is_node_clean(c, key, lnum, offs);
+out:
+ mutex_unlock(&c->tnc_mutex);
+ return ret;
+}
+
+/**
+ * ubifs_dirty_idx_node - dirty an index node.
+ * @c: UBIFS file-system description object
+ * @key: index node key
+ * @level: index node level
+ * @lnum: index node LEB number
+ * @offs: index node offset
+ *
+ * This function loads and dirties an index node so that it can be garbage
+ * collected.
+ *
+ * For index nodes, the key is the key of the first child.
+ *
+ * This function relies on the fact that 0:0 is never a valid LEB number and
+ * offset for a main-area node.
+ *
+ * This function returns %0 on success and a negative error code on failure.
+ */
+int ubifs_dirty_idx_node(struct ubifs_info *c, union ubifs_key *key, int level,
+ int lnum, int offs)
+{
+ struct ubifs_znode *znode;
+ int err = 0;
+
+ mutex_lock(&c->tnc_mutex);
+ znode = lookup_znode(c, key, level, lnum, offs);
+ if (!znode)
+ goto out;
+ if (IS_ERR(znode)) {
+ err = PTR_ERR(znode);
+ goto out;
+ }
+ znode = dirty_cow_bottom_up(c, znode);
+ if (IS_ERR(znode)) {
+ err = PTR_ERR(znode);
+ goto out;
+ }
+out:
+ mutex_unlock(&c->tnc_mutex);
+ return err;
+}
+
+#ifdef CONFIG_UBIFS_FS_DEBUG_CHK_TNC
+
+/**
+ * dbg_check_znode - check if znode is all right.
+ * @c: UBIFS file-system description object
+ * @zbr: zbranch which points to this znode
+ *
+ * This function makes sure that znode referred to by @zbr is all right.
+ * Returns zero if it is, and %-EINVAL if it is not.
+ */
+static int dbg_check_znode(const struct ubifs_info *c,
+ const struct ubifs_zbranch *zbr)
+{
+ const struct ubifs_znode *znode = zbr->znode;
+ const struct ubifs_znode *zp = znode->parent;
+ int n, err, cmp;
+
+ if (znode->child_cnt <= 0 || znode->child_cnt > c->fanout) {
+ err = 1;
+ goto out;
+ }
+ if (znode->level < 0) {
+ err = 2;
+ goto out;
+ }
+ if (znode->iip < 0 || znode->iip >= c->fanout) {
+ err = 3;
+ goto out;
+ }
+
+ if (zbr->len == 0)
+ /* Only dirty zbranch may have no on-flash nodes */
+ if (!ubifs_zn_dirty(znode)) {
+ err = 4;
+ goto out;
+ }
+
+ if (ubifs_zn_dirty(znode))
+ /* If znode is dirty, its parent has to be dirty as well */
+ if (zp && !ubifs_zn_dirty(zp))
+ /*
+ * The dirty flag is atomic and is cleared outside the
+ * TNC mutex, so znode's dirty flag may now have
+ * been cleared. The child is always cleared before the
+ * parent, so we just need to check again.
+ */
+ if (ubifs_zn_dirty(znode)) {
+ err = 5;
+ goto out;
+ }
+
+ if (zp) {
+ const union ubifs_key *min, *max;
+
+ if (znode->level != zp->level - 1) {
+ err = 6;
+ goto out;
+ }
+
+ /* Make sure the 'parent' pointer in our znode is correct */
+ err = search_zbranch(c, zp, &zbr->key, &n);
+ if (!err) {
+ /* This zbranch does not exist in the parent */
+ err = 7;
+ goto out;
+ }
+
+ if (znode->iip != n) {
+ err = 8;
+ goto out;
+ }
+
+ /*
+ * Make sure that the first key in our znode is greater than or
+ * equal to the key in the pointing zbranch.
+ */
+ min = &zbr->key;
+ cmp = keys_cmp(c, min, &znode->zbranch[0].key);
+ if (cmp == 1) {
+ err = 9;
+ goto out;
+ }
+
+ if (n + 1 < zp->child_cnt) {
+ max = &zp->zbranch[n + 1].key;
+
+ /*
+ * Make sure the last key in our znode is less than or
+ * equal to the key in the zbranch which goes after our
+ * pointing zbranch.
+ */
+ cmp = keys_cmp(c, max,
+ &znode->zbranch[znode->child_cnt - 1].key);
+ if (cmp == -1) {
+ err = 10;
+ goto out;
+ }
+ }
+ } else {
+ /* This may only be root znode */
+ if (zbr != &c->zroot) {
+ err = 11;
+ goto out;
+ }
+ }
+
+ /*
+ * Make sure that each key is greater than or equal to the previous
+ * one.
+ */
+ for (n = 1; n < znode->child_cnt; n++) {
+ cmp = keys_cmp(c, &znode->zbranch[n].key,
+ &znode->zbranch[n - 1].key);
+ if (cmp < 0) {
+ err = 12;
+ goto out;
+ }
+ if (cmp == 0)
+ /* This can only be keys with colliding hash */
+ if (!is_hash_key(c, &znode->zbranch[n].key)) {
+ err = 13;
+ goto out;
+ }
+ }
+
+ for (n = 0; n < znode->child_cnt; n++) {
+ if (znode->zbranch[n].znode == NULL &&
+ (znode->zbranch[n].lnum == 0 ||
+ znode->zbranch[n].len == 0)) {
+ err = 14;
+ goto out;
+ }
+
+ if (znode->zbranch[n].lnum != 0 &&
+ znode->zbranch[n].len == 0) {
+ err = 15;
+ goto out;
+ }
+
+ if (znode->zbranch[n].lnum == 0 &&
+ znode->zbranch[n].len != 0) {
+ err = 16;
+ goto out;
+ }
+
+ if (znode->zbranch[n].lnum == 0 &&
+ znode->zbranch[n].offs != 0) {
+ err = 17;
+ goto out;
+ }
+
+ if (znode->level != 0 && znode->zbranch[n].znode)
+ if (znode->zbranch[n].znode->parent != znode) {
+ err = 18;
+ goto out;
+ }
+ }
+
+ return 0;
+
+out:
+ ubifs_err("failed, error %d", err);
+ ubifs_msg("dump of the znode");
+ dbg_dump_znode(c, znode);
+ if (zp) {
+ ubifs_msg("dump of the parent znode");
+ dbg_dump_znode(c, zp);
+ }
+ dump_stack();
+ return -EINVAL;
+}
+
+/**
+ * dbg_check_tnc - check TNC tree.
+ * @c: UBIFS file-system description object
+ * @extra: do extra checks that are possible at start commit
+ *
+ * This function traverses whole TNC tree and checks every znode. Returns zero
+ * if everything is all right and %-EINVAL if something is wrong with TNC.
+ */
+int dbg_check_tnc(struct ubifs_info *c, int extra)
+{
+ struct ubifs_znode *znode;
+ long clean_cnt = 0, dirty_cnt = 0;
+ int err;
+
+ ubifs_assert(mutex_is_locked(&c->tnc_mutex));
+ if (!c->zroot.znode)
+ return 0;
+
+ znode = tnc_postorder_first(c->zroot.znode);
+ while (znode) {
+ const struct ubifs_zbranch *zbr;
+
+ if (!znode->parent)
+ zbr = &c->zroot;
+ else
+ zbr = &znode->parent->zbranch[znode->iip];
+
+ err = dbg_check_znode(c, zbr);
+ if (err)
+ return err;
+
+ if (extra) {
+ if (ubifs_zn_dirty(znode))
+ dirty_cnt += 1;
+ else
+ clean_cnt += 1;
+ }
+
+ znode = tnc_postorder_next(znode);
+ }
+
+ if (extra) {
+ if (clean_cnt != atomic_long_read(&c->clean_zn_cnt)) {
+ ubifs_err("incorrect clean_zn_cnt %ld, calculated %ld",
+ atomic_long_read(&c->clean_zn_cnt),
+ clean_cnt);
+ return -EINVAL;
+ }
+ if (dirty_cnt != atomic_long_read(&c->dirty_zn_cnt)) {
+ ubifs_err("incorrect dirty_zn_cnt %ld, calculated %ld",
+ atomic_long_read(&c->dirty_zn_cnt),
+ dirty_cnt);
+ return -EINVAL;
+ }
+ }
+
+ return 0;
+}
+
+#endif /* CONFIG_UBIFS_FS_DEBUG_CHK_TNC */
+
+#ifdef CONFIG_UBIFS_FS_DEBUG
+
+/**
+ * dbg_walk_sub_tree - walk index subtree.
+ * @c: UBIFS file-system description object
+ * @znode: root znode of the subtree to walk
+ * @leaf_cb: called for each leaf node
+ * @znode_cb: called for each indexing node
+ * @priv: private data which is passed to callbacks
+ *
+ * This is a helper function which recursively walks the UBIFS index, reading
+ * each indexing node from the media if needed. Returns zero in case of success
+ * and a negative error code in case of failure.
+ */
+static int dbg_walk_sub_tree(struct ubifs_info *c, struct ubifs_znode *znode,
+ dbg_leaf_callback leaf_cb,
+ dbg_znode_callback znode_cb, void *priv)
+{
+ int n, err;
+
+ cond_resched();
+
+ if (znode_cb) {
+ err = znode_cb(c, znode, priv);
+ if (err)
+ return err;
+ }
+
+ if (znode->level == 0) {
+ if (!leaf_cb)
+ return 0;
+
+ for (n = 0; n < znode->child_cnt; n++) {
+ struct ubifs_zbranch *zbr = &znode->zbranch[n];
+
+ err = leaf_cb(c, zbr, priv);
+ if (err)
+ return err;
+ }
+ } else
+ for (n = 0; n < znode->child_cnt; n++) {
+ struct ubifs_znode *zn;
+
+ zn = get_znode(c, znode, n);
+ if (IS_ERR(zn))
+ return PTR_ERR(zn);
+ err = dbg_walk_sub_tree(c, zn, leaf_cb, znode_cb, priv);
+ if (err)
+ return err;
+ }
+
+ return 0;
+}
+
+/**
+ * dbg_walk_index - walk the on-flash index.
+ * @c: UBIFS file-system description object
+ * @leaf_cb: called for each leaf node
+ * @znode_cb: called for each indexing node
+ * @priv: private data which is passed to callbacks
+ *
+ * This function walks the UBIFS index and calls the @leaf_cb for each leaf
+ * node and @znode_cb for each indexing node. Returns zero in case of success
+ * and a negative error code in case of failure.
+ *
+ * Because 'dbg_walk_sub_tree()' is recursive, it runs the risk of exceeding the
+ * stack space.
+ *
+ * It would be better if this function removed every znode it pulled into
+ * the TNC, so that the behaviour more closely matched the non-debugging
+ * behaviour.
+ */
+int dbg_walk_index(struct ubifs_info *c, dbg_leaf_callback leaf_cb,
+ dbg_znode_callback znode_cb, void *priv)
+{
+ int err = 0;
+
+ mutex_lock(&c->tnc_mutex);
+ if (!c->zroot.znode) {
+ c->zroot.znode = load_znode(c, &c->zroot, NULL, 0);
+ if (IS_ERR(c->zroot.znode)) {
+ err = PTR_ERR(c->zroot.znode);
+ c->zroot.znode = NULL;
+ goto out;
+ }
+ }
+
+ err = dbg_walk_sub_tree(c, c->zroot.znode, leaf_cb, znode_cb, priv);
+
+out:
+ mutex_unlock(&c->tnc_mutex);
+ return err;
+}
+
+/**
+ * dbg_read_leaf_nolock - read a leaf node.
+ * @c: UBIFS file-system description object
+ * @zbr: key and position of node
+ * @node: node returned
+ *
+ * This function reads the leaf node defined by @zbr and returns zero in case
+ * of success or a negative error code in case of failure.
+ */
+int dbg_read_leaf_nolock(struct ubifs_info *c, struct ubifs_zbranch *zbr,
+ void *node)
+{
+ return tnc_read_node(c, zbr, node);
+}
+
+#endif /* CONFIG_UBIFS_FS_DEBUG */
+
+#ifdef CONFIG_UBIFS_FS_DEBUG_CHK_IDX_SZ
+
+static int dbg_add_size(struct ubifs_info *c, struct ubifs_znode *znode,
+ void *priv)
+{
+ long long *idx_size = priv;
+ int add;
+
+ add = ubifs_idx_node_sz(c, znode->child_cnt);
+ add = ALIGN(add, 8);
+ *idx_size += add;
+ return 0;
+}
+
+int dbg_check_idx_size(struct ubifs_info *c, long long idx_size)
+{
+ int err;
+ long long calc = 0;
+
+
+ err = dbg_walk_index(c, NULL, dbg_add_size, &calc);
+ if (err) {
+ ubifs_err("error %d while walking the index", err);
+ return err;
+ }
+
+ if (calc != idx_size) {
+ ubifs_err("index size check failed");
+ ubifs_err("calculated size is %lld, should be %lld",
+ calc, idx_size);
+ dump_stack();
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+#endif /* CONFIG_UBIFS_FS_DEBUG_CHK_IDX_SZ */
--
1.5.4.1

2008-03-27 13:09:55

by Artem Bityutskiy

Subject: [RFC PATCH 20/26] UBIFS: add VFS operations

This patch implements most of the VFS callbacks, such as ->readdir()
and ->write_begin(). In most cases a callback just does budgeting and
then calls the corresponding journal function, because all new data
goes to the journal first.
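
As a rough illustration of the "budget first, then journal" pattern
described above, here is a minimal user-space sketch. It is not UBIFS
code: struct fs_state and the helpers budget_op(), cancel_op(),
release_op(), journal_update() and create_entry() are hypothetical
stand-ins invented for this example, and the space accounting is
deliberately simplistic.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for file-system state; not a UBIFS structure. */
struct fs_state {
	long free_space;	/* bytes not yet promised to any operation */
	long budgeted;		/* bytes promised to in-flight operations */
	long journal_bytes;	/* bytes written to the journal so far */
	int journal_fail;	/* set to non-zero to force a journal error */
};

/* Reserve space up front, or fail with -ENOSPC before touching anything. */
static int budget_op(struct fs_state *fs, long need)
{
	if (fs->free_space < need)
		return -ENOSPC;
	fs->free_space -= need;
	fs->budgeted += need;
	return 0;
}

/* The operation failed: hand the reservation back. */
static void cancel_op(struct fs_state *fs, long need)
{
	assert(fs->budgeted >= need);
	fs->budgeted -= need;
	fs->free_space += need;
}

/* The operation succeeded: the reservation was consumed by the journal. */
static void release_op(struct fs_state *fs, long need)
{
	assert(fs->budgeted >= need);
	fs->budgeted -= need;
}

/* Pretend to write 'len' bytes of new data to the journal. */
static int journal_update(struct fs_state *fs, long len)
{
	if (fs->journal_fail)
		return -EIO;
	fs->journal_bytes += len;
	return 0;
}

/* Budget, change in-memory state, journal it, and undo on failure. */
static int create_entry(struct fs_state *fs, long *dir_size, long dent_sz)
{
	int err;

	err = budget_op(fs, dent_sz);
	if (err)
		return err;

	*dir_size += dent_sz;
	err = journal_update(fs, dent_sz);
	if (err) {
		/* Roll back the in-memory change and the reservation */
		*dir_size -= dent_sz;
		cancel_op(fs, dent_sz);
		return err;
	}

	release_op(fs, dent_sz);
	return 0;
}
```

In this sketch a caller would initialize a struct fs_state with some
free space and call create_entry() once per new directory entry. The
point of the ordering is that an out-of-space condition is reported
before any state is modified, and a journal failure leaves both the
directory size and the space accounting as they were.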

Signed-off-by: Artem Bityutskiy <[email protected]>
Signed-off-by: Adrian Hunter <[email protected]>
---
fs/ubifs/dir.c | 989 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
fs/ubifs/file.c | 790 +++++++++++++++++++++++++++++++++++++++++++
fs/ubifs/ioctl.c | 205 +++++++++++
3 files changed, 1984 insertions(+), 0 deletions(-)

diff --git a/fs/ubifs/dir.c b/fs/ubifs/dir.c
new file mode 100644
index 0000000..672652a
--- /dev/null
+++ b/fs/ubifs/dir.c
@@ -0,0 +1,989 @@
+/*
+ * This file is part of UBIFS.
+ *
+ * Copyright (C) 2006-2008 Nokia Corporation.
+ * Copyright (C) 2006, 2007 University of Szeged, Hungary
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 51
+ * Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Authors: Artem Bityutskiy (Битюцкий Артём)
+ * Adrian Hunter
+ * Zoltan Sogor
+ */
+
+/*
+ * This file implements directory operations.
+ *
+ * All FS operations in this file allocate budget before writing anything to the
+ * media. If they fail to allocate it, the error is returned. The only
+ * exceptions are 'ubifs_unlink()' and 'ubifs_rmdir()' which keep working even
+ * if they are unable to allocate the budget, because an %-ENOSPC failure on
+ * deletion is not what users usually expect. The UBIFS budgeting subsystem
+ * has some space reserved for these purposes.
+ *
+ * All operations in this file change the parent inode, e.g., 'ubifs_link()'
+ * changes ctime and nlink of the parent inode. The parent inode is written to
+ * the media straight away - it is not marked as dirty and there is no
+ * write-back for it. This was done to simplify file-system recovery which
+ * would otherwise be very difficult to do. So instead of marking the parent
+ * inode dirty, the operations mark it clean.
+ */
+
+#include "ubifs.h"
+
+/*
+ * Provide backing_dev_info in order to disable readahead. For UBIFS, I/O is
+ * not deferred, it is done immediately in readpage, which means the user would
+ * have to wait not just for their own I/O but the readahead I/O as well i.e.
+ * completely pointless.
+ */
+struct backing_dev_info ubifs_backing_dev_info = {
+ .ra_pages = 0, /* Set to zero to disable readahead */
+ .state = 0,
+ .capabilities = BDI_CAP_MAP_COPY,
+ .unplug_io_fn = default_unplug_io_fn,
+};
+
+/**
+ * ubifs_new_inode - allocate new UBIFS inode object.
+ * @c: UBIFS file-system description object
+ * @dir: parent directory inode
+ * @mode: inode mode flags
+ *
+ * This function finds an unused inode number, allocates new inode and
+ * initializes it. Returns new inode in case of success and an error code in
+ * case of failure.
+ */
+struct inode *ubifs_new_inode(struct ubifs_info *c, const struct inode *dir,
+ int mode)
+{
+ struct inode *inode;
+ struct ubifs_inode *ui;
+
+ inode = new_inode(c->vfs_sb);
+ if (!inode)
+ return ERR_PTR(-ENOMEM);
+
+ /*
+ * Set 'S_NOCMTIME' to prevent VFS from updating [mc]time of inodes and
+ * marking them dirty in file write path (see 'file_update_time()').
+ * UBIFS has to fully control "clean <-> dirty" transitions of inodes
+ * to make budgeting work.
+ */
+ inode->i_flags |= (S_NOCMTIME);
+
+ inode->i_uid = current->fsuid;
+ if (dir->i_mode & S_ISGID) {
+ inode->i_gid = dir->i_gid;
+ if (S_ISDIR(mode))
+ mode |= S_ISGID;
+ } else
+ inode->i_gid = current->fsgid;
+ inode->i_mode = mode;
+ inode->i_mtime = inode->i_atime = inode->i_ctime = CURRENT_TIME_SEC;
+ inode->i_mapping->nrpages = 0;
+ /* Disable readahead */
+ inode->i_mapping->backing_dev_info = &ubifs_backing_dev_info;
+
+ switch (mode & S_IFMT) {
+ case S_IFREG:
+ inode->i_mapping->a_ops = &ubifs_file_address_operations;
+ inode->i_op = &ubifs_file_inode_operations;
+ inode->i_fop = &ubifs_file_operations;
+ break;
+ case S_IFDIR:
+ inode->i_op = &ubifs_dir_inode_operations;
+ inode->i_fop = &ubifs_dir_operations;
+ break;
+ case S_IFLNK:
+ inode->i_op = &ubifs_symlink_inode_operations;
+ break;
+ case S_IFSOCK:
+ case S_IFIFO:
+ case S_IFBLK:
+ case S_IFCHR:
+ inode->i_op = &ubifs_file_inode_operations;
+ break;
+ default:
+ BUG();
+ }
+
+ ui = ubifs_inode(inode);
+ ui->flags = ubifs_inode(dir)->flags;
+ if (S_ISLNK(mode))
+ ui->flags &= ~(UBIFS_IMMUTABLE_FL|UBIFS_APPEND_FL);
+ if (!S_ISDIR(mode))
+ /* The "DIRSYNC" flag only applies to directories */
+ ui->flags &= ~UBIFS_DIRSYNC_FL;
+ ubifs_set_inode_flags(inode);
+
+ if (S_ISREG(mode))
+ ui->compr_type = c->default_compr;
+ else
+ ui->compr_type = UBIFS_COMPR_NONE;
+
+ spin_lock(&c->cnt_lock);
+ /* Inode number overflow is currently not supported */
+ if (c->highest_inum >= INUM_WARN_WATERMARK) {
+ if (c->highest_inum >= INUM_WATERMARK) {
+ spin_unlock(&c->cnt_lock);
+ ubifs_err("out of inode numbers");
+ make_bad_inode(inode);
+ iput(inode);
+ return ERR_PTR(-EINVAL);
+ }
+ ubifs_warn("running out of inode numbers (current %lu, max %d)",
+ c->highest_inum, INUM_WATERMARK);
+ }
+
+ inode->i_ino = ++c->highest_inum;
+ inode->i_generation = ++c->vfs_gen;
+ /*
+ * The creation sequence number remains with this inode for its
+ * lifetime. All nodes for this inode have a greater sequence number,
+ * and so it is possible to distinguish obsolete nodes belonging to a
+ * previous incarnation of the same inode number - for example, for the
+ * purpose of rebuilding the index.
+ */
+ ui->creat_sqnum = ++c->max_sqnum;
+ spin_unlock(&c->cnt_lock);
+
+ return inode;
+}
+
+#ifdef CONFIG_UBIFS_FS_DEBUG_CHK_OTHER
+static int dbg_check_name(struct ubifs_dent_node *dent, struct qstr *nm)
+{
+ if (le16_to_cpu(dent->nlen) != nm->len)
+ return -EINVAL;
+ if (memcmp(dent->name, nm->name, nm->len))
+ return -EINVAL;
+ return 0;
+}
+#else
+#define dbg_check_name(dent, nm) 0
+#endif
+
+static struct dentry *ubifs_lookup(struct inode *dir, struct dentry *dentry,
+ struct nameidata *nd)
+{
+ int err;
+ union ubifs_key key;
+ struct inode *inode = NULL;
+ struct ubifs_dent_node *dent;
+ struct ubifs_info *c = dir->i_sb->s_fs_info;
+
+ dbg_gen("'%.*s' in dir ino %lu",
+ dentry->d_name.len, dentry->d_name.name, dir->i_ino);
+ ubifs_assert(mutex_is_locked(&dir->i_mutex));
+
+ if (dentry->d_name.len > UBIFS_MAX_NLEN)
+ return ERR_PTR(-ENAMETOOLONG);
+
+ dent = kmalloc(UBIFS_MAX_DENT_NODE_SZ, GFP_NOFS);
+ if (!dent)
+ return ERR_PTR(-ENOMEM);
+
+ dent_key_init(c, &key, dir->i_ino, &dentry->d_name);
+
+ err = ubifs_tnc_lookup_nm(c, &key, dent, &dentry->d_name);
+ if (err) {
+ if (err == -ENOENT) {
+ dbg_gen("not found");
+ goto done;
+ }
+ goto out;
+ }
+
+ if (dbg_check_name(dent, &dentry->d_name)) {
+ err = -EINVAL;
+ goto out;
+ }
+
+ inode = ubifs_iget(dir->i_sb, le64_to_cpu(dent->inum));
+ if (IS_ERR(inode)) {
+ /*
+ * This should not happen. Probably the file-system needs
+ * checking.
+ */
+ ubifs_err("dead directory entry");
+ ubifs_ro_mode(c);
+ err = PTR_ERR(inode);
+ goto out;
+ }
+
+done:
+ kfree(dent);
+ return d_splice_alias(inode, dentry);
+
+out:
+ kfree(dent);
+ return ERR_PTR(err);
+}
+
+static int ubifs_create(struct inode *dir, struct dentry *dentry, int mode,
+ struct nameidata *nd)
+{
+ struct inode *inode;
+ struct ubifs_info *c = dir->i_sb->s_fs_info;
+ struct ubifs_budget_req req = { .new_ino = 1, .new_dent = 1 };
+ int err, sz_change = CALC_DENT_SIZE(dentry->d_name.len);
+
+ dbg_gen("dent '%.*s', mode %#x in dir ino %lu",
+ dentry->d_name.len, dentry->d_name.name, mode, dir->i_ino);
+ ubifs_assert(mutex_is_locked(&dir->i_mutex));
+
+ inode = ubifs_new_inode(c, dir, mode);
+ if (IS_ERR(inode))
+ return PTR_ERR(inode);
+
+ err = ubifs_budget_inode_op(c, dir, &req);
+ if (err)
+ goto out;
+
+ dir->i_size += sz_change;
+
+ err = ubifs_jrn_update(c, dir, &dentry->d_name, inode, 0,
+ IS_DIRSYNC(dir), 0);
+ if (err)
+ goto out_budg;
+
+ insert_inode_hash(inode);
+ d_instantiate(dentry, inode);
+ ubifs_release_ino_clean(c, dir, &req);
+ return 0;
+
+out_budg:
+ dir->i_size -= sz_change;
+ ubifs_cancel_ino_op(c, dir, &req);
+ ubifs_err("cannot create regular file, error %d", err);
+out:
+ make_bad_inode(inode);
+ iput(inode);
+ return err;
+}
+
+/**
+ * vfs_dent_type - get VFS directory entry type.
+ * @type: UBIFS directory entry type
+ *
+ * This function converts UBIFS directory entry type into VFS directory entry
+ * type.
+ */
+static unsigned int vfs_dent_type(uint8_t type)
+{
+ switch (type) {
+ case UBIFS_ITYPE_REG:
+ return DT_REG;
+ case UBIFS_ITYPE_DIR:
+ return DT_DIR;
+ case UBIFS_ITYPE_LNK:
+ return DT_LNK;
+ case UBIFS_ITYPE_BLK:
+ return DT_BLK;
+ case UBIFS_ITYPE_CHR:
+ return DT_CHR;
+ case UBIFS_ITYPE_FIFO:
+ return DT_FIFO;
+ case UBIFS_ITYPE_SOCK:
+ return DT_SOCK;
+ default:
+ BUG();
+ }
+ return 0;
+}
+
+/*
+ * The classical Unix view for directory is that it is a linear array of
+ * (name, inode number) entries. Linux/VFS assumes this model as well.
+ * Particularly, readdir() call wants us to return a directory entry offset
+ * which later may be used to continue readdir()-ing the directory or to seek()
+ * to that specific direntry. Obviously UBIFS does not really fit this model
+ * because directory entries are identified by keys, which may collide.
+ *
+ * UBIFS uses directory entry hash value for directory offsets, so
+ * seekdir()/telldir() may not always work because of possible key collisions.
+ * But UBIFS guarantees that consecutive readdir() calls work properly by means
+ * of saving full directory entry name in the private field of the file
+ * description object.
+ */
+static int ubifs_readdir(struct file *filp, void *dirent, filldir_t filldir)
+{
+ int err, over = 0;
+ union ubifs_key key;
+ struct ubifs_dent_node *dent;
+ struct inode *dir = filp->f_path.dentry->d_inode;
+ struct ubifs_info *c = dir->i_sb->s_fs_info;
+ struct ubifs_dent_node *saved = filp->private_data;
+
+ dbg_gen("dir ino %lu, f_pos %#llx", dir->i_ino, filp->f_pos);
+ ubifs_assert(mutex_is_locked(&dir->i_mutex));
+
+ saved = filp->private_data;
+ if (saved)
+ if (filp->f_pos != key_hash_flash(c, &saved->key)) {
+ kfree(saved);
+ filp->private_data = NULL;
+ saved = NULL;
+ }
+
+ /*
+ * File positions 0 and 1 correspond to "." and ".." directory
+ * entries.
+ */
+ if (filp->f_pos == 0) {
+ ubifs_assert(!saved);
+ over = filldir(dirent, ".", 1, 0, dir->i_ino, DT_DIR);
+ if (over)
+ return 0;
+ filp->f_pos = 1;
+ }
+
+ if (filp->f_pos == 1) {
+ ubifs_assert(!saved);
+ over = filldir(dirent, "..", 2, 1,
+ parent_ino(filp->f_path.dentry), DT_DIR);
+ if (over)
+ return 0;
+ filp->f_pos = 2;
+ }
+
+ if (filp->f_pos == 2) {
+ ubifs_assert(!saved);
+
+ lowest_dent_key(c, &key, dir->i_ino);
+ dent = ubifs_tnc_next_ent(c, &key, NULL);
+ if (IS_ERR(dent)) {
+ err = PTR_ERR(dent);
+ goto out;
+ }
+
+ ubifs_assert(dent->ch.sqnum > ubifs_inode(dir)->creat_sqnum);
+
+ dbg_gen("feed '%s', new f_pos %#x",
+ dent->name, key_hash_flash(c, &dent->key));
+ over = filldir(dirent, dent->name,
+ le16_to_cpu(dent->nlen), filp->f_pos,
+ le64_to_cpu(dent->inum),
+ vfs_dent_type(dent->type));
+ if (over) {
+ kfree(dent);
+ return 0;
+ }
+
+ filp->private_data = dent;
+ filp->f_pos = key_hash_flash(c, &dent->key);
+ saved = filp->private_data;
+ }
+
+ while (1) {
+ if (saved) {
+ struct qstr nm;
+
+ key_read(c, &saved->key, &key);
+ nm.name = saved->name;
+ nm.len = le16_to_cpu(saved->nlen);
+ dent = ubifs_tnc_next_ent(c, &key, &nm);
+ } else {
+ dent_key_init_hash(c, &key, dir->i_ino, filp->f_pos);
+ dent = ubifs_tnc_next_ent(c, &key, NULL);
+ }
+ if (unlikely(IS_ERR(dent))) {
+ err = PTR_ERR(dent);
+ goto out;
+ }
+
+ ubifs_assert(dent->ch.sqnum > ubifs_inode(dir)->creat_sqnum);
+ dbg_gen("feed '%s', new f_pos %#x",
+ dent->name, key_hash_flash(c, &dent->key));
+
+ over = filldir(dirent, dent->name, le16_to_cpu(dent->nlen),
+ filp->f_pos, le64_to_cpu(dent->inum),
+ vfs_dent_type(dent->type));
+ if (over) {
+ kfree(dent);
+ return 0;
+ }
+
+ filp->f_pos = key_hash_flash(c, &dent->key);
+ filp->private_data = dent;
+ kfree(saved);
+ saved = filp->private_data;
+ }
+
+ return 0;
+
+out:
+ if (err != -ENOENT) {
+ ubifs_err("cannot find next direntry, error %d", err);
+ return err;
+ }
+
+ return 0;
+}
+
+static int ubifs_dir_release(struct inode *dir, struct file *filp)
+{
+ kfree(filp->private_data);
+ filp->private_data = NULL;
+ return 0;
+}
+
+static int ubifs_link(struct dentry *old_dentry, struct inode *dir,
+ struct dentry *dentry)
+{
+ struct ubifs_info *c = dir->i_sb->s_fs_info;
+ struct inode *inode = old_dentry->d_inode;
+ struct ubifs_inode *ui = ubifs_inode(inode);
+ struct ubifs_budget_req req = { .new_dent = 1, .dirtied_ino = 1,
+ .dirtied_ino_d = ui->data_len };
+ int err, sz_change = CALC_DENT_SIZE(dentry->d_name.len);
+
+ dbg_gen("dent '%.*s' to ino %lu (nlink %d) in dir ino %lu",
+ dentry->d_name.len, dentry->d_name.name, inode->i_ino,
+ inode->i_nlink, dir->i_ino);
+ ubifs_assert(mutex_is_locked(&dir->i_mutex));
+ ubifs_assert(mutex_is_locked(&inode->i_mutex));
+
+ err = ubifs_budget_inode_op(c, dir, &req);
+ if (err)
+ return err;
+
+ inode->i_ctime = CURRENT_TIME_SEC;
+ inc_nlink(inode);
+ atomic_inc(&inode->i_count);
+
+ dir->i_size += sz_change;
+ dir->i_mtime = dir->i_ctime = CURRENT_TIME_SEC;
+
+ err = ubifs_jrn_update(c, dir, &dentry->d_name, inode, 0,
+ IS_DIRSYNC(dir), 0);
+ if (err)
+ goto out_budg;
+
+ d_instantiate(dentry, inode);
+ ubifs_release_ino_clean(c, dir, &req);
+ return 0;
+
+out_budg:
+ dir->i_size -= sz_change;
+ ubifs_cancel_ino_op(c, dir, &req);
+ drop_nlink(inode);
+ iput(inode);
+ return err;
+}
+
+static int ubifs_unlink(struct inode *dir, struct dentry *dentry)
+{
+ struct ubifs_info *c = dir->i_sb->s_fs_info;
+ struct inode *inode = dentry->d_inode;
+ struct ubifs_budget_req req = { .mod_dent = 1, .dirtied_ino = 1 };
+ int sz_change = CALC_DENT_SIZE(dentry->d_name.len);
+ int err, budgeted = 1;
+
+ dbg_gen("dent '%.*s' from ino %lu (nlink %d) in dir ino %lu",
+ dentry->d_name.len, dentry->d_name.name, inode->i_ino,
+ inode->i_nlink, dir->i_ino);
+ ubifs_assert(mutex_is_locked(&dir->i_mutex));
+ ubifs_assert(mutex_is_locked(&inode->i_mutex));
+ ubifs_assert(!S_ISDIR(inode->i_mode));
+
+ err = ubifs_budget_inode_op(c, dir, &req);
+ if (err) {
+ if (err != -ENOSPC)
+ return err;
+ err = 0;
+ budgeted = 0;
+ }
+
+ dir->i_size -= sz_change;
+ dir->i_mtime = dir->i_ctime = CURRENT_TIME_SEC;
+
+ inode->i_ctime = dir->i_ctime;
+ drop_nlink(inode);
+
+ err = ubifs_jrn_update(c, dir, &dentry->d_name, inode, 1,
+ IS_DIRSYNC(dir), 0);
+ if (err)
+ goto out_budg;
+
+ if (budgeted)
+ ubifs_release_ino_clean(c, dir, &req);
+
+ return 0;
+
+out_budg:
+ dir->i_size += sz_change;
+ inc_nlink(inode);
+ if (budgeted)
+ ubifs_cancel_ino_op(c, dir, &req);
+ return err;
+}
+
+/**
+ * check_dir_empty - check if a directory is empty or not.
+ * @c: UBIFS file-system description object
+ * @dir: VFS inode object of the directory to check
+ *
+ * This function checks if directory @dir is empty. Returns zero if the
+ * directory is empty, %-ENOTEMPTY if it is not, and other negative error codes
+ * in case of errors.
+ */
+static int check_dir_empty(struct ubifs_info *c, struct inode *dir)
+{
+ struct ubifs_dent_node *dent;
+ union ubifs_key key;
+ int err;
+
+ lowest_dent_key(c, &key, dir->i_ino);
+ dent = ubifs_tnc_next_ent(c, &key, NULL);
+ if (IS_ERR(dent)) {
+ err = PTR_ERR(dent);
+ if (err == -ENOENT)
+ err = 0;
+ } else {
+ kfree(dent);
+ err = -ENOTEMPTY;
+ }
+
+ return err;
+}
+
+static int ubifs_rmdir(struct inode *dir, struct dentry *dentry)
+{
+ struct ubifs_info *c = dir->i_sb->s_fs_info;
+ struct inode *inode = dentry->d_inode;
+ struct ubifs_budget_req req = { .mod_dent = 1, .dirtied_ino = 1 };
+ int sz_change = CALC_DENT_SIZE(dentry->d_name.len);
+ int err, budgeted = 0;
+
+ dbg_gen("directory '%.*s', ino %lu in dir ino %lu", dentry->d_name.len,
+ dentry->d_name.name, inode->i_ino, dir->i_ino);
+ ubifs_assert(mutex_is_locked(&dir->i_mutex));
+ ubifs_assert(mutex_is_locked(&inode->i_mutex));
+ ubifs_assert(S_ISDIR(inode->i_mode));
+
+ err = check_dir_empty(c, dentry->d_inode);
+ if (err)
+ return err;
+
+ budgeted = 1;
+ err = ubifs_budget_inode_op(c, dir, &req);
+ if (err) {
+ if (err != -ENOSPC)
+ return err;
+ budgeted = 0;
+ }
+
+ dir->i_size -= sz_change;
+ dir->i_mtime = dir->i_ctime = CURRENT_TIME_SEC;
+ drop_nlink(dir);
+
+ inode->i_size = 0;
+ inode->i_ctime = dir->i_ctime;
+ drop_nlink(inode);
+ drop_nlink(inode);
+
+ err = ubifs_jrn_update(c, dir, &dentry->d_name, inode, 1,
+ IS_DIRSYNC(dir), 0);
+ if (err)
+ goto out_budg;
+
+ if (budgeted)
+ ubifs_release_ino_clean(c, dir, &req);
+
+ return 0;
+
+out_budg:
+ dir->i_size += sz_change;
+ inc_nlink(dir);
+ inc_nlink(inode);
+ inc_nlink(inode);
+ if (budgeted)
+ ubifs_cancel_ino_op(c, dir, &req);
+ return err;
+}
+
+static int ubifs_mkdir(struct inode *dir, struct dentry *dentry, int mode)
+{
+ struct inode *inode;
+ struct ubifs_info *c = dir->i_sb->s_fs_info;
+ struct ubifs_budget_req req = { .new_ino = 1, .new_dent = 1 };
+ int err, sz_change = CALC_DENT_SIZE(dentry->d_name.len);
+
+ dbg_gen("dent '%.*s', mode %#x in dir ino %lu",
+ dentry->d_name.len, dentry->d_name.name, mode, dir->i_ino);
+ ubifs_assert(mutex_is_locked(&dir->i_mutex));
+
+ err = ubifs_budget_inode_op(c, dir, &req);
+ if (err)
+ return err;
+
+ inode = ubifs_new_inode(c, dir, S_IFDIR | mode);
+ if (IS_ERR(inode)) {
+ err = PTR_ERR(inode);
+ goto out_budg;
+ }
+
+ insert_inode_hash(inode);
+ inc_nlink(inode);
+
+ dir->i_mtime = dir->i_ctime = CURRENT_TIME_SEC;
+ dir->i_size += sz_change;
+ inc_nlink(dir);
+
+ err = ubifs_jrn_update(c, dir, &dentry->d_name, inode, 0,
+ IS_DIRSYNC(dir), 0);
+ if (err) {
+ ubifs_err("cannot create directory, error %d", err);
+ goto out_inode;
+ }
+
+ d_instantiate(dentry, inode);
+ ubifs_release_ino_clean(c, dir, &req);
+ return 0;
+
+out_inode:
+ dir->i_size -= sz_change;
+ drop_nlink(dir);
+ make_bad_inode(inode);
+ iput(inode);
+out_budg:
+ ubifs_cancel_ino_op(c, dir, &req);
+ return err;
+}
+
+static int ubifs_mknod(struct inode *dir, struct dentry *dentry,
+ int mode, dev_t rdev)
+{
+ struct inode *inode;
+ struct ubifs_info *c = dir->i_sb->s_fs_info;
+ struct ubifs_budget_req req = { .new_ino = 1, .new_dent = 1 };
+ union ubifs_dev_desc *dev = NULL;
+ int sz_change = CALC_DENT_SIZE(dentry->d_name.len);
+ int err, devlen = 0;
+
+ dbg_gen("dent '%.*s' in dir ino %lu",
+ dentry->d_name.len, dentry->d_name.name, dir->i_ino);
+ ubifs_assert(mutex_is_locked(&dir->i_mutex));
+
+ if (!new_valid_dev(rdev))
+ return -EINVAL;
+
+ if (S_ISBLK(mode) || S_ISCHR(mode)) {
+ dev = kmalloc(sizeof(union ubifs_dev_desc), GFP_NOFS);
+ if (!dev)
+ return -ENOMEM;
+ devlen = ubifs_encode_dev(dev, rdev);
+ }
+
+ err = ubifs_budget_inode_op(c, dir, &req);
+ if (err) {
+ kfree(dev);
+ return err;
+ }
+
+ inode = ubifs_new_inode(c, dir, mode);
+ if (IS_ERR(inode)) {
+ kfree(dev);
+ err = PTR_ERR(inode);
+ goto out_budg;
+ }
+
+ init_special_inode(inode, inode->i_mode, rdev);
+
+ inode->i_size = devlen;
+ ubifs_inode(inode)->data = dev;
+ ubifs_inode(inode)->data_len = devlen;
+
+ dir->i_size += sz_change;
+
+ err = ubifs_jrn_update(c, dir, &dentry->d_name, inode, 0,
+ IS_DIRSYNC(dir), 0);
+ if (err)
+ goto out_inode;
+
+ insert_inode_hash(inode);
+ d_instantiate(dentry, inode);
+ ubifs_release_ino_clean(c, dir, &req);
+ return 0;
+
+out_inode:
+ dir->i_size -= sz_change;
+ make_bad_inode(inode);
+ iput(inode);
+out_budg:
+ ubifs_cancel_ino_op(c, dir, &req);
+ return err;
+}
+
+static int ubifs_symlink(struct inode *dir, struct dentry *dentry,
+ const char *symname)
+{
+ struct inode *inode;
+ struct ubifs_inode *ui;
+ struct ubifs_info *c = dir->i_sb->s_fs_info;
+ int err, len = strlen(symname);
+ int sz_change = CALC_DENT_SIZE(dentry->d_name.len);
+ struct ubifs_budget_req req = { .new_ino = 1, .new_dent = 1,
+ .new_ino_d = len };
+
+ dbg_gen("dent '%.*s', target '%s' in dir ino %lu", dentry->d_name.len,
+ dentry->d_name.name, symname, dir->i_ino);
+ ubifs_assert(mutex_is_locked(&dir->i_mutex));
+
+ if (len > UBIFS_MAX_INO_DATA)
+ return -ENAMETOOLONG;
+
+ err = ubifs_budget_inode_op(c, dir, &req);
+ if (err)
+ return err;
+
+ inode = ubifs_new_inode(c, dir, S_IFLNK | S_IRWXUGO);
+ if (IS_ERR(inode)) {
+ err = PTR_ERR(inode);
+ goto out_budg;
+ }
+
+ ui = ubifs_inode(inode);
+ ui->data = kmalloc(len + 1, GFP_NOFS);
+ if (!ui->data) {
+ err = -ENOMEM;
+ goto out_inode;
+ }
+
+ memcpy(ui->data, symname, len);
+ ((char *)ui->data)[len] = '\0';
+ /*
+ * The terminating zero byte is not written to the flash media and it
+ * is put just to make later in-memory string processing simpler. Thus,
+ * data length is @len, not @len + %1.
+ */
+ ui->data_len = len;
+ inode->i_size = len;
+
+ dir->i_size += sz_change;
+
+ err = ubifs_jrn_update(c, dir, &dentry->d_name, inode, 0,
+ IS_DIRSYNC(dir), 0);
+ if (err)
+ goto out_dir;
+
+ insert_inode_hash(inode);
+ d_instantiate(dentry, inode);
+ ubifs_release_ino_clean(c, dir, &req);
+ return 0;
+
+out_dir:
+ dir->i_size -= sz_change;
+out_inode:
+ make_bad_inode(inode);
+ iput(inode);
+out_budg:
+ ubifs_cancel_ino_op(c, dir, &req);
+ return err;
+}
+
+static int ubifs_rename(struct inode *old_dir, struct dentry *old_dentry,
+ struct inode *new_dir, struct dentry *new_dentry)
+{
+ struct ubifs_info *c = old_dir->i_sb->s_fs_info;
+ struct inode *old_inode = old_dentry->d_inode;
+ struct inode *new_inode = new_dentry->d_inode;
+ int err, move = (new_dir != old_dir);
+ int is_dir = S_ISDIR(old_inode->i_mode);
+ int unlink = !!new_inode;
+ int dirsync = (IS_DIRSYNC(old_dir) || IS_DIRSYNC(new_dir));
+ int new_sz = CALC_DENT_SIZE(new_dentry->d_name.len);
+ int old_sz = CALC_DENT_SIZE(old_dentry->d_name.len);
+ struct ubifs_budget_req req = { .new_dent = 1, .mod_dent = 1 };
+
+ dbg_gen("dent '%.*s' ino %lu in dir ino %lu to dent '%.*s' in "
+ "dir ino %lu", old_dentry->d_name.len, old_dentry->d_name.name,
+ old_inode->i_ino, old_dir->i_ino, new_dentry->d_name.len,
+ new_dentry->d_name.name, new_dir->i_ino);
+ ubifs_assert(mutex_is_locked(&old_dir->i_mutex));
+ ubifs_assert(mutex_is_locked(&new_dir->i_mutex));
+ if (unlink)
+ ubifs_assert(mutex_is_locked(&new_inode->i_mutex));
+
+ if (unlink && is_dir) {
+ err = check_dir_empty(c, new_inode);
+ if (err)
+ return err;
+ }
+
+ if (move) {
+ req.dirtied_ino = 1;
+ if (unlink) {
+ req.dirtied_ino += 2;
+ req.dirtied_ino_d = ubifs_inode(new_inode)->data_len;
+ }
+ }
+
+ /*
+ * Note, rename may write @new_dir inode if the directory entry is
+ * moved there. And if the @new_dir is dirty, we do not bother to make
+ * it clean. It could be done, but requires extra coding which does not
+ * seem to be really worth it.
+ */
+ err = ubifs_budget_inode_op(c, old_dir, &req);
+ if (err)
+ return err;
+
+ /*
+ * Like most other Unix systems, set the ctime for inodes on a
+ * rename.
+ */
+ old_inode->i_ctime = CURRENT_TIME_SEC;
+
+ /*
+ * If we moved a directory to another parent directory, decrement
+ * 'i_nlink' of the old parent. Also, update 'i_size' of the old parent
+ * as well as its [mc]time.
+ */
+ if (is_dir && move)
+ drop_nlink(old_dir);
+ old_dir->i_size -= old_sz;
+ old_dir->i_mtime = old_dir->i_ctime = CURRENT_TIME_SEC;
+ new_dir->i_mtime = new_dir->i_ctime = CURRENT_TIME_SEC;
+
+ /*
+ * If we moved a directory object to a new directory, the new parent's
+ * 'i_nlink' should be adjusted.
+ */
+ if (move && is_dir)
+ inc_nlink(new_dir);
+
+ /*
+ * And finally, if we unlinked a direntry which happened to have the
+ * same name as the moved direntry, we have to decrement 'i_nlink' of
+ * the unlinked inode and change its ctime.
+ */
+ if (unlink) {
+ /*
+ * Directories cannot have hard-links, so if this is a
+ * directory, decrement its 'i_nlink' twice because an empty
+ * directory has 'i_nlink' 2.
+ */
+ if (is_dir)
+ drop_nlink(new_inode);
+ new_inode->i_ctime = CURRENT_TIME_SEC;
+ drop_nlink(new_inode);
+ } else
+ new_dir->i_size += new_sz;
+
+ err = ubifs_jrn_rename(c, old_dir, old_dentry, new_dir, new_dentry,
+ dirsync);
+ if (err)
+ goto out_inode;
+
+ ubifs_release_ino_clean(c, old_dir, &req);
+ return 0;
+
+out_inode:
+ if (unlink) {
+ if (is_dir)
+ inc_nlink(new_inode);
+ inc_nlink(new_inode);
+ } else
+ new_dir->i_size -= new_sz;
+ old_dir->i_size += old_sz;
+ if (is_dir && move) {
+ drop_nlink(new_dir);
+ inc_nlink(old_dir);
+ }
+ ubifs_cancel_ino_op(c, old_dir, &req);
+ return err;
+}
+
+int ubifs_getattr(struct vfsmount *mnt, struct dentry *dentry,
+ struct kstat *stat)
+{
+ struct inode *inode = dentry->d_inode;
+ loff_t size;
+
+ stat->dev = inode->i_sb->s_dev;
+ stat->ino = inode->i_ino;
+ stat->mode = inode->i_mode;
+ stat->nlink = inode->i_nlink;
+ stat->uid = inode->i_uid;
+ stat->gid = inode->i_gid;
+ stat->rdev = inode->i_rdev;
+ stat->atime = inode->i_atime;
+ stat->mtime = inode->i_mtime;
+ stat->ctime = inode->i_ctime;
+ stat->blksize = UBIFS_BLOCK_SIZE;
+ stat->size = i_size_read(inode);
+
+ spin_lock(&inode->i_lock);
+ size = ubifs_inode(inode)->xattr_size;
+ spin_unlock(&inode->i_lock);
+
+ /*
+ * Unfortunately, the 'stat()' system call was designed for block
+ * device based file systems, and it is not appropriate for UBIFS,
+ * because UBIFS does not have a notion of "block". For example, it is
+ * difficult to tell how many blocks a directory takes - it actually
+ * takes less than 300 bytes, but we have to round it to block size,
+ * which introduces a large error. This makes utilities like 'du'
+ * report completely senseless numbers. This is the reason why UBIFS
+ * goes the same way as JFFS2 - it reports zero blocks for everything
+ * but regular files, which makes more sense than reporting completely
+ * wrong sizes.
+ */
+ if (S_ISREG(inode->i_mode))
+ size += stat->size;
+
+ size = ALIGN(size, UBIFS_BLOCK_SIZE);
+ /*
+ * Note, userspace expects a count of 512-byte blocks irrespective of
+ * what was reported in @stat->blksize.
+ */
+ stat->blocks = size >> 9;
+
+ return 0;
+}
+
+struct inode_operations ubifs_dir_inode_operations = {
+ .lookup = ubifs_lookup,
+ .create = ubifs_create,
+ .link = ubifs_link,
+ .symlink = ubifs_symlink,
+ .unlink = ubifs_unlink,
+ .mkdir = ubifs_mkdir,
+ .rmdir = ubifs_rmdir,
+ .mknod = ubifs_mknod,
+ .rename = ubifs_rename,
+ .setattr = ubifs_setattr,
+ .getattr = ubifs_getattr,
+#ifdef CONFIG_UBIFS_FS_XATTR
+ .setxattr = ubifs_setxattr,
+ .getxattr = ubifs_getxattr,
+ .listxattr = ubifs_listxattr,
+ .removexattr = ubifs_removexattr,
+#endif
+};
+
+struct file_operations ubifs_dir_operations = {
+ .llseek = generic_file_llseek,
+ .release = ubifs_dir_release,
+ .read = generic_read_dir,
+ .readdir = ubifs_readdir,
+ .fsync = ubifs_fsync,
+ .ioctl = ubifs_ioctl,
+#ifdef CONFIG_COMPAT
+ .compat_ioctl = ubifs_compat_ioctl,
+#endif
+};
diff --git a/fs/ubifs/file.c b/fs/ubifs/file.c
new file mode 100644
index 0000000..2dcd435
--- /dev/null
+++ b/fs/ubifs/file.c
@@ -0,0 +1,790 @@
+/*
+ * This file is part of UBIFS.
+ *
+ * Copyright (C) 2006-2008 Nokia Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 51
+ * Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Authors: Artem Bityutskiy (Битюцкий Артём)
+ * Adrian Hunter
+ */
+
+/*
+ * This file implements VFS file and inode operations of regular files, device
+ * nodes and symlinks as well as address space operations.
+ *
+ * UBIFS uses 2 page flags: PG_private and PG_checked. PG_private is set if the
+ * page is dirty and is used for budgeting purposes - dirty pages should not be
+ * budgeted. The PG_checked flag is set if full budgeting is required for the
+ * page e.g., when it corresponds to a file hole or it is just beyond the file
+ * size. The budgeting is done in 'ubifs_write_begin()', because it is OK to
+ * fail in this function, and the budget is released in 'ubifs_write_end()'. So
+ * the PG_private and PG_checked flags carry the information about how the page
+ * was budgeted, to make it possible to release the budget properly.
+ *
+ * A thing to keep in mind: inode's 'i_mutex' is locked in most VFS operations
+ * we implement. However, this is not true for '->writepage()', which might be
+ * called with 'i_mutex' unlocked. For example, when pdflush is performing
+ * write-back, it calls 'writepage()' with unlocked 'i_mutex', although the
+ * inode has 'I_LOCK' flag in this case. In "normal" work-paths 'i_mutex' is
+ * locked in '->writepage', e.g. in "sys_write -> alloc_pages -> direct reclaim
+ * path". So, in '->writepage()' we are only guaranteed that the page is
+ * locked.
+ *
+ * Similarly, 'i_mutex' does not have to be locked in readpage(), e.g.,
+ * readahead path does not have it locked ("sys_read -> generic_file_aio_read
+ * -> ondemand_readahead -> readpage"). In case of readahead, 'I_LOCK' flag is
+ * not set as well.
+ *
+ * This means, for example, that there might be 2 concurrent '->writepage()'
+ * calls for the same inode, but for different dirty pages of that inode.
+ */
+
+#include "ubifs.h"
+#include <linux/mount.h>
+
+static int do_readpage(struct page *page)
+{
+ void *addr;
+ int err, len, out_len;
+ union ubifs_key key;
+ struct ubifs_data_node *dn;
+ struct inode *inode = page->mapping->host;
+ struct ubifs_info *c = inode->i_sb->s_fs_info;
+ unsigned int dlen;
+ loff_t i_size = i_size_read(inode);
+
+ dbg_gen("ino %lu, pg %lu, i_size %lld, flags %#lx",
+ inode->i_ino, page->index, i_size, page->flags);
+ ubifs_assert(PageLocked(page));
+ ubifs_assert(!PageChecked(page));
+ ubifs_assert(!PagePrivate(page));
+
+ addr = kmap(page);
+
+ if (((loff_t)page->index << PAGE_CACHE_SHIFT) >= i_size) {
+ /* Reading beyond inode */
+ SetPageChecked(page);
+ memset(addr, 0, PAGE_CACHE_SIZE);
+ goto out;
+ }
+
+ dn = kmalloc(UBIFS_MAX_DATA_NODE_SZ, GFP_NOFS);
+ if (!dn) {
+ err = -ENOMEM;
+ goto error;
+ }
+
+ data_key_init(c, &key, inode->i_ino, page->index);
+ err = ubifs_tnc_lookup(c, &key, dn);
+ if (err) {
+ if (err == -ENOENT) {
+ /* Not found, so it must be a hole */
+ SetPageChecked(page);
+ memset(addr, 0, PAGE_CACHE_SIZE);
+ dbg_gen("hole");
+ goto out_free;
+ }
+ ubifs_err("cannot read page %lu of inode %lu, error %d",
+ page->index, inode->i_ino, err);
+ goto error;
+ }
+
+ ubifs_assert(dn->ch.sqnum > ubifs_inode(inode)->creat_sqnum);
+
+ len = le32_to_cpu(dn->size);
+ if (len <= 0 || len > PAGE_CACHE_SIZE)
+ goto dump;
+
+ dlen = le32_to_cpu(dn->ch.len) - UBIFS_DATA_NODE_SZ;
+ out_len = PAGE_CACHE_SIZE;
+ err = ubifs_decompress(&dn->data, dlen, addr, &out_len,
+ le16_to_cpu(dn->compr_type));
+ if (err || len != out_len)
+ goto dump;
+
+ /*
+ * Data length can be less than a full page, even for blocks that are
+ * not the last in the file (e.g., as a result of making a hole and
+ * appending data). Ensure that the remainder is zeroed out.
+ */
+ if (len < PAGE_CACHE_SIZE)
+ memset(addr + len, 0, PAGE_CACHE_SIZE - len);
+
+out_free:
+ kfree(dn);
+out:
+ SetPageUptodate(page);
+ ClearPageError(page);
+ flush_dcache_page(page);
+ kunmap(page);
+ return 0;
+
+dump:
+ err = -EINVAL;
+ ubifs_err("bad data node (page %lu, inode %lu)",
+ page->index, inode->i_ino);
+ dbg_dump_node(c, dn);
+error:
+ kfree(dn);
+ ClearPageUptodate(page);
+ SetPageError(page);
+ flush_dcache_page(page);
+ kunmap(page);
+ return err;
+}
+
+static int ubifs_write_begin(struct file *file, struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned flags,
+ struct page **pagep, void **fsdata)
+{
+ struct inode *inode = mapping->host;
+ struct ubifs_info *c = inode->i_sb->s_fs_info;
+ pgoff_t index = pos >> PAGE_CACHE_SHIFT;
+ struct ubifs_budget_req req = { .new_page = 1 };
+ loff_t i_size = i_size_read(inode);
+ int uninitialized_var(err);
+ struct page *page;
+
+ ubifs_assert(mutex_is_locked(&inode->i_mutex));
+ ubifs_assert(!(inode->i_sb->s_flags & MS_RDONLY));
+ dbg_eat_memory();
+
+ if (unlikely(c->ro_media))
+ return -EINVAL;
+
+ /*
+ * We are about to have a page of data written and we have to budget for
+ * this. The very important point here is that we have to budget before
+ * locking the page, because budgeting may force write-back, which
+ * would wait on locked pages and deadlock if we had the page locked.
+ *
+ * At this point we do not know anything about the page of data we are
+ * going to change, so assume the biggest budget (i.e., assume that
+ * this is a new page of data and it does not override an older page of
+ * data in the inode). Later the budget will be amended if this is not
+ * true.
+ */
+ if (pos + len > i_size)
+ /*
+ * We are writing beyond the file which means we are going to
+ * change inode size and make the inode dirty. And in turn,
+ * this means we have to budget for making the inode dirty.
+ *
+ * Note, if the inode is already dirty,
+ * 'ubifs_budget_inode_op()' will not allocate any budget,
+ * but will just lock the @budg_mutex of the inode to prevent
+ * it from becoming clean before we have changed its size,
+ * which is going to happen in 'ubifs_write_end()'.
+ */
+ err = ubifs_budget_inode_op(c, inode, &req);
+ else
+ /*
+ * The inode is not going to be marked as dirty by this write
+ * operation, do not budget for this.
+ */
+ err = ubifs_budget_space(c, &req);
+ if (unlikely(err))
+ return err;
+
+ page = __grab_cache_page(mapping, index);
+ if (unlikely(!page)) {
+ err = -ENOMEM;
+ goto out_release;
+ }
+
+ if (!PageUptodate(page)) {
+ /*
+ * The page is not loaded from the flash and has to be loaded
+ * unless we are writing all of it.
+ */
+ if (!(pos & ~PAGE_CACHE_MASK) && len == PAGE_CACHE_SIZE)
+ /*
+ * Set the PG_checked flag to make the further code
+ * assume the page is new.
+ */
+ SetPageChecked(page);
+ else {
+ err = do_readpage(page);
+ if (err)
+ goto out_unlock;
+ }
+
+ SetPageUptodate(page);
+ ClearPageError(page);
+ }
+
+ if (PagePrivate(page))
+ /*
+ * The page is dirty, which means it was budgeted twice:
+ * o first time the budget was allocated by the task which
+ * made the page dirty and set the PG_private flag;
+ * o and then we budgeted for it for the second time at the
+ * very beginning of this function.
+ *
+ * So what we have to do is to release the page budget we
+ * allocated.
+ *
+ * Note, the page write operation may change the inode length,
+ * which makes it dirty and means the budget should be
+ * allocated. This was done above in the "pos + len > i_size"
+ * case. If this was done, we do not free the inode budget,
+ * because we cannot as we are really going to mark it dirty in
+ * the 'ubifs_write_end()' function.
+ */
+ ubifs_release_new_page_budget(c);
+ else if (!PageChecked(page))
+ /*
+ * The page is not new, which means we are changing the page
+ * which already exists on the media. This means that changing
+ * the page does not make the amount of indexing information
+ * larger, and this part of the budget which we have already
+ * acquired may be released.
+ */
+ ubifs_convert_page_budget(c);
+
+ *pagep = page;
+ return 0;
+
+out_unlock:
+ unlock_page(page);
+ page_cache_release(page);
+out_release:
+ /* No page reference to drop if '__grab_cache_page()' failed */
+ return err;
+}
+
+static int ubifs_write_end(struct file *file, struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned copied,
+ struct page *page, void *fsdata)
+{
+ struct inode *inode = mapping->host;
+ struct ubifs_inode *ui = ubifs_inode(inode);
+ struct ubifs_info *c = inode->i_sb->s_fs_info;
+ loff_t i_size = i_size_read(inode);
+
+ dbg_gen("ino %lu, pos %llu, pg %lu, len %u, copied %d, i_size %lld",
+ inode->i_ino, pos, page->index, len, copied, i_size);
+ ubifs_assert(PageUptodate(page));
+ ubifs_assert(mutex_is_locked(&inode->i_mutex));
+ ubifs_assert(copied <= len);
+
+ if (unlikely(copied < len && len == PAGE_CACHE_SIZE)) {
+ /*
+ * VFS copied less data to the page than it intended and
+ * declared in its '->write_begin()' call via the @len
+ * argument. If the page was not up-to-date, and @len was
+ * @PAGE_CACHE_SIZE, the 'ubifs_write_begin()' function did
+ * not load it from the media (for optimization reasons). This
+ * means that part of the page contains garbage. So read the
+ * page now.
+ */
+ dbg_gen("copied %d instead of %d, read page and repeat",
+ copied, len);
+
+ if (pos + len > i_size)
+ mutex_unlock(&ui->budg_mutex);
+
+ copied = do_readpage(page);
+
+ /*
+ * Return 0 to force VFS to repeat the whole operation, or the
+ * error code if 'do_readpage()' failed.
+ */
+ goto out;
+ }
+
+ if (!PagePrivate(page)) {
+ SetPagePrivate(page);
+ atomic_long_inc(&c->dirty_pg_cnt);
+ __set_page_dirty_nobuffers(page);
+ }
+
+ if (pos + len > i_size) {
+ i_size_write(inode, pos + len);
+
+ /*
+ * Note, we do not set @I_DIRTY_PAGES (which means that the
+ * inode has dirty pages), this has been done in
+ * '__set_page_dirty_nobuffers()'.
+ */
+ mark_inode_dirty_sync(inode);
+
+ /*
+ * The inode has been marked dirty, unlock it. This is a bit
+ * hacky because normally we would have to call
+ * 'ubifs_release_ino_dirty()'. But we know there is nothing
+ * to release because page's budget will be released in
+ * 'ubifs_writepage()' and inode's budget will be released in
+ * 'ubifs_write_inode()', so just unlock the inode here for
+ * optimization.
+ */
+ mutex_unlock(&ui->budg_mutex);
+ }
+
+out:
+ unlock_page(page);
+ page_cache_release(page);
+ return copied;
+}
+
+static int ubifs_readpage(struct file *file, struct page *page)
+{
+ do_readpage(page);
+ unlock_page(page);
+ return 0;
+}
+
+/**
+ * release_existing_page_budget - release budget of an existing page.
+ * @c: UBIFS file-system description object
+ *
+ * This is a helper function which releases budget corresponding to the budget
+ * of changing one page of data which already exists on the flash media.
+ *
+ * This function was not moved to "budget.c" because there is only one user.
+ */
+static void release_existing_page_budget(struct ubifs_info *c)
+{
+ struct ubifs_budget_req req = { .dd_growth = c->page_budget};
+
+ ubifs_release_budget(c, &req);
+}
+
+static int do_writepage(struct page *page, int len)
+{
+ int err;
+ void *addr;
+ union ubifs_key key;
+ struct inode *inode = page->mapping->host;
+ struct ubifs_info *c = inode->i_sb->s_fs_info;
+
+ /* Update radix tree tags */
+ set_page_writeback(page);
+
+ /* One page cache page is one UBIFS block */
+ data_key_init(c, &key, inode->i_ino, page->index);
+ addr = kmap(page);
+
+ err = ubifs_jrn_write_data(c, inode, &key, addr, len);
+ if (err) {
+ SetPageError(page);
+ ubifs_err("cannot write page %lu of inode %lu, error %d",
+ page->index, inode->i_ino, err);
+ ubifs_ro_mode(c);
+ }
+
+ ubifs_assert(PagePrivate(page));
+ if (PageChecked(page))
+ ubifs_release_new_page_budget(c);
+ else
+ release_existing_page_budget(c);
+
+ atomic_long_dec(&c->dirty_pg_cnt);
+ ClearPagePrivate(page);
+ ClearPageChecked(page);
+
+ kunmap(page);
+ unlock_page(page);
+ end_page_writeback(page);
+
+ return err;
+}
+
+static int ubifs_writepage(struct page *page, struct writeback_control *wbc)
+{
+ struct inode *inode = page->mapping->host;
+ loff_t i_size = i_size_read(inode);
+ pgoff_t end_index = i_size >> PAGE_CACHE_SHIFT;
+ int len;
+ void *kaddr;
+
+ dbg_gen("ino %lu, pg %lu, pg flags %#lx",
+ inode->i_ino, page->index, page->flags);
+ ubifs_assert(PageUptodate(page));
+ ubifs_assert(!PageWriteback(page));
+ ubifs_assert(PagePrivate(page));
+ ubifs_assert(!(inode->i_sb->s_flags & MS_RDONLY));
+
+ /* Is the page fully inside i_size? */
+ if (page->index < end_index)
+ return do_writepage(page, PAGE_CACHE_SIZE);
+
+ /* Is the page fully outside i_size? (truncate in progress) */
+ len = i_size & (PAGE_CACHE_SIZE - 1);
+ if (page->index >= end_index + 1 || !len) {
+ unlock_page(page);
+ return 0;
+ }
+
+ /*
+ * The page straddles i_size. It must be zeroed out on each and every
+ * writepage invocation because it may be mmapped. "A file is mapped
+ * in multiples of the page size. For a file that is not a multiple of
+ * the page size, the remaining memory is zeroed when mapped, and
+ * writes to that region are not written out to the file."
+ */
+ kaddr = kmap_atomic(page, KM_USER0);
+ memset(kaddr + len, 0, PAGE_CACHE_SIZE - len);
+ flush_dcache_page(page);
+ kunmap_atomic(kaddr, KM_USER0);
+
+ return do_writepage(page, len);
+}
+
+static int ubifs_trunc(struct inode *inode, loff_t new_size)
+{
+ loff_t old_size;
+ int err;
+
+ dbg_gen("ino %lu, size %lld -> %lld",
+ inode->i_ino, inode->i_size, new_size);
+ ubifs_assert(mutex_is_locked(&inode->i_mutex));
+
+ old_size = inode->i_size;
+
+ err = vmtruncate(inode, new_size);
+ if (err)
+ return err;
+
+ if (!S_ISREG(inode->i_mode))
+ return 0;
+
+ if (new_size < old_size) {
+ struct ubifs_info *c = inode->i_sb->s_fs_info;
+ int offset = new_size & (UBIFS_BLOCK_SIZE - 1);
+
+ if (offset) {
+ pgoff_t index = new_size >> PAGE_CACHE_SHIFT;
+ struct page *page;
+
+ page = find_lock_page(inode->i_mapping, index);
+ if (page) {
+ if (PageDirty(page)) {
+ ubifs_assert(PageUptodate(page));
+ ubifs_assert(!PageWriteback(page));
+ ubifs_assert(PagePrivate(page));
+
+ clear_page_dirty_for_io(page);
+ err = do_writepage(page, offset);
+ page_cache_release(page);
+ if (err)
+ return err;
+ /*
+ * We could now tell ubifs_jrn_truncate not to read the last block.
+ */
+ } else {
+ /*
+ * We could 'kmap()' the page and
+ * pass the data to ubifs_jrn_truncate
+ * to save it from having to read it.
+ */
+ unlock_page(page);
+ page_cache_release(page);
+ }
+ }
+ }
+ err = ubifs_jrn_truncate(c, inode->i_ino, old_size, new_size);
+ if (err)
+ return err;
+ }
+
+ return 0;
+}
+
+int ubifs_setattr(struct dentry *dentry, struct iattr *attr)
+{
+ unsigned int ia_valid = attr->ia_valid;
+ struct inode *inode = dentry->d_inode;
+ struct ubifs_info *c = inode->i_sb->s_fs_info;
+ struct ubifs_budget_req req;
+ int truncation, err = 0;
+
+ dbg_gen("ino %lu, ia_valid %#x", inode->i_ino, ia_valid);
+ ubifs_assert(mutex_is_locked(&inode->i_mutex));
+
+ err = inode_change_ok(inode, attr);
+ if (err)
+ return err;
+
+ memset(&req, 0, sizeof(struct ubifs_budget_req));
+
+ /*
+ * If this is truncation, and we do not truncate on a block boundary,
+ * budget for changing one data block, because the last block will be
+ * re-written.
+ */
+ truncation = (ia_valid & ATTR_SIZE) && attr->ia_size != inode->i_size;
+ if (truncation && (attr->ia_size & (UBIFS_BLOCK_SIZE - 1)))
+ req.dirtied_page = 1;
+
+ err = ubifs_budget_inode_op(c, inode, &req);
+ if (err)
+ return err;
+
+ if (truncation) {
+ err = ubifs_trunc(inode, attr->ia_size);
+ if (err) {
+ ubifs_cancel_ino_op(c, inode, &req);
+ return err;
+ }
+
+ inode->i_mtime = inode->i_ctime = CURRENT_TIME_SEC;
+ }
+
+ if (ia_valid & ATTR_UID)
+ inode->i_uid = attr->ia_uid;
+ if (ia_valid & ATTR_GID)
+ inode->i_gid = attr->ia_gid;
+ if (ia_valid & ATTR_ATIME)
+ inode->i_atime = timespec_trunc(attr->ia_atime,
+ inode->i_sb->s_time_gran);
+ if (ia_valid & ATTR_MTIME)
+ inode->i_mtime = timespec_trunc(attr->ia_mtime,
+ inode->i_sb->s_time_gran);
+ if (ia_valid & ATTR_CTIME)
+ inode->i_ctime = timespec_trunc(attr->ia_ctime,
+ inode->i_sb->s_time_gran);
+ if (ia_valid & ATTR_MODE) {
+ umode_t mode = attr->ia_mode;
+
+ if (!in_group_p(inode->i_gid) && !capable(CAP_FSETID))
+ mode &= ~S_ISGID;
+ inode->i_mode = mode;
+ }
+
+ mark_inode_dirty_sync(inode);
+ ubifs_release_ino_dirty(c, inode, &req);
+
+ if (req.dirtied_page) {
+ /*
+ * Truncation code does not make the reenacted page dirty, it
+ * just changes it on journal level, so we have to release page
+ * change budget.
+ */
+ memset(&req, 0, sizeof(struct ubifs_budget_req));
+ req.dd_growth = c->page_budget;
+ ubifs_release_budget(c, &req);
+ }
+
+ if (IS_SYNC(inode))
+ err = write_inode_now(inode, 1);
+
+ return err;
+}
+
+static void ubifs_invalidatepage(struct page *page, unsigned long offset)
+{
+ struct inode *inode = page->mapping->host;
+ struct ubifs_info *c = inode->i_sb->s_fs_info;
+ struct ubifs_budget_req req;
+
+ ubifs_assert(PagePrivate(page));
+ if (offset)
+ /* Partial page remains dirty */
+ return;
+
+ memset(&req, 0, sizeof(struct ubifs_budget_req));
+ if (PageChecked(page)) {
+ req.new_page = 1;
+ req.idx_growth = -1;
+ req.data_growth = c->page_budget;
+ } else
+ req.dd_growth = c->page_budget;
+ ubifs_release_budget(c, &req);
+
+ atomic_long_dec(&c->dirty_pg_cnt);
+ ClearPagePrivate(page);
+ ClearPageChecked(page);
+}
+
+static void *ubifs_follow_link(struct dentry *dentry, struct nameidata *nd)
+{
+ struct ubifs_inode *ui = ubifs_inode(dentry->d_inode);
+
+ nd_set_link(nd, ui->data);
+ return NULL;
+}
+
+int ubifs_fsync(struct file *filp, struct dentry *dentry, int datasync)
+{
+ struct inode *inode = dentry->d_inode;
+ struct ubifs_info *c = inode->i_sb->s_fs_info;
+ int err;
+
+ dbg_gen("syncing inode %lu", inode->i_ino);
+ ubifs_assert(mutex_is_locked(&inode->i_mutex));
+
+ /* Synchronize the inode and dirty pages */
+ err = write_inode_now(inode, 1);
+ if (err)
+ return err;
+
+ /*
+ * Some data related to this inode may still sit in a write-buffer.
+ * Flush it.
+ */
+ err = ubifs_sync_wbufs_by_inodes(c, &inode, 1);
+ if (err)
+ return err;
+
+ return 0;
+}
+
+/**
+ * update_mctime - update mtime and ctime of an inode.
+ * @c: UBIFS file-system description object
+ * @inode: inode to update
+ *
+ * Time resolution of UBIFS is one second. This function updates mtime and
+ * ctime of the inode if they differ from the current time. Returns zero in
+ * case of success and a negative error code in case of failure.
+ */
+static int update_mctime(struct ubifs_info *c, struct inode *inode)
+{
+ time_t now = get_seconds();
+ struct ubifs_budget_req req;
+ int err;
+
+ if (inode->i_mtime.tv_sec != now || inode->i_ctime.tv_sec != now) {
+ memset(&req, 0, sizeof(struct ubifs_budget_req));
+ err = ubifs_budget_inode_op(c, inode, &req);
+ if (err)
+ return err;
+
+ inode->i_mtime.tv_sec = inode->i_ctime.tv_sec = now;
+ mark_inode_dirty_sync(inode);
+ mutex_unlock(&ubifs_inode(inode)->budg_mutex);
+ }
+
+ return 0;
+}
+
+static ssize_t ubifs_write(struct file *filp, const char __user *buf,
+ size_t len, loff_t *ppos)
+{
+ int err;
+ ssize_t ret;
+ struct inode *inode = filp->f_mapping->host;
+ struct ubifs_info *c = inode->i_sb->s_fs_info;
+
+ err = update_mctime(c, inode);
+ if (err)
+ return err;
+
+ ret = do_sync_write(filp, buf, len, ppos);
+ if (ret < 0)
+ return ret;
+
+ if (ret > 0 && IS_SYNC(inode)) {
+ err = ubifs_sync_wbufs_by_inodes(c, &inode, 1);
+ if (err)
+ return err;
+ }
+
+ return ret;
+}
+
+static ssize_t ubifs_aio_write(struct kiocb *iocb, const struct iovec *iov,
+ unsigned long nr_segs, loff_t pos)
+{
+ int err;
+ ssize_t ret;
+ struct inode *inode = iocb->ki_filp->f_mapping->host;
+ struct ubifs_info *c = inode->i_sb->s_fs_info;
+
+ err = update_mctime(c, inode);
+ if (err)
+ return err;
+
+ ret = generic_file_aio_write(iocb, iov, nr_segs, pos);
+ if (ret < 0)
+ return ret;
+
+ if (ret > 0 && IS_SYNC(inode)) {
+ err = ubifs_sync_wbufs_by_inodes(c, &inode, 1);
+ if (err)
+ return err;
+ }
+
+ return ret;
+}
+
+static int ubifs_set_page_dirty(struct page *page)
+{
+ /*
+ * An attempt to dirty a page without budgeting for it - should not
+ * happen.
+ */
+ ubifs_assert(0);
+ return __set_page_dirty_nobuffers(page);
+}
+
+static int ubifs_releasepage(struct page *page, gfp_t unused_gfp_flags)
+{
+ /*
+ * An attempt to release a dirty page without budgeting for it - should
+ * not happen.
+ */
+ ubifs_assert(PageLocked(page));
+ if (PageWriteback(page))
+ return 0;
+ ubifs_assert(PagePrivate(page));
+ ubifs_assert(0);
+ ClearPagePrivate(page);
+ ClearPageChecked(page);
+ return 1;
+}
+
+struct address_space_operations ubifs_file_address_operations = {
+ .readpage = ubifs_readpage,
+ .writepage = ubifs_writepage,
+ .write_begin = ubifs_write_begin,
+ .write_end = ubifs_write_end,
+ .invalidatepage = ubifs_invalidatepage,
+ .set_page_dirty = ubifs_set_page_dirty,
+ .releasepage = ubifs_releasepage,
+};
+
+struct inode_operations ubifs_file_inode_operations = {
+ .setattr = ubifs_setattr,
+ .getattr = ubifs_getattr,
+#ifdef CONFIG_UBIFS_FS_XATTR
+ .setxattr = ubifs_setxattr,
+ .getxattr = ubifs_getxattr,
+ .listxattr = ubifs_listxattr,
+ .removexattr = ubifs_removexattr,
+#endif
+};
+
+struct inode_operations ubifs_symlink_inode_operations = {
+ .readlink = generic_readlink,
+ .follow_link = ubifs_follow_link,
+ .setattr = ubifs_setattr,
+ .getattr = ubifs_getattr,
+};
+
+struct file_operations ubifs_file_operations = {
+ .llseek = generic_file_llseek,
+ .read = do_sync_read,
+ .write = ubifs_write,
+ .aio_read = generic_file_aio_read,
+ .aio_write = ubifs_aio_write,
+ .mmap = generic_file_mmap,
+ .fsync = ubifs_fsync,
+ .ioctl = ubifs_ioctl,
+#ifdef CONFIG_COMPAT
+ .compat_ioctl = ubifs_compat_ioctl,
+#endif
+};
diff --git a/fs/ubifs/ioctl.c b/fs/ubifs/ioctl.c
new file mode 100644
index 0000000..63c3b60
--- /dev/null
+++ b/fs/ubifs/ioctl.c
@@ -0,0 +1,205 @@
+/*
+ * This file is part of UBIFS.
+ *
+ * Copyright (C) 2006-2008 Nokia Corporation.
+ * Copyright (C) 2006, 2007 University of Szeged, Hungary
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 51
+ * Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Authors: Zoltan Sogor
+ * Artem Bityutskiy (Битюцкий Артём)
+ * Adrian Hunter
+ */
+
+/* This file implements EXT2-compatible extended attribute ioctl() calls */
+
+#include <linux/compat.h>
+#include <linux/smp_lock.h>
+#include "ubifs.h"
+
+/**
+ * ubifs_set_inode_flags - set VFS inode flags.
+ * @inode: VFS inode to set flags for
+ *
+ * This function propagates flags from UBIFS inode object to VFS inode object.
+ */
+void ubifs_set_inode_flags(struct inode *inode)
+{
+ unsigned int flags = ubifs_inode(inode)->flags;
+
+ inode->i_flags &= ~(S_SYNC | S_APPEND | S_IMMUTABLE | S_DIRSYNC);
+ if (flags & UBIFS_SYNC_FL)
+ inode->i_flags |= S_SYNC;
+ if (flags & UBIFS_APPEND_FL)
+ inode->i_flags |= S_APPEND;
+ if (flags & UBIFS_IMMUTABLE_FL)
+ inode->i_flags |= S_IMMUTABLE;
+ if (flags & UBIFS_DIRSYNC_FL)
+ inode->i_flags |= S_DIRSYNC;
+}
+
+/*
+ * ioctl2ubifs - convert ioctl inode flags to UBIFS inode flags.
+ * @ioctl_flags: flags to convert
+ *
+ * This function converts ioctl flags (@FS_COMPR_FL, etc) to UBIFS inode flags
+ * (@UBIFS_COMPR_FL, etc).
+ */
+static int ioctl2ubifs(int ioctl_flags)
+{
+ int ubifs_flags = 0;
+
+ if (ioctl_flags & FS_COMPR_FL)
+ ubifs_flags |= UBIFS_COMPR_FL;
+ if (ioctl_flags & FS_SYNC_FL)
+ ubifs_flags |= UBIFS_SYNC_FL;
+ if (ioctl_flags & FS_APPEND_FL)
+ ubifs_flags |= UBIFS_APPEND_FL;
+ if (ioctl_flags & FS_IMMUTABLE_FL)
+ ubifs_flags |= UBIFS_IMMUTABLE_FL;
+ if (ioctl_flags & FS_DIRSYNC_FL)
+ ubifs_flags |= UBIFS_DIRSYNC_FL;
+
+ return ubifs_flags;
+}
+
+/*
+ * ubifs2ioctl - convert UBIFS inode flags to ioctl inode flags.
+ * @ubifs_flags: flags to convert
+ *
+ * This function converts UBIFS inode flags (@UBIFS_COMPR_FL, etc) to ioctl
+ * flags (@FS_COMPR_FL, etc).
+ */
+static int ubifs2ioctl(int ubifs_flags)
+{
+ int ioctl_flags = 0;
+
+ if (ubifs_flags & UBIFS_COMPR_FL)
+ ioctl_flags |= FS_COMPR_FL;
+ if (ubifs_flags & UBIFS_SYNC_FL)
+ ioctl_flags |= FS_SYNC_FL;
+ if (ubifs_flags & UBIFS_APPEND_FL)
+ ioctl_flags |= FS_APPEND_FL;
+ if (ubifs_flags & UBIFS_IMMUTABLE_FL)
+ ioctl_flags |= FS_IMMUTABLE_FL;
+ if (ubifs_flags & UBIFS_DIRSYNC_FL)
+ ioctl_flags |= FS_DIRSYNC_FL;
+
+ return ioctl_flags;
+}
+
+static int setflags(struct inode *inode, int flags)
+{
+ struct ubifs_inode *ui = ubifs_inode(inode);
+ struct ubifs_info *c = inode->i_sb->s_fs_info;
+ struct ubifs_budget_req req;
+ int oldflags, err;
+
+ mutex_lock(&inode->i_mutex);
+
+ memset(&req, 0, sizeof(struct ubifs_budget_req));
+ err = ubifs_budget_inode_op(c, inode, &req);
+ if (err)
+ goto out;
+
+ /*
+ * The IMMUTABLE and APPEND_ONLY flags can only be changed by
+ * a process with the relevant capability.
+ */
+ oldflags = ubifs2ioctl(ui->flags);
+ if ((flags ^ oldflags) & (FS_APPEND_FL | FS_IMMUTABLE_FL)) {
+ if (!capable(CAP_LINUX_IMMUTABLE)) {
+ err = -EPERM;
+ goto out_budg;
+ }
+ }
+
+ ui->flags = ioctl2ubifs(flags);
+ ubifs_set_inode_flags(inode);
+
+ inode->i_ctime = CURRENT_TIME_SEC;
+ mark_inode_dirty_sync(inode);
+
+ ubifs_release_ino_dirty(c, inode, &req);
+
+ if (IS_SYNC(inode))
+ err = write_inode_now(inode, 1);
+
+ mutex_unlock(&inode->i_mutex);
+ return err;
+
+out_budg:
+ ubifs_cancel_ino_op(c, inode, &req);
+out:
+ ubifs_err("can't modify inode %lu attributes", inode->i_ino);
+ mutex_unlock(&inode->i_mutex);
+ return err;
+}
+
+int ubifs_ioctl(struct inode *inode, struct file *filp, unsigned int cmd,
+ unsigned long arg)
+{
+ int flags;
+
+ switch (cmd) {
+ case FS_IOC_GETFLAGS:
+ flags = ubifs2ioctl(ubifs_inode(inode)->flags);
+
+ return put_user(flags, (int __user *) arg);
+
+ case FS_IOC_SETFLAGS: {
+ if (IS_RDONLY(inode))
+ return -EROFS;
+
+ if (!is_owner_or_cap(inode))
+ return -EACCES;
+
+ if (get_user(flags, (int __user *) arg))
+ return -EFAULT;
+
+ if (!S_ISDIR(inode->i_mode))
+ flags &= ~FS_DIRSYNC_FL;
+
+ return setflags(inode, flags);
+ }
+
+ default:
+ return -ENOTTY;
+ }
+}
+
+#ifdef CONFIG_COMPAT
+long ubifs_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+{
+ struct inode *inode = file->f_path.dentry->d_inode;
+ int err;
+
+ switch (cmd) {
+ case FS_IOC32_GETFLAGS:
+ cmd = FS_IOC_GETFLAGS;
+ break;
+ case FS_IOC32_SETFLAGS:
+ cmd = FS_IOC_SETFLAGS;
+ break;
+ default:
+ return -ENOIOCTLCMD;
+ }
+
+ lock_kernel();
+ err = ubifs_ioctl(inode, file, cmd, (unsigned long)compat_ptr(arg));
+ unlock_kernel();
+
+ return err;
+}
+#endif
--
1.5.4.1

2008-03-27 13:10:20

by Artem Bityutskiy

Subject: [RFC PATCH 23/26] UBIFS: add orphans handling sub-system

This sub-system keeps track of orphans - files which were deleted
but are still kept open. These files should be deleted only when the
last reference goes away. But if an unclean reboot happens, UBIFS also
has to delete the orphans. This is why the orphans sub-system exists -
it records information about all orphans in the on-flash orphan area.
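
For readers unfamiliar with the term, the following user-space sketch shows
what an orphan looks like from the application side. It is only an
illustration and not part of the patch: the helper name
unlinked_open_nlink() and the path passed to it are hypothetical, and the
zero link count is what typical local Linux file systems report, not
something UBIFS-specific.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a file, unlink it while it is still open, and return the link
 * count reported by fstat(). Returns -1 on any error. On success the file
 * has no name any more, but its data was still readable through the open
 * descriptor, i.e. the inode was an "orphan" until close().
 */
static long unlinked_open_nlink(const char *path)
{
	static const char msg[] = "orphan";
	char buf[sizeof(msg)];
	struct stat st;
	long nlink = -1;
	int fd;

	fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0600);
	if (fd < 0)
		return -1;

	if (write(fd, msg, sizeof(msg)) != (ssize_t)sizeof(msg))
		goto out_unlink;

	/* Drop the only name; the inode now lives only through fd */
	if (unlink(path) != 0)
		goto out_unlink;

	if (fstat(fd, &st) != 0)
		goto out;

	/* The data must still be readable through the open descriptor */
	if (lseek(fd, 0, SEEK_SET) != 0)
		goto out;
	if (read(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf) ||
	    memcmp(buf, msg, sizeof(msg)) != 0)
		goto out;

	nlink = (long)st.st_nlink;
out:
	close(fd);	/* last reference goes away, the inode can be freed */
	return nlink;

out_unlink:
	unlink(path);
	close(fd);
	return -1;
}
```

Calling unlinked_open_nlink() with a fresh path in a writable directory is
expected to return 0 on such file systems. If power is lost between the
unlink() and the close(), nothing in RAM remembers that this inode still
has to be removed, which is the case the on-flash orphan area covers.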

Signed-off-by: Artem Bityutskiy <[email protected]>
Signed-off-by: Adrian Hunter <[email protected]>
---
fs/ubifs/orphan.c | 952 +++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 952 insertions(+), 0 deletions(-)

diff --git a/fs/ubifs/orphan.c b/fs/ubifs/orphan.c
new file mode 100644
index 0000000..4173fa9
--- /dev/null
+++ b/fs/ubifs/orphan.c
@@ -0,0 +1,952 @@
+/*
+ * This file is part of UBIFS.
+ *
+ * Copyright (C) 2006-2008 Nokia Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 51
+ * Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Author: Adrian Hunter
+ */
+
+#include "ubifs.h"
+
+/*
+ * An orphan is an inode number whose inode node has been committed to the index
+ * with a link count of zero. That happens when an open file is deleted
+ * (unlinked) and then a commit is run. In the normal course of events the inode
+ * would be deleted when the file is closed. However in the case of an unclean
+ * unmount, orphans need to be accounted for. After an unclean unmount, the
+ * orphans' inodes must be deleted which means either scanning the entire index
+ * looking for them, or keeping a list on flash somewhere. This unit implements
+ * the latter approach.
+ *
+ * The orphan area is a fixed number of LEBs situated between the LPT area and
+ * the main area. The number of orphan area LEBs is specified when the file
+ * system is created. The minimum number is 1. The orphan area should be
+ * large enough to hold the maximum number of orphans that are expected to
+ * ever exist at one time.
+ *
+ * The number of orphans that can fit in a LEB is:
+ *
+ * (c->leb_size - UBIFS_ORPH_NODE_SZ) / sizeof(__le64)
+ *
+ * For example: a 15872 byte LEB can fit 1980 orphans so 1 LEB may be enough.
+ *
+ * Orphans are accumulated in a rb-tree. When an inode's link count drops to
+ * zero, the inode number is added to the rb-tree. It is removed from the tree
+ * when the inode is deleted. Any new orphans that are in the orphan tree when
+ * the commit is run are written to the orphan area in 1 or more orph nodes.
+ * If the orphan area is full, it is consolidated to make space. There is
+ * always enough space because validation prevents the user from creating more
+ * than the maximum number of orphans allowed.
+ */
+
+#ifdef CONFIG_UBIFS_FS_DEBUG_CHK_ORPH
+static int dbg_check_orphans(struct ubifs_info *c);
+#else
+#define dbg_check_orphans(c) 0
+#endif
+
+/**
+ * ubifs_add_orphan - add an orphan.
+ * @c: UBIFS file-system description object
+ * @inum: orphan inode number
+ *
+ * Add an orphan. This function is called when an inode's link count drops to
+ * zero.
+ */
+int ubifs_add_orphan(struct ubifs_info *c, ino_t inum)
+{
+ struct ubifs_orphan *orphan, *o;
+ struct rb_node **p, *parent = NULL;
+
+ orphan = kzalloc(sizeof(struct ubifs_orphan), GFP_NOFS);
+ if (!orphan)
+ return -ENOMEM;
+ orphan->inum = inum;
+ orphan->new = 1;
+
+ spin_lock(&c->orphan_lock);
+ if (c->tot_orphans >= c->max_orphans) {
+ spin_unlock(&c->orphan_lock);
+ kfree(orphan);
+ return -ENFILE;
+ }
+ p = &c->orph_tree.rb_node;
+ while (*p) {
+ parent = *p;
+ o = rb_entry(parent, struct ubifs_orphan, rb);
+ if (inum < o->inum)
+ p = &(*p)->rb_left;
+ else if (inum > o->inum)
+ p = &(*p)->rb_right;
+ else {
+ dbg_err("orphaned twice");
+ spin_unlock(&c->orphan_lock);
+ kfree(orphan);
+ return 0;
+ }
+ }
+ c->tot_orphans += 1;
+ c->new_orphans += 1;
+ rb_link_node(&orphan->rb, parent, p);
+ rb_insert_color(&orphan->rb, &c->orph_tree);
+ list_add_tail(&orphan->list, &c->orph_list);
+ list_add_tail(&orphan->new_list, &c->orph_new);
+ spin_unlock(&c->orphan_lock);
+ dbg_gen("ino %lu", inum);
+ return 0;
+}
+
+/**
+ * ubifs_delete_orphan - delete an orphan.
+ * @c: UBIFS file-system description object
+ * @inum: orphan inode number
+ *
+ * Delete an orphan. This function is called when an inode is deleted.
+ */
+void ubifs_delete_orphan(struct ubifs_info *c, ino_t inum)
+{
+ struct ubifs_orphan *o;
+ struct rb_node *p;
+
+ spin_lock(&c->orphan_lock);
+ p = c->orph_tree.rb_node;
+ while (p) {
+ o = rb_entry(p, struct ubifs_orphan, rb);
+ if (inum < o->inum)
+ p = p->rb_left;
+ else if (inum > o->inum)
+ p = p->rb_right;
+ else {
+ if (o->dnext) {
+ spin_unlock(&c->orphan_lock);
+ dbg_gen("deleted twice ino %lu", inum);
+ return;
+ }
+ if (o->cnext) {
+ o->dnext = c->orph_dnext;
+ c->orph_dnext = o;
+ spin_unlock(&c->orphan_lock);
+ dbg_gen("delete later ino %lu", inum);
+ return;
+ }
+ rb_erase(p, &c->orph_tree);
+ list_del(&o->list);
+ c->tot_orphans -= 1;
+ if (o->new) {
+ list_del(&o->new_list);
+ c->new_orphans -= 1;
+ }
+ spin_unlock(&c->orphan_lock);
+ kfree(o);
+ dbg_gen("inum %lu", inum);
+ return;
+ }
+ }
+ spin_unlock(&c->orphan_lock);
+ dbg_err("missing orphan ino %lu", inum);
+ dbg_dump_stack();
+}
+
+/**
+ * ubifs_orphan_start_commit - start commit of orphans.
+ * @c: UBIFS file-system description object
+ *
+ * Start commit of orphans.
+ */
+int ubifs_orphan_start_commit(struct ubifs_info *c)
+{
+ struct ubifs_orphan *orphan, **last;
+
+ spin_lock(&c->orphan_lock);
+ last = &c->orph_cnext;
+ list_for_each_entry(orphan, &c->orph_new, new_list) {
+ ubifs_assert(orphan->new);
+ orphan->new = 0;
+ *last = orphan;
+ last = &orphan->cnext;
+ }
+ *last = orphan->cnext;
+ c->cmt_orphans = c->new_orphans;
+ c->new_orphans = 0;
+ dbg_cmt("%d orphans to commit", c->cmt_orphans);
+ INIT_LIST_HEAD(&c->orph_new);
+ if (c->tot_orphans == 0)
+ c->no_orphs = 1;
+ else
+ c->no_orphs = 0;
+ spin_unlock(&c->orphan_lock);
+ return 0;
+}
+
+/**
+ * avail_orphs - calculate available space.
+ * @c: UBIFS file-system description object
+ *
+ * This function returns the number of orphans that can be written in the
+ * available space.
+ */
+static int avail_orphs(struct ubifs_info *c)
+{
+ int avail_lebs, avail, gap;
+
+ avail_lebs = c->orph_lebs - (c->ohead_lnum - c->orph_first) - 1;
+ avail = avail_lebs *
+ ((c->leb_size - UBIFS_ORPH_NODE_SZ) / sizeof(__le64));
+ gap = c->leb_size - c->ohead_offs;
+ if (gap >= UBIFS_ORPH_NODE_SZ + sizeof(__le64))
+ avail += (gap - UBIFS_ORPH_NODE_SZ) / sizeof(__le64);
+ return avail;
+}
+
+/**
+ * tot_avail_orphs - calculate total space.
+ * @c: UBIFS file-system description object
+ *
+ * This function returns the number of orphans that can be written in half
+ * the total space. That leaves half the space for adding new orphans.
+ */
+static int tot_avail_orphs(struct ubifs_info *c)
+{
+ int avail_lebs, avail;
+
+ avail_lebs = c->orph_lebs;
+ avail = avail_lebs *
+ ((c->leb_size - UBIFS_ORPH_NODE_SZ) / sizeof(__le64));
+ return avail / 2;
+}
+
+/**
+ * do_write_orph_node - write a node
+ * @c: UBIFS file-system description object
+ * @len: length of node
+ * @atomic: write atomically
+ *
+ * This function writes a node to the orphan head from the orphan buffer. If
+ * %atomic is not zero, then the write is done atomically. On success, %0 is
+ * returned, otherwise a negative error code is returned.
+ */
+static int do_write_orph_node(struct ubifs_info *c, int len, int atomic)
+{
+ int err = 0;
+
+ if (atomic) {
+ ubifs_assert(c->ohead_offs == 0);
+ ubifs_prepare_node(c, c->orph_buf, len, 1);
+ len = ALIGN(len, c->min_io_size);
+ err = ubi_leb_change(c->ubi, c->ohead_lnum, c->orph_buf, len,
+ UBI_SHORTTERM);
+ } else {
+ if (c->ohead_offs == 0) {
+ /* Ensure LEB has been unmapped */
+ err = ubifs_leb_unmap(c, c->ohead_lnum);
+ if (err)
+ return err;
+ }
+ err = ubifs_write_node(c, c->orph_buf, len, c->ohead_lnum,
+ c->ohead_offs, UBI_SHORTTERM);
+ }
+ return err;
+}
+
+/**
+ * write_orph_node - write an orph node
+ * @c: UBIFS file-system description object
+ * @atomic: write atomically
+ *
+ * This function builds an orph node from the cnext list and writes it to the
+ * orphan head. On success, %0 is returned, otherwise a negative error code
+ * is returned.
+ */
+static int write_orph_node(struct ubifs_info *c, int atomic)
+{
+ struct ubifs_orphan *orphan, *cnext;
+ struct ubifs_orph_node *orph;
+ int gap, err, len, cnt, i;
+
+ ubifs_assert(c->cmt_orphans > 0);
+ gap = c->leb_size - c->ohead_offs;
+ if (gap < UBIFS_ORPH_NODE_SZ + sizeof(__le64)) {
+ c->ohead_lnum += 1;
+ c->ohead_offs = 0;
+ gap = c->leb_size;
+ if (c->ohead_lnum > c->orph_last) {
+ /*
+ * We limit the number of orphans so that this should
+ * never happen.
+ */
+ ubifs_err("out of space in orphan area");
+ return -EINVAL;
+ }
+ }
+ cnt = (gap - UBIFS_ORPH_NODE_SZ) / sizeof(__le64);
+ if (cnt > c->cmt_orphans)
+ cnt = c->cmt_orphans;
+ len = UBIFS_ORPH_NODE_SZ + cnt * sizeof(__le64);
+ ubifs_assert(c->orph_buf != NULL);
+ orph = c->orph_buf;
+ orph->ch.node_type = UBIFS_ORPH_NODE;
+ spin_lock(&c->orphan_lock);
+ cnext = c->orph_cnext;
+ for (i = 0; i < cnt; i++) {
+ orphan = cnext;
+ orph->inos[i] = cpu_to_le64(orphan->inum);
+ cnext = orphan->cnext;
+ orphan->cnext = NULL;
+ }
+ c->orph_cnext = cnext;
+ c->cmt_orphans -= cnt;
+ spin_unlock(&c->orphan_lock);
+ if (c->cmt_orphans)
+ orph->cmt_no = cpu_to_le64(c->cmt_no + 1);
+ else
+ /* Mark the last node of the commit */
+ orph->cmt_no = cpu_to_le64((c->cmt_no + 1) | (1ULL << 63));
+ ubifs_assert(c->ohead_offs + len <= c->leb_size);
+ ubifs_assert(c->ohead_lnum >= c->orph_first);
+ ubifs_assert(c->ohead_lnum <= c->orph_last);
+ err = do_write_orph_node(c, len, atomic);
+ c->ohead_offs += ALIGN(len, c->min_io_size);
+ c->ohead_offs = ALIGN(c->ohead_offs, 8);
+ return err;
+}
+
+/**
+ * write_orph_nodes - write orph nodes until there are no more to commit
+ * @c: UBIFS file-system description object
+ * @atomic: write atomically
+ *
+ * This function writes orph nodes for all the orphans to commit. On success,
+ * %0 is returned, otherwise a negative error code is returned.
+ */
+static int write_orph_nodes(struct ubifs_info *c, int atomic)
+{
+ int err;
+
+ while (c->cmt_orphans > 0) {
+ err = write_orph_node(c, atomic);
+ if (err)
+ return err;
+ }
+ if (atomic) {
+ int lnum;
+
+ /* Unmap any unused LEBs after consolidation */
+ lnum = c->ohead_lnum + 1;
+ for (lnum = c->ohead_lnum + 1; lnum <= c->orph_last; lnum++) {
+ err = ubifs_leb_unmap(c, lnum);
+ if (err)
+ return err;
+ }
+ }
+ return 0;
+}
+
+/**
+ * consolidate - consolidate the orphan area.
+ * @c: UBIFS file-system description object
+ *
+ * This function enables consolidation by putting all the orphans into the list
+ * to commit. The list is in the order that the orphans were added, and the
+ * LEBs are written atomically in order, so at no time can orphans be lost by
+ * an unclean unmount.
+ *
+ * This function returns %0 on success and a negative error code on failure.
+ */
+static int consolidate(struct ubifs_info *c)
+{
+ int tot_avail = tot_avail_orphs(c), err = 0;
+
+ spin_lock(&c->orphan_lock);
+ dbg_cmt("there is space for %d orphans and there are %d",
+ tot_avail, c->tot_orphans);
+ if (c->tot_orphans - c->new_orphans <= tot_avail) {
+ struct ubifs_orphan *orphan, **last;
+ int cnt = 0;
+
+ /* Change the cnext list to include all non-new orphans */
+ last = &c->orph_cnext;
+ list_for_each_entry(orphan, &c->orph_list, list) {
+ if (orphan->new)
+ continue;
+ *last = orphan;
+ last = &orphan->cnext;
+ cnt += 1;
+ }
+ *last = orphan->cnext;
+ ubifs_assert(cnt == c->tot_orphans - c->new_orphans);
+ c->cmt_orphans = cnt;
+ c->ohead_lnum = c->orph_first;
+ c->ohead_offs = 0;
+ } else {
+ /*
+ * We limit the number of orphans so that this should
+ * never happen.
+ */
+ ubifs_err("out of space in orphan area");
+ err = -EINVAL;
+ }
+ spin_unlock(&c->orphan_lock);
+ return err;
+}
+
+/**
+ * commit_orphans - commit orphans.
+ * @c: UBIFS file-system description object
+ *
+ * This function commits orphans to flash. On success, %0 is returned,
+ * otherwise a negative error code is returned.
+ */
+static int commit_orphans(struct ubifs_info *c)
+{
+ int avail, atomic = 0, err;
+
+ ubifs_assert(c->cmt_orphans > 0);
+ avail = avail_orphs(c);
+ if (avail < c->cmt_orphans) {
+ /* Not enough space to write new orphans, so consolidate */
+ err = consolidate(c);
+ if (err)
+ return err;
+ atomic = 1;
+ }
+ err = write_orph_nodes(c, atomic);
+ return err;
+}
+
+/**
+ * erase_deleted - erase the orphans marked for deletion.
+ * @c: UBIFS file-system description object
+ *
+ * During commit, the orphans being committed cannot be deleted, so they are
+ * marked for deletion and deleted by this function. Also, the recovery
+ * adds killed orphans to the deletion list, and therefore they are deleted
+ * here too.
+ */
+static void erase_deleted(struct ubifs_info *c)
+{
+ struct ubifs_orphan *orphan, *dnext;
+
+ spin_lock(&c->orphan_lock);
+ dnext = c->orph_dnext;
+ while (dnext) {
+ orphan = dnext;
+ dnext = orphan->dnext;
+ ubifs_assert(!orphan->new);
+ rb_erase(&orphan->rb, &c->orph_tree);
+ list_del(&orphan->list);
+ c->tot_orphans -= 1;
+ dbg_gen("deleting orphan ino %lu", orphan->inum);
+ kfree(orphan);
+ }
+ c->orph_dnext = NULL;
+ spin_unlock(&c->orphan_lock);
+}
+
+/**
+ * ubifs_orphan_end_commit - end commit of orphans.
+ * @c: UBIFS file-system description object
+ *
+ * End commit of orphans.
+ */
+int ubifs_orphan_end_commit(struct ubifs_info *c)
+{
+ int err;
+
+ if (c->cmt_orphans != 0) {
+ err = commit_orphans(c);
+ if (err)
+ return err;
+ }
+ erase_deleted(c);
+ err = dbg_check_orphans(c);
+ return err;
+}
+
+/**
+ * clear_orphans - erase all LEBs used for orphans.
+ * @c: UBIFS file-system description object
+ *
+ * If recovery is not required, then the orphans from the previous session
+ * are not needed. This function locates the LEBs used to record
+ * orphans, and un-maps them.
+ */
+static int clear_orphans(struct ubifs_info *c)
+{
+ int lnum, err;
+
+ for (lnum = c->orph_first; lnum <= c->orph_last; lnum++) {
+ err = ubifs_leb_unmap(c, lnum);
+ if (err)
+ return err;
+ }
+ c->ohead_lnum = c->orph_first;
+ c->ohead_offs = 0;
+ return 0;
+}
+
+/**
+ * insert_dead_orphan - insert an orphan.
+ * @c: UBIFS file-system description object
+ * @inum: orphan inode number
+ *
+ * This function is a helper to the 'do_kill_orphans()' function. The orphan
+ * must be kept until the next commit, so it is added to the rb-tree and the
+ * deletion list.
+ */
+static int insert_dead_orphan(struct ubifs_info *c, ino_t inum)
+{
+ struct ubifs_orphan *orphan, *o;
+ struct rb_node **p, *parent = NULL;
+
+ orphan = kzalloc(sizeof(struct ubifs_orphan), GFP_KERNEL);
+ if (!orphan)
+ return -ENOMEM;
+ orphan->inum = inum;
+
+ p = &c->orph_tree.rb_node;
+ while (*p) {
+ parent = *p;
+ o = rb_entry(parent, struct ubifs_orphan, rb);
+ if (inum < o->inum)
+ p = &(*p)->rb_left;
+ else if (inum > o->inum)
+ p = &(*p)->rb_right;
+ else {
+ /* Already added - no problem */
+ kfree(orphan);
+ return 0;
+ }
+ }
+ c->tot_orphans += 1;
+ rb_link_node(&orphan->rb, parent, p);
+ rb_insert_color(&orphan->rb, &c->orph_tree);
+ list_add_tail(&orphan->list, &c->orph_list);
+ orphan->dnext = c->orph_dnext;
+ c->orph_dnext = orphan;
+ dbg_mnt("ino %lu, new %d, tot %d",
+ inum, c->new_orphans, c->tot_orphans);
+ return 0;
+}
+
+/**
+ * do_kill_orphans - remove orphan inodes from the index.
+ * @c: UBIFS file-system description object
+ * @sleb: scanned LEB
+ * @last_cmt_no: cmt_no of last orph node read is passed and returned here
+ * @outofdate: whether the LEB is out of date is returned here
+ * @last_flagged: whether the end orph node is encountered
+ *
+ * This function is a helper to the 'kill_orphans' function. It goes through
+ * every orphan node in a LEB and for every inode number recorded, removes
+ * all keys for that inode from the TNC.
+ */
+static int do_kill_orphans(struct ubifs_info *c, struct ubifs_scan_leb *sleb,
+ unsigned long long *last_cmt_no, int *outofdate,
+ int *last_flagged)
+{
+ struct ubifs_scan_node *snod;
+ struct ubifs_orph_node *orph;
+ unsigned long long cmt_no;
+ ino_t inum;
+ int i, n, err, first = 1;
+
+ list_for_each_entry(snod, &sleb->nodes, list) {
+ if (snod->type != UBIFS_ORPH_NODE) {
+ ubifs_err("invalid node type %d in orphan area at "
+ "%d:%d", snod->type, sleb->lnum, snod->offs);
+ dbg_dump_node(c, snod->node);
+ return -EINVAL;
+ }
+
+ orph = snod->node;
+
+ /* Check commit number */
+ cmt_no = le64_to_cpu(orph->cmt_no) & LLONG_MAX;
+ /*
+ * The commit number on the master node may be less, because
+ * of a failed commit. If there are several failed commits in a
+ * row, the commit number written on orph nodes will continue to
+ * increase (because the commit number is adjusted here) even
+ * though the commit number on the master node stays the same
+ * because the master node has not been re-written.
+ */
+ if (cmt_no > c->cmt_no)
+ c->cmt_no = cmt_no;
+ if (cmt_no < *last_cmt_no && *last_flagged) {
+ /*
+ * The last orph node had a higher commit number and was
+ * flagged as the last written for that commit number.
+ * That makes this orph node out of date.
+ */
+ if (!first) {
+ ubifs_err("out of order commit number %llu in "
+ "orphan node at %d:%d",
+ cmt_no, sleb->lnum, snod->offs);
+ dbg_dump_node(c, snod->node);
+ return -EINVAL;
+ }
+ dbg_mnt("out of date LEB %d", sleb->lnum);
+ *outofdate = 1;
+ return 0;
+ }
+
+ if (first)
+ first = 0;
+
+ n = (le32_to_cpu(orph->ch.len) - UBIFS_ORPH_NODE_SZ) >> 3;
+ for (i = 0; i < n; i++) {
+ inum = le64_to_cpu(orph->inos[i]);
+ dbg_mnt("deleting orphaned inode %lu", inum);
+ err = ubifs_tnc_remove_ino(c, inum);
+ if (err)
+ return err;
+ err = insert_dead_orphan(c, inum);
+ if (err)
+ return err;
+ }
+
+ *last_cmt_no = cmt_no;
+ if (le64_to_cpu(orph->cmt_no) & (1ULL << 63)) {
+ dbg_mnt("last orph node for commit %llu at %d:%d",
+ cmt_no, sleb->lnum, snod->offs);
+ *last_flagged = 1;
+ } else
+ *last_flagged = 0;
+ }
+
+ return 0;
+}
+
+/**
+ * kill_orphans - remove all orphan inodes from the index.
+ * @c: UBIFS file-system description object
+ *
+ * If recovery is required, then orphan inodes recorded during the previous
+ * session (which ended with an unclean unmount) must be deleted from the index.
+ * This is done by updating the TNC, but since the index is not updated until
+ * the next commit, the LEBs where the orphan information is recorded are not
+ * erased until the next commit.
+ */
+static int kill_orphans(struct ubifs_info *c)
+{
+ unsigned long long last_cmt_no = 0;
+ int lnum, err = 0, outofdate = 0, last_flagged = 0;
+
+ c->ohead_lnum = c->orph_first;
+ c->ohead_offs = 0;
+ /* Check no-orphans flag and skip this if no orphans */
+ if (c->no_orphs) {
+ dbg_mnt("no orphans");
+ return 0;
+ }
+ /*
+ * Orph nodes always start at c->orph_first and are written to each
+ * successive LEB in turn. Generally unused LEBs will have been unmapped
+ * but may contain out of date orph nodes if the unmap didn't go
+ * through. In addition, the last orph node written for each commit is
+ * marked (top bit of orph->cmt_no is set to 1). It is possible that
+ * there are orph nodes from the next commit (i.e. the commit did not
+ * complete successfully). In that case, no orphans will have been lost
+ * due to the way that orphans are written, and any orphans added will
+ * be valid orphans anyway and so can be deleted.
+ */
+ for (lnum = c->orph_first; lnum <= c->orph_last; lnum++) {
+ struct ubifs_scan_leb *sleb;
+
+ dbg_mnt("LEB %d", lnum);
+ sleb = ubifs_scan(c, lnum, 0, c->sbuf);
+ if (IS_ERR(sleb)) {
+ sleb = ubifs_recover_leb(c, lnum, 0, c->sbuf, 0);
+ if (IS_ERR(sleb)) {
+ err = PTR_ERR(sleb);
+ break;
+ }
+ }
+ err = do_kill_orphans(c, sleb, &last_cmt_no, &outofdate,
+ &last_flagged);
+ if (err || outofdate) {
+ ubifs_scan_destroy(sleb);
+ break;
+ }
+ if (sleb->endpt) {
+ c->ohead_lnum = lnum;
+ c->ohead_offs = sleb->endpt;
+ }
+ ubifs_scan_destroy(sleb);
+ }
+ return err;
+}
+
+/**
+ * ubifs_mount_orphans - delete orphan inodes and erase LEBs that recorded them.
+ * @c: UBIFS file-system description object
+ * @unclean: %1 => recover from unclean unmount
+ *
+ * This function is called when mounting to erase orphans from the previous
+ * session. If UBIFS was not unmounted cleanly, then the inodes recorded as
+ * orphans are deleted.
+ */
+int ubifs_mount_orphans(struct ubifs_info *c, int unclean)
+{
+ int err = 0;
+
+ c->max_orphans = tot_avail_orphs(c);
+
+ c->orph_buf = vmalloc(c->leb_size);
+ if (!c->orph_buf)
+ return -ENOMEM;
+
+ if (unclean)
+ err = kill_orphans(c);
+ else
+ err = clear_orphans(c);
+
+ return err;
+}
+
+#ifdef CONFIG_UBIFS_FS_DEBUG_CHK_ORPH
+
+struct check_orphan {
+ struct rb_node rb;
+ ino_t inum;
+};
+
+struct check_info {
+ unsigned long last_ino;
+ unsigned long tot_inos;
+ unsigned long missing;
+ unsigned long long leaf_cnt;
+ struct ubifs_ino_node *node;
+ struct rb_root root;
+};
+
+static int dbg_find_orphan(struct ubifs_info *c, ino_t inum)
+{
+ struct ubifs_orphan *o;
+ struct rb_node *p;
+
+ spin_lock(&c->orphan_lock);
+ p = c->orph_tree.rb_node;
+ while (p) {
+ o = rb_entry(p, struct ubifs_orphan, rb);
+ if (inum < o->inum)
+ p = p->rb_left;
+ else if (inum > o->inum)
+ p = p->rb_right;
+ else {
+ spin_unlock(&c->orphan_lock);
+ return 1;
+ }
+ }
+ spin_unlock(&c->orphan_lock);
+ return 0;
+}
+
+static int dbg_ins_check_orphan(struct rb_root *root, ino_t inum)
+{
+ struct check_orphan *orphan, *o;
+ struct rb_node **p, *parent = NULL;
+
+ orphan = kzalloc(sizeof(struct check_orphan), GFP_NOFS);
+ if (!orphan)
+ return -ENOMEM;
+ orphan->inum = inum;
+
+ p = &root->rb_node;
+ while (*p) {
+ parent = *p;
+ o = rb_entry(parent, struct check_orphan, rb);
+ if (inum < o->inum)
+ p = &(*p)->rb_left;
+ else if (inum > o->inum)
+ p = &(*p)->rb_right;
+ else {
+ kfree(orphan);
+ return 0;
+ }
+ }
+ rb_link_node(&orphan->rb, parent, p);
+ rb_insert_color(&orphan->rb, root);
+ return 0;
+}
+
+static int dbg_find_check_orphan(struct rb_root *root, ino_t inum)
+{
+ struct check_orphan *o;
+ struct rb_node *p;
+
+ p = root->rb_node;
+ while (p) {
+ o = rb_entry(p, struct check_orphan, rb);
+ if (inum < o->inum)
+ p = p->rb_left;
+ else if (inum > o->inum)
+ p = p->rb_right;
+ else
+ return 1;
+ }
+ return 0;
+}
+
+static void dbg_free_check_tree(struct rb_root *root)
+{
+ struct rb_node *this = root->rb_node;
+ struct check_orphan *o;
+
+ while (this) {
+ if (this->rb_left) {
+ this = this->rb_left;
+ continue;
+ } else if (this->rb_right) {
+ this = this->rb_right;
+ continue;
+ }
+ o = rb_entry(this, struct check_orphan, rb);
+ this = rb_parent(this);
+ if (this) {
+ if (this->rb_left == &o->rb)
+ this->rb_left = NULL;
+ else
+ this->rb_right = NULL;
+ }
+ kfree(o);
+ }
+}
+
+static int dbg_orphan_check(struct ubifs_info *c, struct ubifs_zbranch *zbr,
+ void *priv)
+{
+ struct check_info *ci = priv;
+ ino_t inum;
+ int err;
+
+ inum = key_ino(c, &zbr->key);
+ if (inum != ci->last_ino) {
+ /* Lowest node type is the inode node, so it comes first */
+ if (key_type(c, &zbr->key) != UBIFS_INO_KEY)
+ ubifs_err("found orphan node ino %lu, type %d", inum,
+ key_type(c, &zbr->key));
+ ci->last_ino = inum;
+ ci->tot_inos += 1;
+ err = dbg_read_leaf_nolock(c, zbr, ci->node);
+ if (err) {
+ ubifs_err("node read failed, error %d", err);
+ return err;
+ }
+ if (ci->node->nlink == 0)
+ /* Must be recorded as an orphan */
+ if (!dbg_find_check_orphan(&ci->root, inum) &&
+ !dbg_find_orphan(c, inum)) {
+ ubifs_err("missing orphan, ino %lu", inum);
+ ci->missing += 1;
+ }
+ }
+ ci->leaf_cnt += 1;
+ return 0;
+}
+
+static int dbg_read_orphans(struct check_info *ci, struct ubifs_scan_leb *sleb)
+{
+ struct ubifs_scan_node *snod;
+ struct ubifs_orph_node *orph;
+ ino_t inum;
+ int i, n, err;
+
+ list_for_each_entry(snod, &sleb->nodes, list) {
+ cond_resched();
+ if (snod->type != UBIFS_ORPH_NODE)
+ continue;
+ orph = snod->node;
+ n = (le32_to_cpu(orph->ch.len) - UBIFS_ORPH_NODE_SZ) >> 3;
+ for (i = 0; i < n; i++) {
+ inum = le64_to_cpu(orph->inos[i]);
+ err = dbg_ins_check_orphan(&ci->root, inum);
+ if (err)
+ return err;
+ }
+ }
+ return 0;
+}
+
+static int dbg_scan_orphans(struct ubifs_info *c, struct check_info *ci)
+{
+ int lnum, err = 0;
+
+ /* Check no-orphans flag and skip this if no orphans */
+ if (c->no_orphs)
+ return 0;
+
+ for (lnum = c->orph_first; lnum <= c->orph_last; lnum++) {
+ struct ubifs_scan_leb *sleb;
+
+ sleb = ubifs_scan(c, lnum, 0, c->dbg_buf);
+ if (IS_ERR(sleb)) {
+ err = PTR_ERR(sleb);
+ break;
+ }
+
+ err = dbg_read_orphans(ci, sleb);
+ ubifs_scan_destroy(sleb);
+ if (err)
+ break;
+ }
+
+ return err;
+}
+
+static int dbg_check_orphans(struct ubifs_info *c)
+{
+ struct check_info ci;
+ int err;
+
+ ci.last_ino = 0;
+ ci.tot_inos = 0;
+ ci.missing = 0;
+ ci.leaf_cnt = 0;
+ ci.root = RB_ROOT;
+ ci.node = kmalloc(UBIFS_MAX_INO_NODE_SZ, GFP_NOFS);
+ if (!ci.node) {
+ ubifs_err("out of memory");
+ return -ENOMEM;
+ }
+
+ err = dbg_scan_orphans(c, &ci);
+ if (err)
+ goto out;
+
+ err = dbg_walk_index(c, &dbg_orphan_check, NULL, &ci);
+ if (err) {
+ ubifs_err("cannot scan TNC, error %d", err);
+ goto out;
+ }
+
+ if (ci.missing) {
+ ubifs_err("%lu missing orphan(s)", ci.missing);
+ err = -EINVAL;
+ goto out;
+ }
+
+ dbg_cmt("last inode number is %lu", ci.last_ino);
+ dbg_cmt("total number of inodes is %lu", ci.tot_inos);
+ dbg_cmt("total number of leaf nodes is %llu", ci.leaf_cnt);
+
+out:
+ dbg_free_check_tree(&ci.root);
+ kfree(ci.node);
+ return err;
+}
+
+#endif /* CONFIG_UBIFS_FS_DEBUG_CHK_ORPH */
--
1.5.4.1
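
Note on the orphan check above: dbg_read_orphans() derives the number of inode numbers in an orphan node from the node length alone, because the node is a fixed-size header followed by an array of 64-bit inode numbers. Below is a minimal user-space sketch of that arithmetic. It assumes the 32-byte header implied by struct ubifs_orph_node (24-byte common header plus the 8-byte commit number); the names ORPH_NODE_SZ and orph_inum_count() are made up for illustration and are not part of the patch.

```c
#include <stdint.h>

/*
 * Assumed size of struct ubifs_orph_node without the inos[] array:
 * 24-byte common header + 8-byte cmt_no.
 */
#define ORPH_NODE_SZ 32u

/*
 * Hypothetical helper: number of 64-bit inode numbers carried by an orphan
 * node whose common header reports a total length of 'len' bytes.
 * The caller is expected to pass len >= ORPH_NODE_SZ.
 */
static inline uint32_t orph_inum_count(uint32_t len)
{
	return (len - ORPH_NODE_SZ) >> 3;	/* 8 bytes per inode number */
}
```

So a 56-byte orphan node would carry three inode numbers, and a bare 32-byte node none.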

2008-03-27 13:10:44

by Artem Bityutskiy

Subject: [RFC PATCH 24/26] UBIFS: add header files

ubifs.h contains the internal stuff. ubifs-media.h contains the
on-flash format definition and might be copied to user-space
if needed. misc.h contains various inline helpers.

Signed-off-by: Artem Bityutskiy <[email protected]>
Signed-off-by: Adrian Hunter <[email protected]>
---
fs/ubifs/misc.h | 267 +++++++++
fs/ubifs/ubifs-media.h | 701 ++++++++++++++++++++++
fs/ubifs/ubifs.h | 1519 ++++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 2487 insertions(+), 0 deletions(-)
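
Since ubifs-media.h may be copied to user-space, here is a rough sketch of how a tool could use its layout constants. The numeric values are taken from the header in this patch; the shortened names and the volume_big_enough() helper are hypothetical, and whether the real mount or mkfs code performs exactly this check is not shown in this message.

```c
/* Values copied from ubifs-media.h in this patch (names shortened). */
enum {
	MIN_LEB_SZ    = 15 * 1024,	/* UBIFS_MIN_LEB_SZ */
	SB_LEBS       = 1,		/* superblock area */
	MST_LEBS      = 2,		/* master area */
	MIN_LOG_LEBS  = 2,
	MIN_BUD_LEBS  = 2,
	MIN_LPT_LEBS  = 2,
	MIN_ORPH_LEBS = 1,
	MIN_MAIN_LEBS = 8,

	SB_LNUM  = 0,			/* first LEB of the superblock area */
	MST_LNUM = SB_LNUM + SB_LEBS,	/* first LEB of the master area */
	LOG_LNUM = MST_LNUM + MST_LEBS,	/* first LEB of the log area */

	MIN_LEB_CNT = SB_LEBS + MST_LEBS + MIN_LOG_LEBS + MIN_BUD_LEBS +
		      MIN_LPT_LEBS + MIN_ORPH_LEBS + MIN_MAIN_LEBS,
};

/*
 * Hypothetical helper: returns 1 if a volume with 'leb_cnt' logical
 * eraseblocks of 'leb_size' bytes each could hold the smallest layout
 * described by the constants above, and 0 otherwise.
 */
static int volume_big_enough(int leb_size, int leb_cnt)
{
	return leb_size >= MIN_LEB_SZ && leb_cnt >= MIN_LEB_CNT;
}
```

With these values the master area starts at LEB 1, the log at LEB 3, and the minimum volume is 18 LEBs.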

diff --git a/fs/ubifs/misc.h b/fs/ubifs/misc.h
new file mode 100644
index 0000000..0feadba
--- /dev/null
+++ b/fs/ubifs/misc.h
@@ -0,0 +1,267 @@
+/*
+ * This file is part of UBIFS.
+ *
+ * Copyright (C) 2006-2008 Nokia Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 51
+ * Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Authors: Artem Bityutskiy (Битюцкий Артём)
+ * Adrian Hunter
+ */
+
+/*
+ * This file contains miscellaneous helper functions.
+ */
+
+#ifndef __UBIFS_MISC_H__
+#define __UBIFS_MISC_H__
+
+/**
+ * ubifs_zn_dirty - check if znode is dirty.
+ * @znode: znode to check
+ *
+ * This helper function returns %1 if @znode is dirty and %0 otherwise.
+ */
+static inline int ubifs_zn_dirty(const struct ubifs_znode *znode)
+{
+ return !!test_bit(DIRTY_ZNODE, &znode->flags);
+}
+
+/**
+ * ubifs_wake_up_bgt - wake up background thread.
+ * @c: UBIFS file-system description object
+ */
+static inline void ubifs_wake_up_bgt(struct ubifs_info *c)
+{
+ if (c->bgt && !c->need_bgt) {
+ c->need_bgt = 1;
+ wake_up_process(c->bgt);
+ }
+}
+
+/**
+ * ubifs_tnc_find_child - find next child in znode.
+ * @znode: znode to search at
+ * @start: the zbranch index to start at
+ *
+ * This helper function looks for znode child starting at index @start. Returns
+ * the child or %NULL if no children were found.
+ */
+static inline struct ubifs_znode *
+ubifs_tnc_find_child(struct ubifs_znode *znode, int start)
+{
+ while (start < znode->child_cnt) {
+ if (znode->zbranch[start].znode)
+ return znode->zbranch[start].znode;
+ start += 1;
+ }
+
+ return NULL;
+}
+
+/**
+ * ubifs_inode - get UBIFS inode information by VFS 'struct inode' object.
+ * @inode: the VFS 'struct inode' pointer
+ */
+static inline struct ubifs_inode *ubifs_inode(const struct inode *inode)
+{
+ return container_of(inode, struct ubifs_inode, vfs_inode);
+}
+
+/**
+ * ubifs_ro_mode - switch UBIFS to read-only mode.
+ * @c: UBIFS file-system description object
+ */
+static inline void ubifs_ro_mode(struct ubifs_info *c)
+{
+ if (!c->ro_media) {
+ c->ro_media = 1;
+ ubifs_warn("switched to read-only mode");
+ dbg_dump_stack();
+ }
+}
+
+/**
+ * ubifs_compr_present - check if compressor was compiled in.
+ * @compr_type: compressor type to check
+ *
+ * This function returns %1 if a compressor of type @compr_type is present
+ * and %0 if not.
+ */
+static inline int ubifs_compr_present(int compr_type)
+{
+ ubifs_assert(compr_type >= 0 && compr_type < UBIFS_COMPR_TYPES_CNT);
+ return !!ubifs_compressors[compr_type]->capi_name;
+}
+
+/**
+ * ubifs_compr_name - get compressor name string by its type.
+ * @compr_type: compressor type
+ *
+ * This function returns compressor type string.
+ */
+static inline const char *ubifs_compr_name(int compr_type)
+{
+ ubifs_assert(compr_type >= 0 && compr_type < UBIFS_COMPR_TYPES_CNT);
+ return ubifs_compressors[compr_type]->name;
+}
+
+/**
+ * ubifs_wbuf_sync - synchronize write-buffer.
+ * @wbuf: write-buffer to synchronize
+ *
+ * This is the same as 'ubifs_wbuf_sync_nolock()' but it does not assume
+ * that the write-buffer is already locked.
+ */
+static inline int ubifs_wbuf_sync(struct ubifs_wbuf *wbuf)
+{
+ int err;
+
+ mutex_lock_nested(&wbuf->io_mutex, wbuf->jhead);
+ err = ubifs_wbuf_sync_nolock(wbuf);
+ mutex_unlock(&wbuf->io_mutex);
+ return err;
+}
+
+/**
+ * ubifs_leb_unmap - unmap an LEB.
+ * @c: UBIFS file-system description object
+ * @lnum: LEB number to unmap
+ *
+ * This function returns %0 on success and a negative error code on failure.
+ */
+static inline int ubifs_leb_unmap(const struct ubifs_info *c, int lnum)
+{
+ int err;
+
+ err = ubi_leb_unmap(c->ubi, lnum);
+ if (err) {
+ ubifs_err("unmap LEB %d failed, error %d", lnum, err);
+ return err;
+ }
+
+ return 0;
+}
+
+/**
+ * ubifs_leb_write - write to a LEB.
+ * @c: UBIFS file-system description object
+ * @lnum: LEB number to write
+ * @buf: buffer to write from
+ * @offs: offset within LEB to write to
+ * @len: length to write
+ * @dtype: data type
+ *
+ * This function returns %0 on success and a negative error code on failure.
+ */
+static inline int ubifs_leb_write(const struct ubifs_info *c, int lnum,
+ const void *buf, int offs, int len, int dtype)
+{
+ int err;
+
+ err = ubi_leb_write(c->ubi, lnum, buf, offs, len, dtype);
+ if (err) {
+ ubifs_err("writing %d bytes at %d:%d, error %d",
+ len, lnum, offs, err);
+ return err;
+ }
+
+ return 0;
+}
+
+/**
+ * ubifs_encode_dev - encode device node IDs.
+ * @dev: UBIFS device node information
+ * @rdev: device IDs to encode
+ *
+ * This is a helper function which encodes major/minor numbers of a device node
+ * into UBIFS device node description. We use standard Linux "new" and "huge"
+ * encodings.
+ */
+static inline int ubifs_encode_dev(union ubifs_dev_desc *dev, dev_t rdev)
+{
+ if (new_valid_dev(rdev)) {
+ dev->new = cpu_to_le32(new_encode_dev(rdev));
+ return sizeof(dev->new);
+ } else {
+ dev->huge = cpu_to_le64(huge_encode_dev(rdev));
+ return sizeof(dev->huge);
+ }
+}
+
+/**
+ * ubifs_add_dirt - add dirty space to LEB properties.
+ * @c: the UBIFS file-system description object
+ * @lnum: LEB to add dirty space for
+ * @dirty: dirty space to add
+ *
+ * This helper function increases the amount of dirty LEB space. Returns zero
+ * in case of success and a negative error code in case of failure.
+ */
+static inline int ubifs_add_dirt(struct ubifs_info *c, int lnum, int dirty)
+{
+ return ubifs_update_one_lp(c, lnum, -1, dirty, 0, 0);
+}
+
+/**
+ * ubifs_return_leb - return LEB to lprops.
+ * @c: the UBIFS file-system description object
+ * @lnum: LEB to return
+ *
+ * This helper function clears the "taken" flag of a logical eraseblock in the
+ * lprops. Returns zero in case of success and a negative error code in case of
+ * failure.
+ */
+static inline int ubifs_return_leb(struct ubifs_info *c, int lnum)
+{
+ return ubifs_change_one_lp(c, lnum, -1, -1, 0, LPROPS_TAKEN, 0);
+}
+
+/**
+ * ubifs_idx_node_sz - return index node size.
+ * @c: the UBIFS file-system description object
+ * @child_cnt: number of children of this index node
+ */
+static inline int ubifs_idx_node_sz(const struct ubifs_info *c, int child_cnt)
+{
+ return UBIFS_IDX_NODE_SZ + (UBIFS_BRANCH_SZ + c->key_len) * child_cnt;
+}
+
+/**
+ * ubifs_idx_branch - return pointer to an index branch.
+ * @c: the UBIFS file-system description object
+ * @idx: index node
+ * @bnum: branch number
+ */
+static inline
+struct ubifs_branch *ubifs_idx_branch(const struct ubifs_info *c,
+ const struct ubifs_idx_node *idx,
+ int bnum)
+{
+ return (struct ubifs_branch *)((void *)idx->branches +
+ (UBIFS_BRANCH_SZ + c->key_len) * bnum);
+}
+
+/**
+ * ubifs_idx_key - return pointer to an index key.
+ * @c: the UBIFS file-system description object
+ * @idx: index node
+ */
+static inline void *ubifs_idx_key(const struct ubifs_info *c,
+ const struct ubifs_idx_node *idx)
+{
+ return (void *)((struct ubifs_branch *)idx->branches)->key;
+}
+
+#endif /* __UBIFS_MISC_H__ */
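
A worked example of the index node arithmetic used by ubifs_idx_node_sz() and ubifs_idx_branch() above, as a user-space sketch. The sizes are assumptions derived from the packed structures in ubifs-media.h (32 bytes for the index node header, 12 bytes for a branch without its key), and the 8-byte key length is UBIFS_SK_LEN; the function names are illustrative only.

```c
#include <stddef.h>

#define IDX_NODE_SZ 32	/* assumed sizeof(struct ubifs_idx_node) */
#define BRANCH_SZ   12	/* assumed sizeof(struct ubifs_branch), key excluded */
#define SK_LEN       8	/* UBIFS_SK_LEN, "simple" key format */

/* Size in bytes of an index node with 'child_cnt' branches. */
static size_t idx_node_sz(size_t key_len, size_t child_cnt)
{
	return IDX_NODE_SZ + (BRANCH_SZ + key_len) * child_cnt;
}

/* Byte offset of branch number 'bnum', counted from the start of the node. */
static size_t idx_branch_offs(size_t key_len, size_t bnum)
{
	return IDX_NODE_SZ + (BRANCH_SZ + key_len) * bnum;
}
```

With simple keys each branch occupies 20 bytes, so a node with 8 children is 192 bytes long and branch 3 starts at offset 92. The 20-byte stride is why branches are not 8-byte aligned inside the node.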
diff --git a/fs/ubifs/ubifs-media.h b/fs/ubifs/ubifs-media.h
new file mode 100644
index 0000000..714a176
--- /dev/null
+++ b/fs/ubifs/ubifs-media.h
@@ -0,0 +1,701 @@
+/*
+ * This file is part of UBIFS.
+ *
+ * Copyright (C) 2006-2008 Nokia Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 51
+ * Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Authors: Artem Bityutskiy (Битюцкий Артём)
+ * Adrian Hunter
+ */
+
+/*
+ * This file describes UBIFS on-flash format and contains definitions of all the
+ * relevant data structures and constants.
+ *
+ * All UBIFS on-flash objects are stored in the form of nodes. All nodes start
+ * with the UBIFS node magic number and have the same common header. Nodes
+ * always sit at 8-byte aligned positions on the media and node header sizes are
+ * also 8-byte aligned (except for the padding node).
+ */
+
+#ifndef __UBIFS_MEDIA_H__
+#define __UBIFS_MEDIA_H__
+
+/* UBIFS node magic number (must not have the padding byte first or last) */
+#define UBIFS_NODE_MAGIC 0x06101831
+
+/* UBIFS on-flash format version */
+#define UBIFS_FORMAT_VERSION 1
+
+/* Minimum logical eraseblock size in bytes */
+#define UBIFS_MIN_LEB_SZ (15*1024)
+
+/* Initial CRC32 value used when calculating CRC checksums */
+#define UBIFS_CRC32_INIT 0xFFFFFFFFU
+
+/* Root inode number */
+#define UBIFS_ROOT_INO 1
+
+/* Lowest inode number used for regular inodes (not UBIFS-only internal ones) */
+#define UBIFS_FIRST_INO 64
+
+/*
+ * Maximum file name and extended attribute length (must be a multiple of 8,
+ * minus 1).
+ */
+#define UBIFS_MAX_NLEN 255
+
+/* Maximum number of data journal heads */
+#define UBIFS_MAX_JHEADS 1
+
+/*
+ * Size of UBIFS data block. Note, UBIFS is not a block oriented file-system,
+ * which means that it does not treat the underlying media as consisting of
+ * blocks like in case of hard drives. Do not be confused. UBIFS block is just
+ * the maximum amount of data which one data node can have or which can be
+ * attached to an inode node.
+ */
+#define UBIFS_BLOCK_SIZE 4096
+#define UBIFS_BLOCK_SHIFT 12
+#define UBIFS_BLOCK_MASK 0x00000FFF
+
+/* UBIFS padding byte pattern (must not be first or last byte of node magic) */
+#define UBIFS_PADDING_BYTE 0xCE
+
+/* Maximum possible key length */
+#define UBIFS_MAX_KEY_LEN 16
+
+/* Key length ("simple" format) */
+#define UBIFS_SK_LEN 8
+
+/* Minimum index tree fanout */
+#define UBIFS_MIN_FANOUT 2
+
+/* Maximum number of levels in UBIFS indexing B-tree */
+#define UBIFS_MAX_LEVELS 512
+
+/* Maximum possible inode size in bytes */
+#define UBIFS_MAX_INODE_SZ 0x3FFFFFFFFFFFFFFFULL /* 62 bits */
+
+/* Maximum amount of data attached to an inode in bytes */
+#define UBIFS_MAX_INO_DATA 4096
+
+/* LEB Properties Tree fanout (must be power of 2) and fanout shift */
+#define UBIFS_LPT_FANOUT 4
+#define UBIFS_LPT_FANOUT_SHIFT 2
+
+/* LEB Properties Tree bit field sizes */
+#define UBIFS_LPT_CRC_BITS 16
+#define UBIFS_LPT_CRC_BYTES 2
+#define UBIFS_LPT_TYPE_BITS 4
+
+/*
+ * LEB Properties Tree node types.
+ *
+ * UBIFS_LPT_PNODE: LPT leaf node (contains LEB properties)
+ * UBIFS_LPT_NNODE: LPT internal node
+ * UBIFS_LPT_LTAB: LPT's own lprops table
+ * UBIFS_LPT_LSAVE: LPT's save table (big model only)
+ * UBIFS_LPT_NODE_CNT: count of LPT node types
+ * UBIFS_LPT_NOT_A_NODE: all ones (15 for 4 bits) is never a valid node type
+ */
+enum {
+ UBIFS_LPT_PNODE,
+ UBIFS_LPT_NNODE,
+ UBIFS_LPT_LTAB,
+ UBIFS_LPT_LSAVE,
+ UBIFS_LPT_NODE_CNT,
+ UBIFS_LPT_NOT_A_NODE = (1 << UBIFS_LPT_TYPE_BITS) - 1,
+};
+
+/*
+ * UBIFS inode types.
+ *
+ * UBIFS_ITYPE_REG: regular file
+ * UBIFS_ITYPE_DIR: directory
+ * UBIFS_ITYPE_LNK: soft link
+ * UBIFS_ITYPE_BLK: block device node
+ * UBIFS_ITYPE_CHR: character device node
+ * UBIFS_ITYPE_FIFO: fifo
+ * UBIFS_ITYPE_SOCK: socket
+ * UBIFS_ITYPES_CNT: count of supported file types
+ */
+enum {
+ UBIFS_ITYPE_REG,
+ UBIFS_ITYPE_DIR,
+ UBIFS_ITYPE_LNK,
+ UBIFS_ITYPE_BLK,
+ UBIFS_ITYPE_CHR,
+ UBIFS_ITYPE_FIFO,
+ UBIFS_ITYPE_SOCK,
+ UBIFS_ITYPES_CNT,
+};
+
+/*
+ * Supported key hash functions.
+ *
+ * UBIFS_KEY_HASH_R5: R5 hash
+ * UBIFS_KEY_HASH_TEST: test hash which just returns first 4 bytes of the name
+ */
+enum {
+ UBIFS_KEY_HASH_R5,
+ UBIFS_KEY_HASH_TEST,
+};
+
+/*
+ * Supported key formats.
+ *
+ * UBIFS_SIMPLE_KEY_FMT: simple key format
+ */
+enum {
+ UBIFS_SIMPLE_KEY_FMT,
+};
+
+/*
+ * Key types.
+ *
+ * UBIFS_INO_KEY: inode node key
+ * UBIFS_DATA_KEY: data node key
+ * UBIFS_DENT_KEY: directory entry node key
+ * UBIFS_XENT_KEY: extended attribute entry key
+ * UBIFS_TRUN_KEY: truncation node key
+ * UBIFS_KEY_TYPES_CNT: number of supported key types
+ */
+enum {
+ UBIFS_INO_KEY,
+ UBIFS_DATA_KEY,
+ UBIFS_DENT_KEY,
+ UBIFS_XENT_KEY,
+ UBIFS_TRUN_KEY,
+ UBIFS_KEY_TYPES_CNT,
+};
+
+/* Count of LEBs reserved for the superblock area */
+#define UBIFS_SB_LEBS 1
+/* Count of LEBs reserved for the master area */
+#define UBIFS_MST_LEBS 2
+
+/* First LEB of the superblock area */
+#define UBIFS_SB_LNUM 0
+/* First LEB of the master area */
+#define UBIFS_MST_LNUM (UBIFS_SB_LNUM + UBIFS_SB_LEBS)
+/* First LEB of the log area */
+#define UBIFS_LOG_LNUM (UBIFS_MST_LNUM + UBIFS_MST_LEBS)
+
+/* Minimum number of logical eraseblocks in the log */
+#define UBIFS_MIN_LOG_LEBS 2
+/* Minimum number of bud logical eraseblocks */
+#define UBIFS_MIN_BUD_LEBS 2
+/* Minimum number of journal logical eraseblocks */
+#define UBIFS_MIN_JRN_LEBS (UBIFS_MIN_LOG_LEBS + UBIFS_MIN_BUD_LEBS)
+/* Minimum number of LPT area logical eraseblocks */
+#define UBIFS_MIN_LPT_LEBS 2
+/* Minimum number of orphan area logical eraseblocks */
+#define UBIFS_MIN_ORPH_LEBS 1
+/* Minimum number of main area logical eraseblocks */
+#define UBIFS_MIN_MAIN_LEBS 8
+
+/* Minimum number of logical eraseblocks */
+#define UBIFS_MIN_LEB_CNT (UBIFS_SB_LEBS + UBIFS_MST_LEBS + \
+ UBIFS_MIN_LOG_LEBS + UBIFS_MIN_BUD_LEBS + \
+ UBIFS_MIN_LPT_LEBS + UBIFS_MIN_ORPH_LEBS + \
+ UBIFS_MIN_MAIN_LEBS)
+
+/* Node sizes (N.B. these are guaranteed to be multiples of 8) */
+#define UBIFS_CH_SZ sizeof(struct ubifs_ch)
+#define UBIFS_INO_NODE_SZ sizeof(struct ubifs_ino_node)
+#define UBIFS_DATA_NODE_SZ sizeof(struct ubifs_data_node)
+#define UBIFS_DENT_NODE_SZ sizeof(struct ubifs_dent_node)
+#define UBIFS_TRUN_NODE_SZ sizeof(struct ubifs_trun_node)
+#define UBIFS_PAD_NODE_SZ sizeof(struct ubifs_pad_node)
+#define UBIFS_SB_NODE_SZ sizeof(struct ubifs_sb_node)
+#define UBIFS_MST_NODE_SZ sizeof(struct ubifs_mst_node)
+#define UBIFS_REF_NODE_SZ sizeof(struct ubifs_ref_node)
+#define UBIFS_IDX_NODE_SZ sizeof(struct ubifs_idx_node)
+#define UBIFS_CS_NODE_SZ sizeof(struct ubifs_cs_node)
+#define UBIFS_ORPH_NODE_SZ sizeof(struct ubifs_orph_node)
+/* Extended attribute entry nodes are identical to directory entry nodes */
+#define UBIFS_XENT_NODE_SZ UBIFS_DENT_NODE_SZ
+/* Only this does not have to be multiple of 8 bytes */
+#define UBIFS_BRANCH_SZ sizeof(struct ubifs_branch)
+
+/* Maximum node sizes (N.B. these are guaranteed to be multiples of 8) */
+#define UBIFS_MAX_DATA_NODE_SZ (UBIFS_DATA_NODE_SZ + UBIFS_BLOCK_SIZE)
+#define UBIFS_MAX_INO_NODE_SZ (UBIFS_INO_NODE_SZ + UBIFS_MAX_INO_DATA)
+#define UBIFS_MAX_DENT_NODE_SZ (UBIFS_DENT_NODE_SZ + UBIFS_MAX_NLEN + 1)
+#define UBIFS_MAX_XENT_NODE_SZ UBIFS_MAX_DENT_NODE_SZ
+
+/* The largest UBIFS node */
+#define UBIFS_MAX_NODE_SZ UBIFS_MAX_INO_NODE_SZ
+
+/*
+ * On-flash inode flags.
+ *
+ * UBIFS_COMPR_FL: use compression for this inode
+ * UBIFS_SYNC_FL: I/O on this inode has to be synchronous
+ * UBIFS_IMMUTABLE_FL: inode is immutable
+ * UBIFS_APPEND_FL: writes to the inode may only append data
+ * UBIFS_DIRSYNC_FL: I/O on this directory inode has to be synchronous
+ *
+ * Note, these are on-flash flags which correspond to ioctl flags
+ * (@FS_COMPR_FL, etc). They have the same values now, but generally, do not
+ * have to be the same.
+ */
+enum {
+ UBIFS_COMPR_FL = 0x01,
+ UBIFS_SYNC_FL = 0x02,
+ UBIFS_IMMUTABLE_FL = 0x04,
+ UBIFS_APPEND_FL = 0x08,
+ UBIFS_DIRSYNC_FL = 0x10,
+};
+
+/* Inode flag bits used by UBIFS */
+#define UBIFS_FL_MASK 0x0000001F
+
+/*
+ * UBIFS compression types.
+ *
+ * UBIFS_COMPR_NONE: no compression
+ * UBIFS_COMPR_LZO: LZO compression
+ * UBIFS_COMPR_ZLIB: ZLIB compression
+ * UBIFS_COMPR_TYPES_CNT: count of supported compression types
+ */
+enum {
+ UBIFS_COMPR_NONE,
+ UBIFS_COMPR_LZO,
+ UBIFS_COMPR_ZLIB,
+ UBIFS_COMPR_TYPES_CNT,
+};
+
+/*
+ * UBIFS node types.
+ *
+ * UBIFS_INO_NODE: inode node
+ * UBIFS_DATA_NODE: data node
+ * UBIFS_DENT_NODE: directory entry node
+ * UBIFS_XENT_NODE: extended attribute node
+ * UBIFS_TRUN_NODE: truncation node
+ * UBIFS_PAD_NODE: padding node
+ * UBIFS_SB_NODE: superblock node
+ * UBIFS_MST_NODE: master node
+ * UBIFS_REF_NODE: LEB reference node
+ * UBIFS_IDX_NODE: index node
+ * UBIFS_CS_NODE: commit start node
+ * UBIFS_ORPH_NODE: orphan node
+ * UBIFS_NODE_TYPES_CNT: count of supported node types
+ *
+ * Note, we index arrays by these numbers, so keep them low and contiguous.
+ * Node type constants for inodes, direntries and so on have to be the same as
+ * corresponding key type constants.
+ */
+enum {
+ UBIFS_INO_NODE,
+ UBIFS_DATA_NODE,
+ UBIFS_DENT_NODE,
+ UBIFS_XENT_NODE,
+ UBIFS_TRUN_NODE,
+ UBIFS_PAD_NODE,
+ UBIFS_SB_NODE,
+ UBIFS_MST_NODE,
+ UBIFS_REF_NODE,
+ UBIFS_IDX_NODE,
+ UBIFS_CS_NODE,
+ UBIFS_ORPH_NODE,
+ UBIFS_NODE_TYPES_CNT,
+};
+
+/*
+ * Master node flags.
+ *
+ * UBIFS_MST_DIRTY: rebooted uncleanly - master node is dirty
+ * UBIFS_MST_NO_ORPHS: no orphan inodes present
+ * UBIFS_MST_RCVRY: written by recovery
+ */
+enum {
+ UBIFS_MST_DIRTY = 1,
+ UBIFS_MST_NO_ORPHS = 2,
+ UBIFS_MST_RCVRY = 4,
+};
+
+/*
+ * Node group type (used by recovery to recover whole group or none).
+ *
+ * UBIFS_NO_NODE_GROUP: this node is not part of a group
+ * UBIFS_IN_NODE_GROUP: this node is a part of a group
+ * UBIFS_LAST_OF_NODE_GROUP: this node is the last in a group
+ */
+enum {
+ UBIFS_NO_NODE_GROUP = 0,
+ UBIFS_IN_NODE_GROUP,
+ UBIFS_LAST_OF_NODE_GROUP,
+};
+
+/*
+ * Superblock flags.
+ *
+ * UBIFS_FLG_BIGLPT: the "big" LPT model is used if set
+ */
+enum {
+ UBIFS_FLG_BIGLPT = 0x02,
+};
+
+/**
+ * struct ubifs_ch - common header node.
+ * @magic: UBIFS node magic number (%UBIFS_NODE_MAGIC)
+ * @crc: CRC-32 checksum of the node header
+ * @sqnum: sequence number
+ * @len: full node length
+ * @node_type: node type
+ * @group_type: node group type
+ * @padding: reserved for future, zeroes
+ *
+ * Every UBIFS node starts with this common part. If the node has a key, the
+ * key always goes next.
+ */
+struct ubifs_ch {
+ __le32 magic;
+ __le32 crc;
+ __le64 sqnum;
+ __le32 len;
+ __u8 node_type;
+ __u8 group_type;
+ __u8 padding[2];
+} __attribute__ ((packed));
+
+/**
+ * union ubifs_dev_desc - device node descriptor
+ * @new: new type device descriptor
+ * @huge: huge type device descriptor
+ *
+ * This data structure describes major/minor numbers of a device node. If an
+ * inode is a device node then its data contains an object of this type. UBIFS
+ * uses standard Linux "new" and "huge" device node encodings.
+ */
+union ubifs_dev_desc {
+ __le32 new;
+ __le64 huge;
+} __attribute__ ((packed));
+
+/**
+ * struct ubifs_ino_node - inode node.
+ * @ch: common header
+ * @key: node key
+ * @creat_sqnum: sequence number at time of creation
+ * @size: inode size in bytes (amount of uncompressed data)
+ * @padding1: reserved for future, zeroed
+ * @nlink: number of hard links
+ * @atime: access time
+ * @ctime: inode change time
+ * @mtime: modification time
+ * @uid: owner ID
+ * @gid: group ID
+ * @mode: access flags
+ * @flags: per-inode flags (%UBIFS_COMPR_FL, %UBIFS_SYNC_FL, etc)
+ * @data_len: inode data length
+ * @xattr_cnt: count of extended attributes this inode has
+ * @xattr_size: summarized size of all extended attributes in bytes
+ * @xattr_msize: summarized on-the-media size of all extended attributes in
+ * bytes (size of all extended attribute entries and extended
+ * attribute inodes belonging to this inode)
+ * @xattr_names: sum of lengths of all extended attribute names belonging to
+ * this inode
+ * @compr_type: compression type used for this inode
+ * @padding2: reserved for future, zeroes
+ * @data: data attached to the inode
+ *
+ * Note, even though inode compression type is defined by @compr_type, some
+ * nodes of this inode may be compressed with different compressor - this
+ * happens if compression type is changed while the inode already has data
+ * nodes. But @compr_type will be used for further writes to the inode.
+ */
+struct ubifs_ino_node {
+ struct ubifs_ch ch;
+ __u8 key[UBIFS_MAX_KEY_LEN];
+ __le64 creat_sqnum;
+ __le64 size;
+ __u8 padding1[8];
+ __le32 nlink;
+ __le32 atime;
+ __le32 ctime;
+ __le32 mtime;
+ __le32 uid;
+ __le32 gid;
+ __le32 mode;
+ __le32 flags;
+ __le32 data_len;
+ __le32 xattr_cnt;
+ __le64 xattr_size;
+ __le64 xattr_msize;
+ __le32 xattr_names;
+ __le16 compr_type;
+ __u8 padding2[34];
+ __u8 data[];
+} __attribute__ ((packed));
+
+/**
+ * struct ubifs_dent_node - directory entry node.
+ * @ch: common header
+ * @key: node key
+ * @inum: target inode number
+ * @padding: reserved for future, zeroes
+ * @type: type of the target inode (%UBIFS_ITYPE_REG, %UBIFS_ITYPE_DIR, etc)
+ * @nlen: name length
+ * @padding1: reserved for future, zeroes
+ * @name: zero-terminated name
+ */
+struct ubifs_dent_node {
+ struct ubifs_ch ch;
+ __u8 key[UBIFS_MAX_KEY_LEN];
+ __le64 inum;
+ __u8 padding;
+ __u8 type;
+ __le16 nlen;
+ __u8 padding1[4];
+ __u8 name[];
+} __attribute__ ((packed));
+
+/**
+ * struct ubifs_data_node - data node.
+ * @ch: common header
+ * @key: node key
+ * @size: uncompressed data size in bytes
+ * @compr_type: compression type (%UBIFS_COMPR_NONE, %UBIFS_COMPR_LZO, etc)
+ * @padding: reserved for future, zeroes
+ * @data: data
+ */
+struct ubifs_data_node {
+ struct ubifs_ch ch;
+ __u8 key[UBIFS_MAX_KEY_LEN];
+ __le32 size;
+ __le16 compr_type;
+ __u8 padding[2];
+ __u8 data[];
+} __attribute__ ((packed));
+
+/**
+ * struct ubifs_trun_node - truncation node.
+ * @ch: common header
+ * @key: truncation node key
+ * @old_size: size before truncation
+ * @new_size: size after truncation
+ *
+ * This node exists only in the journal and never goes to the main area.
+ */
+struct ubifs_trun_node {
+ struct ubifs_ch ch;
+ __u8 key[UBIFS_MAX_KEY_LEN];
+ __le64 old_size;
+ __le64 new_size;
+} __attribute__ ((packed));
+
+/**
+ * struct ubifs_pad_node - padding node.
+ * @ch: common header
+ * @pad_len: how many bytes after this node are unused (because padded)
+ * @padding: reserved for future, zeroes
+ */
+struct ubifs_pad_node {
+ struct ubifs_ch ch;
+ __le32 pad_len;
+} __attribute__ ((packed));
+
+/**
+ * struct ubifs_sb_node - superblock node.
+ * @ch: common header
+ * @padding: reserved for future, zeroes
+ * @key_hash: type of hash function used in keys
+ * @key_fmt: format of the key
+ * @flags: file-system flags (%UBIFS_FLG_BIGLPT, etc)
+ * @min_io_size: minimal input/output unit size
+ * @leb_size: logical eraseblock size in bytes
+ * @leb_cnt: count of LEBs used by filesystem
+ * @max_leb_cnt: maximum count of LEBs used by filesystem
+ * @max_bud_bytes: maximum amount of data stored in buds
+ * @log_lebs: log size in logical eraseblocks
+ * @lpt_lebs: number of LEBs used for lprops table
+ * @orph_lebs: number of LEBs used for recording orphans
+ * @jhead_cnt: count of journal heads
+ * @fanout: tree fanout (max. number of links per indexing node)
+ * @lsave_cnt: number of LEB numbers in LPT's save table
+ * @fmt_vers: UBIFS on-flash format version
+ * @default_compr: default compression
+ * @padding1: reserved for future, zeroes
+ * @rp_uid: reserve pool UID
+ * @rp_gid: reserve pool GID
+ * @rp_size: size of the reserved pool in bytes
+ * @padding2: reserved for future, zeroes
+ */
+struct ubifs_sb_node {
+ struct ubifs_ch ch;
+ __u8 padding[2];
+ __u8 key_hash;
+ __u8 key_fmt;
+ __le32 flags;
+ __le32 min_io_size;
+ __le32 leb_size;
+ __le32 leb_cnt;
+ __le32 max_leb_cnt;
+ __le64 max_bud_bytes;
+ __le32 log_lebs;
+ __le32 lpt_lebs;
+ __le32 orph_lebs;
+ __le32 jhead_cnt;
+ __le32 fanout;
+ __le32 lsave_cnt;
+ __le32 fmt_vers;
+ __le16 default_compr;
+ __u8 padding1[2];
+ __le32 rp_uid;
+ __le32 rp_gid;
+ __le64 rp_size;
+ __u8 padding2[3992];
+} __attribute__ ((packed));
+
+/**
+ * struct ubifs_mst_node - master node.
+ * @ch: common header
+ * @highest_inum: highest inode number in the committed index
+ * @cmt_no: commit number
+ * @flags: various flags (%UBIFS_MST_DIRTY, etc)
+ * @log_lnum: start of the log
+ * @root_lnum: LEB number of the root indexing node
+ * @root_offs: offset within @root_lnum
+ * @root_len: root indexing node length
+ * @gc_lnum: LEB reserved for garbage collection (%-1 value means the LEB was
+ * not reserved and should be reserved on mount)
+ * @ihead_lnum: LEB number of index head
+ * @ihead_offs: offset of index head
+ * @index_size: size of index on flash
+ * @total_free: total free space in bytes
+ * @total_dirty: total dirty space in bytes
+ * @total_used: total used space in bytes (includes only data LEBs)
+ * @total_dead: total dead space in bytes (includes only data LEBs)
+ * @total_dark: total dark space in bytes (includes only data LEBs)
+ * @lpt_lnum: LEB number of LPT root nnode
+ * @lpt_offs: offset of LPT root nnode
+ * @nhead_lnum: LEB number of LPT head
+ * @nhead_offs: offset of LPT head
+ * @ltab_lnum: LEB number of LPT's own lprops table
+ * @ltab_offs: offset of LPT's own lprops table
+ * @lsave_lnum: LEB number of LPT's save table (big model only)
+ * @lsave_offs: offset of LPT's save table (big model only)
+ * @lscan_lnum: LEB number of last LPT scan
+ * @empty_lebs: number of empty logical eraseblocks
+ * @idx_lebs: number of indexing logical eraseblocks
+ * @leb_cnt: count of LEBs used by filesystem
+ * @padding: reserved for future, zeroes
+ */
+struct ubifs_mst_node {
+ struct ubifs_ch ch;
+ __le64 highest_inum;
+ __le64 cmt_no;
+ __le32 flags;
+ __le32 log_lnum;
+ __le32 root_lnum;
+ __le32 root_offs;
+ __le32 root_len;
+ __le32 gc_lnum;
+ __le32 ihead_lnum;
+ __le32 ihead_offs;
+ __le64 index_size;
+ __le64 total_free;
+ __le64 total_dirty;
+ __le64 total_used;
+ __le64 total_dead;
+ __le64 total_dark;
+ __le32 lpt_lnum;
+ __le32 lpt_offs;
+ __le32 nhead_lnum;
+ __le32 nhead_offs;
+ __le32 ltab_lnum;
+ __le32 ltab_offs;
+ __le32 lsave_lnum;
+ __le32 lsave_offs;
+ __le32 lscan_lnum;
+ __le32 empty_lebs;
+ __le32 idx_lebs;
+ __le32 leb_cnt;
+ __u8 padding[344];
+} __attribute__ ((packed));
+
+/**
+ * struct ubifs_ref_node - logical eraseblock reference node.
+ * @ch: common header
+ * @lnum: the referred logical eraseblock number
+ * @offs: start offset in the referred LEB
+ * @jhead: journal head number
+ * @padding: reserved for future, zeroes
+ */
+struct ubifs_ref_node {
+ struct ubifs_ch ch;
+ __le32 lnum;
+ __le32 offs;
+ __le32 jhead;
+ __u8 padding[28];
+} __attribute__ ((packed));
+
+/**
+ * struct ubifs_branch - key/reference/length branch
+ * @lnum: LEB number of the target node
+ * @offs: offset within @lnum
+ * @len: target node length
+ * @key: key
+ */
+struct ubifs_branch {
+ __le32 lnum;
+ __le32 offs;
+ __le32 len;
+ __u8 key[];
+} __attribute__ ((packed));
+
+/**
+ * struct ubifs_idx_node - indexing node.
+ * @ch: common header
+ * @child_cnt: number of child index nodes
+ * @level: tree level
+ * @padding: reserved for future, zeroes
+ * @branches: LEB number / offset / length / key branches
+ */
+struct ubifs_idx_node {
+ struct ubifs_ch ch;
+ __le16 child_cnt;
+ __le16 level;
+ __u8 padding[4];
+ __u8 branches[];
+} __attribute__ ((packed));
+
+/**
+ * struct ubifs_cs_node - commit start node.
+ * @ch: common header
+ * @cmt_no: commit number
+ */
+struct ubifs_cs_node {
+ struct ubifs_ch ch;
+ __le64 cmt_no;
+} __attribute__ ((packed));
+
+/**
+ * struct ubifs_orph_node - orphan node.
+ * @ch: common header
+ * @cmt_no: commit number (also top bit is set on the last node of the commit)
+ * @inos: inode numbers of orphans
+ */
+struct ubifs_orph_node {
+ struct ubifs_ch ch;
+ __le64 cmt_no;
+ __le64 inos[];
+} __attribute__ ((packed));
+
+#endif /* __UBIFS_MEDIA_H__ */
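
The branch layout above (the fixed fields of struct ubifs_branch followed by a
key whose length is only known at run time) implies that the branches inside an
index node are located by byte stride rather than by ordinary array indexing.
Below is a minimal userspace sketch of that arithmetic. It is not code from the
patch: the helper names (get_le32, put_le32, branch_offset, put_branch,
get_branch) and struct branch_view are hypothetical, and the key length is
passed in explicitly on the assumption that it corresponds to the @key_len
field documented in struct ubifs_info.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Size of the fixed part of struct ubifs_branch: lnum, offs, len (3 x __le32) */
#define BRANCH_HDR_SZ 12

/* Decoded, host-endian view of one branch (hypothetical helper type) */
struct branch_view {
	uint32_t lnum;
	uint32_t offs;
	uint32_t len;
	const uint8_t *key;	/* points into the original buffer */
};

/* Read a little-endian 32-bit value independently of host byte order */
static uint32_t get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Store a 32-bit value in little-endian byte order */
static void put_le32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)v;
	p[1] = (uint8_t)(v >> 8);
	p[2] = (uint8_t)(v >> 16);
	p[3] = (uint8_t)(v >> 24);
}

/* Byte offset of branch 'i': each branch takes the header plus key_len bytes */
static size_t branch_offset(size_t key_len, size_t i)
{
	return i * (BRANCH_HDR_SZ + key_len);
}

/* Write branch 'i' into the 'branches' area of an index node */
static void put_branch(uint8_t *branches, size_t key_len, size_t i,
		       uint32_t lnum, uint32_t offs, uint32_t len,
		       const uint8_t *key)
{
	uint8_t *p = branches + branch_offset(key_len, i);

	put_le32(p, lnum);
	put_le32(p + 4, offs);
	put_le32(p + 8, len);
	memcpy(p + BRANCH_HDR_SZ, key, key_len);
}

/* Decode branch 'i' from the 'branches' area of an index node */
static struct branch_view get_branch(const uint8_t *branches, size_t key_len,
				     size_t i)
{
	const uint8_t *p = branches + branch_offset(key_len, i);
	struct branch_view b;

	b.lnum = get_le32(p);
	b.offs = get_le32(p + 4);
	b.len = get_le32(p + 8);
	b.key = p + BRANCH_HDR_SZ;
	return b;
}
```

With an 8-byte key, for instance, branch i starts at byte i * 20 of the
branches area, so an index node with child_cnt branches carries child_cnt * 20
bytes after its fixed header.
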
diff --git a/fs/ubifs/ubifs.h b/fs/ubifs/ubifs.h
new file mode 100644
index 0000000..54a694e
--- /dev/null
+++ b/fs/ubifs/ubifs.h
@@ -0,0 +1,1519 @@
+/*
+ * This file is part of UBIFS.
+ *
+ * Copyright (C) 2006-2008 Nokia Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 51
+ * Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Authors: Artem Bityutskiy (Битюцкий Артём)
+ * Adrian Hunter
+ */
+
+/* Implementation version 0.2 */
+
+#ifndef __UBIFS_H__
+#define __UBIFS_H__
+
+#include <linux/statfs.h>
+#include <linux/fs.h>
+#include <linux/err.h>
+#include <linux/sched.h>
+#include <linux/vmalloc.h>
+#include <linux/spinlock.h>
+#include <linux/mutex.h>
+#include <linux/rwsem.h>
+#include <linux/mtd/ubi.h>
+#include <linux/pagemap.h>
+#include <linux/backing-dev.h>
+#include "ubifs-media.h"
+
+/* Version of this UBIFS implementation */
+#define UBIFS_VERSION 1
+
+/* Normal UBIFS messages */
+#define ubifs_msg(fmt, ...) \
+ printk(KERN_NOTICE "UBIFS: " fmt "\n", ##__VA_ARGS__)
+/* UBIFS error messages */
+#define ubifs_err(fmt, ...) \
+ printk(KERN_ERR "UBIFS error (pid %d): %s: " fmt "\n", current->pid, \
+ __func__, ##__VA_ARGS__)
+/* UBIFS warning messages */
+#define ubifs_warn(fmt, ...) \
+ printk(KERN_WARNING "UBIFS warning (pid %d): %s: " fmt "\n", \
+ current->pid, __func__, ##__VA_ARGS__)
+
+/* UBIFS file system VFS magic number */
+#define UBIFS_SUPER_MAGIC 0x24051905
+
+/* "File system end of life" sequence number watermark */
+#define SQNUM_WARN_WATERMARK 0xFFFFFFFF00000000ULL
+#define SQNUM_WATERMARK 0xFFFFFFFFFF000000ULL
+
+/* Minimum amount of data UBIFS writes to the flash */
+#define MIN_WRITE_SZ (UBIFS_DATA_NODE_SZ + 8)
+
+/*
+ * Currently we do not support inode number overlapping and re-using, so this
+ * watermark defines dangerous inode number level. This should be fixed later,
+ * although it is difficult to exceed current limit. Another option is to use
+ * 64-bit inode numbers, but this means more overhead.
+ */
+#define INUM_WARN_WATERMARK 0xFFF00000
+#define INUM_WATERMARK 0xFFFFFF00
+
+/* Largest key size supported in this implementation */
+#define CUR_MAX_KEY_LEN UBIFS_SK_LEN
+
+/* znode flags */
+#define DIRTY_ZNODE 0
+#define COW_ZNODE 1
+#define OBSOLETE_ZNODE 2
+
+/* LPT cnode flags */
+#define DIRTY_CNODE 0
+#define COW_CNODE 1
+#define OBSOLETE_CNODE 2
+
+/* Dirty flags (lpt_drty_flgs) for LPT special nodes */
+#define LTAB_DIRTY 1
+#define LSAVE_DIRTY 2
+
+/* Maximum number of entries in each LPT (LEB category) heap */
+#define LPT_HEAP_SZ 256
+
+/* Background thread name */
+#define SYNCER_BG_NAME "ubifs_bg_thread"
+
+/* Default write-buffer synchronization timeout (5 secs) */
+#define DEFAULT_WBUF_TIMEOUT (5 * HZ)
+
+/* Maximum possible inode number (only 32-bit inodes are supported now) */
+#define MAX_INUM 0xFFFFFFFF
+
+/* Number of non-data journal heads */
+#define NONDATA_JHEADS_CNT 2
+
+/* Garbage collector head */
+#define GCHD 0
+/* Base journal head number */
+#define BASEHD 1
+/* First "general purpose" journal head */
+#define DATAHD 2
+
+/*
+ * How much a directory entry/extended attribute entry adds to the parent/host
+ * inode.
+ */
+#define CALC_DENT_SIZE(name_len) ALIGN(UBIFS_DENT_NODE_SZ + (name_len) + 1, 8)
+
+/*
+ * Znodes which were not touched for 'OLD_ZNODE_AGE' seconds are considered
+ * "old", and znodes which were touched within the last 'YOUNG_ZNODE_AGE'
+ * seconds are considered "young". This is used by the shrinker when selecting
+ * znodes to trim off.
+ */
+#define OLD_ZNODE_AGE 20
+#define YOUNG_ZNODE_AGE 5
+
+/*
+ * Some compressors, like LZO, may end up with more data than the input buffer.
+ * So UBIFS always allocates a larger output buffer, to be sure the compressor
+ * will not corrupt memory in the worst case.
+ */
+#define WORST_COMPR_FACTOR 2
+
+/*
+ * Commit states.
+ *
+ * COMMIT_RESTING: commit is not wanted
+ * COMMIT_BACKGROUND: background commit has been requested
+ * COMMIT_REQUIRED: commit is required
+ * COMMIT_RUNNING_BACKGROUND: background commit is running
+ * COMMIT_RUNNING_REQUIRED: commit is running and it is required
+ * COMMIT_BROKEN: commit failed
+ */
+enum {
+ COMMIT_RESTING = 0,
+ COMMIT_BACKGROUND,
+ COMMIT_REQUIRED,
+ COMMIT_RUNNING_BACKGROUND,
+ COMMIT_RUNNING_REQUIRED,
+ COMMIT_BROKEN,
+};
+
+/*
+ * 'ubifs_scan_a_node()' return values.
+ *
+ * SCANNED_GARBAGE: scanned garbage
+ * SCANNED_EMPTY_SPACE: scanned empty space
+ * SCANNED_A_NODE: scanned a valid node
+ * SCANNED_A_CORRUPT_NODE: scanned a corrupted node
+ * SCANNED_A_BAD_PAD_NODE: scanned a padding node with invalid pad length
+ *
+ * Greater than zero means: 'scanned that number of padding bytes'
+ */
+enum {
+ SCANNED_GARBAGE = 0,
+ SCANNED_EMPTY_SPACE = -1,
+ SCANNED_A_NODE = -2,
+ SCANNED_A_CORRUPT_NODE = -3,
+ SCANNED_A_BAD_PAD_NODE = -4,
+};
+
+/**
+ * struct ubifs_old_idx - index node obsoleted since last commit start.
+ * @rb: rb-tree node
+ * @lnum: LEB number of obsoleted index node
+ * @offs: offset of obsoleted index node
+ */
+struct ubifs_old_idx {
+ struct rb_node rb;
+ int lnum;
+ int offs;
+};
+
+/* The below union makes it easier to deal with keys */
+union ubifs_key {
+ uint8_t u8[CUR_MAX_KEY_LEN];
+ uint32_t u32[CUR_MAX_KEY_LEN/4];
+ uint64_t u64[CUR_MAX_KEY_LEN/8];
+ __le32 j32[CUR_MAX_KEY_LEN/4];
+};
+
+/**
+ * struct ubifs_scan_node - UBIFS scanned node information.
+ * @list: list of scanned nodes
+ * @key: key of node scanned (if it has one)
+ * @sqnum: sequence number
+ * @type: type of node scanned
+ * @offs: offset within the LEB of the node scanned
+ * @len: length of node scanned
+ * @node: raw node
+ */
+struct ubifs_scan_node {
+ struct list_head list;
+ union ubifs_key key;
+ unsigned long long sqnum;
+ int type;
+ int offs;
+ int len;
+ void *node;
+};
+
+/**
+ * struct ubifs_scan_leb - UBIFS scanned LEB information.
+ * @lnum: logical eraseblock number
+ * @nodes_cnt: number of nodes scanned
+ * @nodes: list of struct ubifs_scan_node
+ * @endpt: end point (and therefore the start of empty space)
+ * @ecc: read returned -EBADMSG
+ * @buf: buffer containing entire LEB scanned
+ */
+struct ubifs_scan_leb {
+ int lnum;
+ int nodes_cnt;
+ struct list_head nodes;
+ int endpt;
+ int ecc;
+ void *buf;
+};
+
+/**
+ * struct ubifs_gced_idx_leb - garbage-collected indexing LEB.
+ * @list: list
+ * @lnum: LEB number
+ * @unmap: OK to unmap this LEB
+ *
+ * This data structure is used to temporarily store garbage-collected
+ * indexing LEBs - they are not released immediately, but only after the next
+ * commit. This is needed to guarantee recoverability.
+ */
+struct ubifs_gced_idx_leb {
+ struct list_head list;
+ int lnum;
+ int unmap;
+};
+
+/**
+ * struct ubifs_inode - UBIFS in-memory inode description.
+ * @vfs_inode: VFS inode description object
+ * @creat_sqnum: sequence number at time of creation
+ * @xattr_size: summarized size of all extended attributes in bytes, protected
+ * by @inode->i_lock
+ * @xattr_msize: summarized on-the-media size of all extended attributes in
+ * bytes (size of all extended attribute entries and extended
+ * attribute inodes belonging to this inode)
+ * @xattr_cnt: count of extended attributes this inode has
+ * @xattr_names: sum of lengths of all extended attribute names belonging to
+ * this inode
+ * @dirty: non-zero if the inode is dirty
+ * @xattr: non-zero if this is an extended attribute inode
+ * @budgeted: non-zero if the inode has been budgeted (used for debugging)
+ * @budg_mutex: serializes inode budgeting and write-back
+ * @flags: inode flags (@UBIFS_COMPR_FL, etc)
+ * @compr_type: default compression type used for this inode
+ * @data_len: length of the data attached to the inode
+ * @data: inode's data
+ *
+ * UBIFS has its own inode mutex, besides the VFS 'i_mutex'. The reason for
+ * this is budgeting - UBIFS has to budget each operation. So, if an operation
+ * is going to mark an inode dirty, it has to allocate budget for this. It
+ * cannot just mark it dirty because there is no guarantee there will be enough
+ * flash space when it is time to write the inode back. This means that UBIFS
+ * has to have full control over "clean <-> dirty" transitions of inodes (and
+ * pages actually, but it is easy for pages, because we have
+ * 'ubifs_prepare_write()' which is called _before_ every page change). But
+ * unfortunately, VFS marks inodes dirty in many places, and it does not ask
+ * the file-system if it is allowed to do so (there is a notifier, but it is
+ * not enough), i.e., there is no mechanism to synchronize with this. So we
+ * introduce our own dirty flag to UBIFS inodes and our own inode mutex to
+ * serialize "clean <-> dirty" transitions.
+ */
+struct ubifs_inode {
+ struct inode vfs_inode;
+ unsigned long long creat_sqnum;
+ long long xattr_size;
+ long long xattr_msize;
+ int xattr_cnt;
+ int xattr_names;
+ unsigned int dirty:1;
+ unsigned int xattr:1;
+#ifdef CONFIG_UBIFS_FS_DEBUG
+ unsigned int budgeted:1;
+#endif
+ struct mutex budg_mutex;
+ int flags;
+ int compr_type;
+ int data_len;
+ void *data;
+};
+
+/**
+ * struct ubifs_unclean_leb - records a LEB recovered under read-only mode.
+ * @list: list
+ * @lnum: LEB number of recovered LEB
+ * @endpt: offset where recovery ended
+ *
+ * This structure records a LEB identified during recovery that needs to be
+ * cleaned but was not because UBIFS was mounted read-only. The information
+ * is used to clean the LEB when remounting to read-write mode.
+ */
+struct ubifs_unclean_leb {
+ struct list_head list;
+ int lnum;
+ int endpt;
+};
+
+/*
+ * LEB properties flags.
+ *
+ * LPROPS_UNCAT: not categorized
+ * LPROPS_DIRTY: dirty > 0, not index
+ * LPROPS_DIRTY_IDX: dirty + free > UBIFS_CH_SZ and index
+ * LPROPS_FREE: free > 0, not empty, not index
+ * LPROPS_HEAP_CNT: number of heaps used for storing categorized LEBs
+ * LPROPS_EMPTY: LEB is empty, not taken
+ * LPROPS_FREEABLE: free + dirty == leb_size, not index, not taken
+ * LPROPS_FRDI_IDX: free + dirty == leb_size and index, may be taken
+ * LPROPS_CAT_MASK: mask for the LEB categories above
+ * LPROPS_TAKEN: LEB was taken (this flag is not saved on the media)
+ * LPROPS_INDEX: LEB contains indexing nodes (this flag also exists on flash)
+ */
+enum {
+ LPROPS_UNCAT = 0,
+ LPROPS_DIRTY = 1,
+ LPROPS_DIRTY_IDX = 2,
+ LPROPS_FREE = 3,
+ LPROPS_HEAP_CNT = 3,
+ LPROPS_EMPTY = 4,
+ LPROPS_FREEABLE = 5,
+ LPROPS_FRDI_IDX = 6,
+ LPROPS_CAT_MASK = 15,
+ LPROPS_TAKEN = 16,
+ LPROPS_INDEX = 32,
+};
+
+/**
+ * struct ubifs_lprops - logical eraseblock properties.
+ * @free: amount of free space in bytes
+ * @dirty: amount of dirty space in bytes
+ * @flags: LEB properties flags (see above)
+ * @lnum: LEB number
+ * @list: list of same-category lprops (for LPROPS_EMPTY and LPROPS_FREEABLE)
+ * @hpos: heap position in heap of same-category lprops (other categories)
+ */
+struct ubifs_lprops {
+ int free;
+ int dirty;
+ int flags;
+ int lnum;
+ union {
+ struct list_head list;
+ int hpos;
+ };
+};
+
+/**
+ * struct ubifs_lpt_lprops - LPT logical eraseblock properties.
+ * @free: amount of free space in bytes
+ * @dirty: amount of dirty space in bytes
+ * @tgc: trivial GC flag (1 => unmap after commit end)
+ * @cmt: commit flag (1 => reserved for commit)
+ */
+struct ubifs_lpt_lprops {
+ int free;
+ int dirty;
+ unsigned tgc : 1;
+ unsigned cmt : 1;
+};
+
+/**
+ * struct ubifs_lp_stats - statistics of eraseblocks in the main area.
+ * @empty_lebs: number of empty LEBs
+ * @taken_empty_lebs: number of taken empty LEBs
+ * @idx_lebs: number of indexing LEBs
+ * @total_free: total free space in bytes
+ * @total_dirty: total dirty space in bytes
+ * @total_used: total used space in bytes (includes only data LEBs)
+ * @total_dead: total dead space in bytes (includes only data LEBs)
+ * @total_dark: total dark space in bytes (includes only data LEBs)
+ *
+ * N.B. total_free and total_dirty differ from the other total_* fields,
+ * because they account for _all_ LEBs, not just data LEBs.
+ *
+ * 'taken_empty_lebs' counts the LEBs that are in the transient state of having
+ * been 'taken' for use but not yet written to. 'taken_empty_lebs' is needed
+ * to account correctly for gc_lnum, otherwise 'empty_lebs' could be used
+ * by itself (in which case 'unused_lebs' would be a better name). In the case
+ * of gc_lnum, it is 'taken' at mount time or whenever a LEB is retained by GC,
+ * but unlike other empty LEBs that are 'taken', it may not be written straight
+ * away (i.e. before the next commit start or unmount), so either gc_lnum must
+ * be specially accounted for, or the current approach followed i.e. count it
+ * under 'taken_empty_lebs'.
+ */
+struct ubifs_lp_stats {
+ int empty_lebs;
+ int taken_empty_lebs;
+ int idx_lebs;
+ long long total_free;
+ long long total_dirty;
+ long long total_used;
+ long long total_dead;
+ long long total_dark;
+};
+
+struct ubifs_nnode;
+
+/**
+ * struct ubifs_cnode - LEB Properties Tree common node.
+ * @parent: parent nnode
+ * @cnext: next cnode to commit
+ * @flags: flags (%DIRTY_CNODE or %OBSOLETE_CNODE)
+ * @iip: index in parent
+ * @level: level in the tree (zero for pnodes, greater than zero for nnodes)
+ * @num: node number
+ */
+struct ubifs_cnode {
+ struct ubifs_nnode *parent;
+ struct ubifs_cnode *cnext;
+ unsigned long flags;
+ int iip;
+ int level;
+ int num;
+};
+
+/**
+ * struct ubifs_pnode - LEB Properties Tree leaf node.
+ * @parent: parent nnode
+ * @cnext: next cnode to commit
+ * @flags: flags (%DIRTY_CNODE or %OBSOLETE_CNODE)
+ * @iip: index in parent
+ * @level: level in the tree (always zero for pnodes)
+ * @num: node number
+ * @lprops: LEB properties array
+ */
+struct ubifs_pnode {
+ struct ubifs_nnode *parent;
+ struct ubifs_cnode *cnext;
+ unsigned long flags;
+ int iip;
+ int level;
+ int num;
+ struct ubifs_lprops lprops[UBIFS_LPT_FANOUT];
+};
+
+/**
+ * struct ubifs_nbranch - LEB Properties Tree internal node branch.
+ * @lnum: LEB number of child
+ * @offs: offset of child
+ * @nnode: nnode child
+ * @pnode: pnode child
+ * @cnode: cnode child
+ */
+struct ubifs_nbranch {
+ int lnum;
+ int offs;
+ union {
+ struct ubifs_nnode *nnode;
+ struct ubifs_pnode *pnode;
+ struct ubifs_cnode *cnode;
+ };
+};
+
+/**
+ * struct ubifs_nnode - LEB Properties Tree internal node.
+ * @parent: parent nnode
+ * @cnext: next cnode to commit
+ * @flags: flags (%DIRTY_CNODE or %OBSOLETE_CNODE)
+ * @iip: index in parent
+ * @level: level in the tree (always greater than zero for nnodes)
+ * @num: node number
+ * @nbranch: branches to child nodes
+ */
+struct ubifs_nnode {
+ struct ubifs_nnode *parent;
+ struct ubifs_cnode *cnext;
+ unsigned long flags;
+ int iip;
+ int level;
+ int num;
+ struct ubifs_nbranch nbranch[UBIFS_LPT_FANOUT];
+};
+
+/**
+ * struct ubifs_lpt_heap - heap of categorized lprops.
+ * @arr: heap array
+ * @cnt: number in heap
+ * @max_cnt: maximum number allowed in heap
+ *
+ * There are %LPROPS_HEAP_CNT heaps.
+ */
+struct ubifs_lpt_heap {
+ struct ubifs_lprops **arr;
+ int cnt;
+ int max_cnt;
+};
+
+/*
+ * Return codes for LPT scan callback function.
+ *
+ * LPT_SCAN_CONTINUE: continue scanning
+ * LPT_SCAN_ADD: add the LEB properties scanned to the tree in memory
+ * LPT_SCAN_STOP: stop scanning
+ */
+enum {
+ LPT_SCAN_CONTINUE = 0,
+ LPT_SCAN_ADD = 1,
+ LPT_SCAN_STOP = 2,
+};
+
+struct ubifs_info;
+
+/* Callback used by the 'ubifs_lpt_scan_nolock()' function */
+typedef int (*ubifs_lpt_scan_callback)(struct ubifs_info *c,
+ const struct ubifs_lprops *lprops,
+ int in_tree, void *data);
+
+/**
+ * struct ubifs_wbuf - UBIFS write-buffer.
+ * @c: UBIFS file-system description object
+ * @buf: write-buffer (of min. flash I/O unit size)
+ * @lnum: logical eraseblock number the write-buffer points to
+ * @offs: write-buffer offset in this logical eraseblock
+ * @avail: number of bytes available in the write-buffer
+ * @used: number of used bytes in the write-buffer
+ * @dtype: type of data stored in this LEB (%UBI_LONGTERM, %UBI_SHORTTERM,
+ * %UBI_UNKNOWN)
+ * @jhead: journal head the mutex belongs to (note, needed only to shut lockdep
+ * up by 'mutex_lock_nested()')
+ * @sync_callback: write-buffer synchronization callback
+ * @io_mutex: serializes write-buffer I/O
+ * @lock: serializes @buf, @lnum, @offs, @avail, @used, @next_ino and @inodes
+ * fields
+ * @timer: write-buffer timer
+ * @timeout: timer expire interval in jiffies
+ * @need_sync: it is set if its timer expired and needs sync
+ * @next_ino: points to the next position of the following inode number
+ * @inodes: stores the inode numbers of the nodes which are in wbuf
+ *
+ * The write-buffer synchronization callback is called when the write-buffer is
+ * synchronized in order to notify how much space was wasted due to
+ * write-buffer padding and how much free space is left in the LEB.
+ *
+ * Note: the fields @buf, @lnum, @offs, @avail and @used can be read under
+ * spin-lock or mutex because they are written under both mutex and spin-lock.
+ * @buf is appended to under mutex but overwritten under both mutex and
+ * spin-lock. Thus the data between @buf and @buf + @used can be read under
+ * spinlock.
+ */
+struct ubifs_wbuf {
+ struct ubifs_info *c;
+ void *buf;
+ int lnum;
+ int offs;
+ int avail;
+ int used;
+ int dtype;
+ int jhead;
+ int (*sync_callback)(struct ubifs_info *c, int lnum, int free, int pad);
+ struct mutex io_mutex;
+ spinlock_t lock;
+ struct timer_list timer;
+ int timeout;
+ int need_sync;
+ int next_ino;
+ ino_t *inodes;
+};
+
+/**
+ * struct ubifs_bud - bud logical eraseblock.
+ * @lnum: logical eraseblock number
+ * @start: where the (uncommitted) bud data starts
+ * @jhead: journal head number this bud belongs to
+ * @list: link in the list of buds belonging to the same journal head
+ * @rb: link in the tree of all buds
+ */
+struct ubifs_bud {
+ int lnum;
+ int start;
+ int jhead;
+ struct list_head list;
+ struct rb_node rb;
+};
+
+/**
+ * struct ubifs_jhead - journal head.
+ * @wbuf: head's write-buffer
+ * @buds_list: list of bud LEBs belonging to this journal head
+ *
+ * Note, @buds_list is protected by @c->buds_lock.
+ */
+struct ubifs_jhead {
+ struct ubifs_wbuf wbuf;
+ struct list_head buds_list;
+};
+
+/**
+ * struct ubifs_zbranch - key/coordinate/length branch stored in znodes.
+ * @key: key
+ * @znode: znode address in memory
+ * @lnum: LEB number of the indexing node
+ * @offs: offset of the indexing node within @lnum
+ * @len: target node length
+ */
+struct ubifs_zbranch {
+ union ubifs_key key;
+ union {
+ struct ubifs_znode *znode;
+ void *leaf;
+ };
+ int lnum;
+ int offs;
+ int len;
+};
+
+/**
+ * struct ubifs_znode - in-memory representation of an indexing node.
+ * @parent: parent znode or NULL if it is the root
+ * @cnext: next znode to commit
+ * @flags: flags
+ * @time: last access time (seconds)
+ * @level: level of the entry in the TNC tree
+ * @child_cnt: count of child znodes
+ * @iip: index in parent's zbranch array
+ * @alt: lower bound of key range has altered i.e. child inserted at slot 0
+ * @lnum: LEB number of the corresponding indexing node
+ * @offs: offset of the corresponding indexing node
+ * @len: length of the corresponding indexing node
+ * @zbranch: array of znode branches (@c->fanout elements)
+ */
+struct ubifs_znode {
+ struct ubifs_znode *parent;
+ struct ubifs_znode *cnext;
+ unsigned long flags;
+ unsigned long time;
+ int level;
+ int child_cnt;
+ int iip;
+ int alt;
+#ifdef CONFIG_UBIFS_FS_DEBUG
+ int lnum, offs, len;
+#endif
+ struct ubifs_zbranch zbranch[];
+};
+
+/**
+ * struct ubifs_node_range - node length range description data structure.
+ * @len: fixed node length
+ * @min_len: minimum possible node length
+ * @max_len: maximum possible node length
+ *
+ * If @max_len is %0, the node has fixed length @len.
+ */
+struct ubifs_node_range {
+ union {
+ int len;
+ int min_len;
+ };
+ int max_len;
+};
+
+/**
+ * struct ubifs_compressor - UBIFS compressor description structure.
+ * @compr_type: compressor type (%UBIFS_COMPR_LZO, etc)
+ * @cc: cryptoapi compressor handle
+ * @comp_mutex: mutex used during compression
+ * @decomp_mutex: mutex used during decompression
+ * @name: compressor name
+ * @capi_name: cryptoapi compressor name
+ */
+struct ubifs_compressor {
+ int compr_type;
+ struct crypto_comp *cc;
+ struct mutex *comp_mutex;
+ struct mutex *decomp_mutex;
+ const char *name;
+ const char *capi_name;
+};
+
+/**
+ * struct ubifs_budget_req - budget requirements of an operation.
+ *
+ * @new_ino: non-zero if the operation adds a new inode
+ * @dirtied_ino: how many inodes the operation makes dirty
+ * @new_page: non-zero if the operation adds a new page
+ * @dirtied_page: non-zero if the operation makes a page dirty
+ * @new_dent: non-zero if the operation adds a new directory entry
+ * @mod_dent: non-zero if the operation removes or modifies an existing
+ * directory entry
+ * @new_ino_d: how much data the newly created inode contains
+ * @dirtied_ino_d: how much data the dirtied inodes contain
+ * @idx_growth: how much the index will supposedly grow
+ * @data_growth: how much new data the operation will supposedly add
+ * @dd_growth: how much data that makes other data dirty the operation will
+ * supposedly add
+ *
+ * @idx_growth, @data_growth and @dd_growth are not used in budget request. The
+ * budgeting subsystem caches index and data growth values there to avoid
+ * re-calculating them when the budget is released. However, if @idx_growth is
+ * %-1, it is calculated by the release function using other fields.
+ *
+ * An inode may contain 4KiB of data at max., thus the width of @new_ino_d
+ * is 13 bits, and that of @dirtied_ino_d is 15 bits, because up to 4 inodes
+ * may be made dirty by the re-name operation.
+ */
+struct ubifs_budget_req {
+ unsigned int new_ino:1;
+ unsigned int dirtied_ino:4;
+ unsigned int new_page:1;
+ unsigned int dirtied_page:1;
+ unsigned int new_dent:1;
+ unsigned int mod_dent:1;
+ unsigned int new_ino_d:13;
+ unsigned int dirtied_ino_d:15;
+ int idx_growth;
+ int data_growth;
+ int dd_growth;
+};
+
+/**
+ * struct ubifs_orphan - stores the inode number of an orphan.
+ * @rb: rb-tree node of rb-tree of orphans sorted by inode number
+ * @list: list head of list of orphans in order added
+ * @new_list: list head of list of orphans added since the last commit
+ * @cnext: next orphan to commit
+ * @dnext: next orphan to delete
+ * @inum: inode number
+ * @new: %1 => added since the last commit, otherwise %0
+ */
+struct ubifs_orphan {
+ struct rb_node rb;
+ struct list_head list;
+ struct list_head new_list;
+ struct ubifs_orphan *cnext;
+ struct ubifs_orphan *dnext;
+ ino_t inum;
+ int new;
+};
+
+/**
+ * struct ubifs_mount_opts - UBIFS-specific mount options information.
+ * @unmount_mode: selected unmount mode (%0 default, %1 normal, %2 fast)
+ */
+struct ubifs_mount_opts {
+ unsigned int unmount_mode:2;
+};
+
+/**
+ * struct ubifs_info - UBIFS file-system description data structure
+ * (per-superblock).
+ * @vfs_sb: VFS @struct super_block object
+ *
+ * @highest_inum: highest used inode number
+ * @vfs_gen: VFS inode generation counter
+ * @max_sqnum: current global sequence number
+ * @cmt_no: commit number (last successfully completed commit)
+ * @cnt_lock: protects @highest_inum, @vfs_gen, and @max_sqnum counters
+ * @fmt_vers: UBIFS on-flash format version
+ *
+ * @lhead_lnum: log head logical eraseblock number
+ * @lhead_offs: log head offset
+ * @ltail_lnum: log tail logical eraseblock number (offset is always 0)
+ * @log_mutex: protects the log, @lhead_lnum, @lhead_offs and @ltail_lnum
+ * @min_log_bytes: minimum required number of bytes in the log
+ * @cmt_bud_bytes: used during commit to temporarily store the amount of bytes
+ * in committed buds
+ *
+ * @buds: tree of all buds indexed by bud LEB number
+ * @bud_bytes: how many bytes of flash are used by buds
+ * @buds_lock: protects the @buds tree, @bud_bytes, and per-journal head bud
+ * lists
+ * @jhead_cnt: count of journal heads
+ * @jheads: journal heads (head %GCHD is the GC head, %BASEHD is the base head)
+ * @max_bud_bytes: maximum number of bytes allowed in buds
+ * @bg_bud_bytes: number of bud bytes when background commit is initiated
+ * @old_buds: buds to be released after commit ends
+ * @max_bud_cnt: maximum number of buds
+ *
+ * @commit_sem: synchronizes committer with other processes
+ * @cmt_state: commit state
+ * @cs_lock: commit state lock
+ * @cmt_wq: wait queue to sleep on if the log is full and a commit is running
+ * @fast_unmount: do not run journal commit before unmounting
+ * @big_lpt: flag that LPT is too big to write whole during commit
+ *
+ * @tnc_mutex: protects the Tree Node Cache (TNC), @zroot, @cnext, @enext, and
+ * @calc_idx_sz
+ * @zroot: zbranch which points to the root index node and znode
+ * @cnext: next znode to commit
+ * @enext: next znode to commit to empty space
+ * @gap_lebs: array of LEBs used by the in-gaps commit method
+ * @cbuf: commit buffer
+ * @ileb_buf: buffer for commit in-the-gaps method
+ * @ileb_len: length of data in ileb_buf
+ * @ihead_lnum: LEB number of index head
+ * @ihead_offs: offset of index head
+ * @ilebs: pre-allocated index LEBs
+ * @ileb_cnt: number of pre-allocated index LEBs
+ * @ileb_nxt: next pre-allocated index LEB
+ * @old_idx: tree of index nodes obsoleted since the last commit start
+ * @new_ihead_lnum: used by debugging to check ihead_lnum
+ * @new_ihead_offs: used by debugging to check ihead_offs
+ *
+ * @mst_node: master node
+ * @mst_offs: offset of valid master node
+ * @mst_mutex: protects the master node area, @mst_node, and @mst_offs
+ *
+ * @log_lebs: number of logical eraseblocks in the log
+ * @log_bytes: log size in bytes
+ * @log_last: last LEB of the log
+ * @lpt_lebs: number of LEBs used for lprops table
+ * @lpt_first: first LEB of the lprops table area
+ * @lpt_last: last LEB of the lprops table area
+ * @orph_lebs: number of LEBs used for the orphan area
+ * @orph_first: first LEB of the orphan area
+ * @orph_last: last LEB of the orphan area
+ * @main_lebs: count of LEBs in the main area
+ * @main_first: first LEB of the main area
+ * @main_bytes: main area size in bytes
+ * @default_compr: default compression type
+ *
+ * @key_hash_type: type of the key hash
+ * @key_hash: direntry key hash function
+ * @key_fmt: key format
+ * @key_len: key length
+ * @fanout: fanout of the index tree (number of links per indexing node)
+ *
+ * @min_io_size: minimal input/output unit size
+ * @min_io_shift: number of bits in @min_io_size minus one
+ * @leb_size: logical eraseblock size in bytes
+ * @half_leb_size: half LEB size
+ * @leb_cnt: count of logical eraseblocks
+ * @max_leb_cnt: maximum count of logical eraseblocks
+ * @old_leb_cnt: count of logical eraseblocks before resize
+ * @ro_media: the underlying UBI volume is read-only
+ *
+ * @dirty_pg_cnt: number of dirty pages (not used)
+ * @dirty_ino_cnt: number of dirty inodes (not used)
+ * @dirty_zn_cnt: number of dirty znodes
+ * @clean_zn_cnt: number of clean znodes
+ *
+ * @budg_idx_growth: amount of bytes budgeted for index growth
+ * @budg_data_growth: amount of bytes budgeted for cached data
+ * @budg_dd_growth: amount of bytes budgeted for cached data that will make
+ * other data dirty
+ * @budg_uncommitted_idx: amount of bytes which were budgeted for growth of the
+ * index, but which still have to be taken into account because
+ * the index has not been committed so far
+ * @space_lock: protects @budg_idx_growth, @budg_data_growth, @budg_dd_growth,
+ * @budg_uncommitted_idx, @min_idx_lebs, @old_idx_sz, and @lst
+ * @min_idx_lebs: minimum number of LEBs required for the index
+ * @old_idx_sz: size of index on flash
+ * @calc_idx_sz: temporary variable which is used to calculate new index size
+ * (contains accurate new index size at end of TNC commit start)
+ * @lst: lprops statistics
+ *
+ * @page_budget: budget for a page
+ * @inode_budget: budget for an inode
+ * @dent_budget: budget for a directory entry
+ *
+ * @ref_node_alsz: size of the LEB reference node aligned to the min. flash
+ * I/O unit
+ * @mst_node_alsz: master node aligned size
+ * @min_idx_node_sz: minimum indexing node size aligned on 8-byte boundary
+ * @max_idx_node_sz: maximum indexing node size aligned on 8-byte boundary
+ * @max_inode_sz: maximum possible inode size in bytes
+ * @max_znode_sz: size of znode in bytes
+ * @dead_wm: LEB dead space watermark
+ * @dark_wm: LEB dark space watermark
+ * @block_cnt: count of 4KiB blocks on the FS
+ *
+ * @ranges: UBIFS node length ranges
+ * @ubi: UBI volume descriptor
+ * @di: UBI device information
+ * @vi: UBI volume information
+ *
+ * @orph_tree: rb-tree of orphan inode numbers
+ * @orph_list: list of orphan inode numbers in order added
+ * @orph_new: list of orphan inode numbers added since last commit
+ * @orph_cnext: next orphan to commit
+ * @orph_dnext: next orphan to delete
+ * @orphan_lock: lock for orph_tree and orph_new
+ * @orph_buf: buffer for orphan nodes
+ * @new_orphans: number of orphans since last commit
+ * @cmt_orphans: number of orphans being committed
+ * @tot_orphans: number of orphans in the rb_tree
+ * @max_orphans: maximum number of orphans allowed
+ * @ohead_lnum: orphan head LEB number
+ * @ohead_offs: orphan head offset
+ * @no_orphs: non-zero if there are no orphans
+ *
+ * @bgt: UBIFS background thread
+ * @bgt_name: background thread name
+ * @need_bgt: if background thread should run
+ * @need_wbuf_sync: if write-buffers have to be synchronized
+ *
+ * @gc_lnum: LEB number used for garbage collection
+ * @sbuf: a buffer of LEB size used by GC and replay for scanning
+ * @idx_gc: list of index LEBs that have been garbage collected
+ * @idx_gc_cnt: number of elements on the idx_gc list
+ *
+ * @infos_list: links all 'ubifs_info' objects
+ * @umount_mutex: serializes shrinker and un-mount
+ * @shrinker_run_no: shrinker run number
+ *
+ * @space_bits: number of bits needed to record free or dirty space
+ * @lpt_lnum_bits: number of bits needed to record a LEB number in the LPT
+ * @lpt_offs_bits: number of bits needed to record an offset in the LPT
+ * @lpt_spc_bits: number of bits needed to record space in the LPT
+ * @pcnt_bits: number of bits needed to record pnode or nnode number
+ * @lnum_bits: number of bits needed to record LEB number
+ * @nnode_sz: size of on-flash nnode
+ * @pnode_sz: size of on-flash pnode
+ * @ltab_sz: size of on-flash LPT lprops table
+ * @lsave_sz: size of on-flash LPT save table
+ * @pnode_cnt: number of pnodes
+ * @nnode_cnt: number of nnodes
+ * @lpt_hght: height of the LPT
+ * @pnodes_have: number of pnodes in memory
+ *
+ * @lp_mutex: protects lprops table and all the other lprops-related fields
+ * @lpt_lnum: LEB number of the root nnode of the LPT
+ * @lpt_offs: offset of the root nnode of the LPT
+ * @nhead_lnum: LEB number of LPT head
+ * @nhead_offs: offset of LPT head
+ * @lpt_drty_flgs: dirty flags for LPT special nodes e.g. ltab
+ * @dirty_nn_cnt: number of dirty nnodes
+ * @dirty_pn_cnt: number of dirty pnodes
+ * @lpt_sz: LPT size
+ * @lpt_nod_buf: buffer for an on-flash nnode or pnode
+ * @lpt_buf: buffer of LEB size used by LPT
+ * @nroot: address in memory of the root nnode of the LPT
+ * @lpt_cnext: next LPT node to commit
+ * @lpt_heap: array of heaps of categorized lprops
+ * @dirty_idx: a (reverse sorted) copy of the LPROPS_DIRTY_IDX heap as at
+ * previous commit start
+ * @uncat_list: list of un-categorized LEBs
+ * @empty_list: list of empty LEBs
+ * @freeable_list: list of freeable non-index LEBs (free + dirty == leb_size)
+ * @frdi_idx_list: list of freeable index LEBs (free + dirty == leb_size)
+ * @freeable_cnt: number of freeable LEBs in @freeable_list
+ *
+ * @ltab_lnum: LEB number of LPT's own lprops table
+ * @ltab_offs: offset of LPT's own lprops table
+ * @ltab: LPT's own lprops table
+ * @ltab_cmt: LPT's own lprops table (commit copy)
+ * @lsave_cnt: number of LEB numbers in LPT's save table
+ * @lsave_lnum: LEB number of LPT's save table
+ * @lsave_offs: offset of LPT's save table
+ * @lsave: LPT's save table
+ * @lscan_lnum: LEB number of last LPT scan
+ *
+ * @rp_size: size of the reserved pool in bytes
+ * @rp_uid: reserved pool user ID
+ * @rp_gid: reserved pool group ID
+ *
+ * @empty: if the UBI device is empty
+ * @replay_tree: temporary tree used during journal replay
+ * @replay_list: temporary list used during journal replay
+ * @replay_buds: list of buds to replay
+ * @cs_sqnum: sequence number of first node in the log (commit start node)
+ * @need_recovery: file-system needs recovery
+ * @replaying: set to %1 during journal replay
+ * @unclean_leb_list: LEBs to recover when mounting ro to rw
+ * @rcvrd_mst_node: recovered master node to write when mounting ro to rw
+ * @size_tree: inode size information for recovery
+ * @recovery_needs_commit: a commit must be done before unmounting
+ * @remounting_rw: set while remounting from ro to rw (sb flags have MS_RDONLY)
+ * @mount_opts: UBIFS-specific mount options
+ *
+ * @dbg_buf: a buffer of LEB size used for debugging purposes
+ * @old_zroot: old index root - used by 'dbg_check_old_index()'
+ * @old_zroot_level: old index root level - used by 'dbg_check_old_index()'
+ * @old_zroot_sqnum: old index root sqnum - used by 'dbg_check_old_index()'
+ * @failure_mode: failure mode for recovery testing
+ */
+struct ubifs_info {
+ struct super_block *vfs_sb;
+
+ ino_t highest_inum;
+ unsigned int vfs_gen;
+ unsigned long long max_sqnum;
+ unsigned long long cmt_no;
+ spinlock_t cnt_lock;
+ int fmt_vers;
+
+ int lhead_lnum;
+ int lhead_offs;
+ int ltail_lnum;
+ struct mutex log_mutex;
+ int min_log_bytes;
+ long long cmt_bud_bytes;
+
+ struct rb_root buds;
+ long long bud_bytes;
+ spinlock_t buds_lock;
+ int jhead_cnt;
+ struct ubifs_jhead *jheads;
+ long long max_bud_bytes;
+ long long bg_bud_bytes;
+ struct list_head old_buds;
+ int max_bud_cnt;
+
+ struct rw_semaphore commit_sem;
+ int cmt_state;
+ spinlock_t cs_lock;
+ wait_queue_head_t cmt_wq;
+ unsigned int fast_unmount:1;
+ unsigned int big_lpt:1;
+
+ struct mutex tnc_mutex;
+ struct ubifs_zbranch zroot;
+ struct ubifs_znode *cnext;
+ struct ubifs_znode *enext;
+ int *gap_lebs;
+ void *cbuf;
+ void *ileb_buf;
+ int ileb_len;
+ int ihead_lnum;
+ int ihead_offs;
+ int *ilebs;
+ int ileb_cnt;
+ int ileb_nxt;
+ struct rb_root old_idx;
+#ifdef CONFIG_UBIFS_FS_DEBUG
+ int new_ihead_lnum;
+ int new_ihead_offs;
+#endif
+
+ struct ubifs_mst_node *mst_node;
+ int mst_offs;
+ struct mutex mst_mutex;
+
+ int log_lebs;
+ long long log_bytes;
+ int log_last;
+ int lpt_lebs;
+ int lpt_first;
+ int lpt_last;
+ int orph_lebs;
+ int orph_first;
+ int orph_last;
+ int main_lebs;
+ int main_first;
+ long long main_bytes;
+ int default_compr;
+
+ uint8_t key_hash_type;
+ uint32_t (*key_hash)(const char *str, int len);
+ int key_fmt;
+ int key_len;
+ int fanout;
+
+ int min_io_size;
+ int min_io_shift;
+ int leb_size;
+ int half_leb_size;
+ int leb_cnt;
+ int max_leb_cnt;
+ int old_leb_cnt;
+ int ro_media;
+
+ atomic_long_t dirty_pg_cnt;
+ atomic_long_t dirty_ino_cnt;
+ atomic_long_t dirty_zn_cnt;
+ atomic_long_t clean_zn_cnt;
+
+ long long budg_idx_growth;
+ long long budg_data_growth;
+ long long budg_dd_growth;
+ long long budg_uncommitted_idx;
+ spinlock_t space_lock;
+ int min_idx_lebs;
+ unsigned long long old_idx_sz;
+ unsigned long long calc_idx_sz;
+ struct ubifs_lp_stats lst;
+
+ int page_budget;
+ int inode_budget;
+ int dent_budget;
+
+ int ref_node_alsz;
+ int mst_node_alsz;
+ int min_idx_node_sz;
+ int max_idx_node_sz;
+ long long max_inode_sz;
+ int max_znode_sz;
+ int dead_wm;
+ int dark_wm;
+ int block_cnt;
+
+ struct ubifs_node_range ranges[UBIFS_NODE_TYPES_CNT];
+ struct ubi_volume_desc *ubi;
+ struct ubi_device_info di;
+ struct ubi_volume_info vi;
+
+ struct rb_root orph_tree;
+ struct list_head orph_list;
+ struct list_head orph_new;
+ struct ubifs_orphan *orph_cnext;
+ struct ubifs_orphan *orph_dnext;
+ spinlock_t orphan_lock;
+ void *orph_buf;
+ int new_orphans;
+ int cmt_orphans;
+ int tot_orphans;
+ int max_orphans;
+ int ohead_lnum;
+ int ohead_offs;
+ int no_orphs;
+
+ struct task_struct *bgt;
+ char bgt_name[sizeof(SYNCER_BG_NAME) + 18];
+ int need_bgt;
+ int need_wbuf_sync;
+
+ int gc_lnum;
+ void *sbuf;
+ struct list_head idx_gc;
+ int idx_gc_cnt;
+
+ struct list_head infos_list;
+ struct mutex umount_mutex;
+ unsigned int shrinker_run_no;
+
+ int space_bits;
+ int lpt_lnum_bits;
+ int lpt_offs_bits;
+ int lpt_spc_bits;
+ int pcnt_bits;
+ int lnum_bits;
+ int nnode_sz;
+ int pnode_sz;
+ int ltab_sz;
+ int lsave_sz;
+ int pnode_cnt;
+ int nnode_cnt;
+ int lpt_hght;
+ int pnodes_have;
+
+ struct mutex lp_mutex;
+ int lpt_lnum;
+ int lpt_offs;
+ int nhead_lnum;
+ int nhead_offs;
+ int lpt_drty_flgs;
+ int dirty_nn_cnt;
+ int dirty_pn_cnt;
+ long long lpt_sz;
+ void *lpt_nod_buf;
+ void *lpt_buf;
+ struct ubifs_nnode *nroot;
+ struct ubifs_cnode *lpt_cnext;
+ struct ubifs_lpt_heap lpt_heap[LPROPS_HEAP_CNT];
+ struct ubifs_lpt_heap dirty_idx;
+ struct list_head uncat_list;
+ struct list_head empty_list;
+ struct list_head freeable_list;
+ struct list_head frdi_idx_list;
+ int freeable_cnt;
+
+ int ltab_lnum;
+ int ltab_offs;
+ struct ubifs_lpt_lprops *ltab;
+ struct ubifs_lpt_lprops *ltab_cmt;
+ int lsave_cnt;
+ int lsave_lnum;
+ int lsave_offs;
+ int *lsave;
+ int lscan_lnum;
+
+ long long rp_size;
+ uid_t rp_uid;
+ gid_t rp_gid;
+
+ /* The below fields are used only during mounting and re-mounting */
+ int empty;
+ struct rb_root replay_tree;
+ struct list_head replay_list;
+ struct list_head replay_buds;
+ unsigned long long cs_sqnum;
+ int need_recovery;
+ int replaying;
+ struct list_head unclean_leb_list;
+ struct ubifs_mst_node *rcvrd_mst_node;
+ struct rb_root size_tree;
+ int recovery_needs_commit;
+ int remounting_rw;
+ struct ubifs_mount_opts mount_opts;
+
+#ifdef CONFIG_UBIFS_FS_DEBUG
+ void *dbg_buf;
+ struct ubifs_zbranch old_zroot;
+ int old_zroot_level;
+ unsigned long long old_zroot_sqnum;
+ int failure_mode;
+#endif
+};
+
+extern struct list_head ubifs_infos;
+extern spinlock_t ubifs_infos_lock;
+extern atomic_long_t ubifs_clean_zn_cnt;
+extern struct kmem_cache *ubifs_inode_slab;
+extern struct super_operations ubifs_super_operations;
+extern struct address_space_operations ubifs_file_address_operations;
+extern struct file_operations ubifs_file_operations;
+extern struct inode_operations ubifs_file_inode_operations;
+extern struct file_operations ubifs_dir_operations;
+extern struct inode_operations ubifs_dir_inode_operations;
+extern struct inode_operations ubifs_symlink_inode_operations;
+extern struct backing_dev_info ubifs_backing_dev_info;
+extern struct ubifs_compressor *ubifs_compressors[UBIFS_COMPR_TYPES_CNT];
+
+/* io.c */
+int ubifs_wbuf_write_nolock(struct ubifs_wbuf *wbuf, void *buf, int len);
+int ubifs_wbuf_seek_nolock(struct ubifs_wbuf *wbuf, int lnum, int offs,
+ int dtype);
+int ubifs_wbuf_init(struct ubifs_info *c, struct ubifs_wbuf *wbuf);
+int ubifs_read_node(const struct ubifs_info *c, void *buf, int type, int len,
+ int lnum, int offs);
+int ubifs_read_node_wbuf(struct ubifs_wbuf *wbuf, void *buf, int type, int len,
+ int lnum, int offs);
+int ubifs_write_node(struct ubifs_info *c, void *node, int len, int lnum,
+ int offs, int dtype);
+int ubifs_check_node(const struct ubifs_info *c, const void *buf, int lnum,
+ int offs, int quiet);
+void ubifs_prepare_node(struct ubifs_info *c, void *buf, int len, int pad);
+void ubifs_prep_grp_node(struct ubifs_info *c, void *node, int len, int last);
+int ubifs_io_init(struct ubifs_info *c);
+void ubifs_pad(const struct ubifs_info *c, void *buf, int pad);
+int ubifs_wbuf_sync_nolock(struct ubifs_wbuf *wbuf);
+int ubifs_bg_wbufs_sync(struct ubifs_info *c);
+void ubifs_wbuf_add_ino_nolock(struct ubifs_wbuf *wbuf, ino_t inum);
+int ubifs_sync_wbufs_by_inodes(struct ubifs_info *c,
+ struct inode * const *inodes, int count);
+
+/* scan.c */
+struct ubifs_scan_leb *ubifs_scan(const struct ubifs_info *c, int lnum,
+ int offs, void *sbuf);
+void ubifs_scan_destroy(struct ubifs_scan_leb *sleb);
+int ubifs_scan_a_node(const struct ubifs_info *c, void *buf, int len, int lnum,
+ int offs, int quiet);
+struct ubifs_scan_leb *ubifs_start_scan(const struct ubifs_info *c, int lnum,
+ int offs, void *sbuf);
+void ubifs_end_scan(const struct ubifs_info *c, struct ubifs_scan_leb *sleb,
+ int lnum, int offs);
+int ubifs_add_snod(const struct ubifs_info *c, struct ubifs_scan_leb *sleb,
+ void *buf, int offs);
+void ubifs_scanned_corruption(const struct ubifs_info *c, int lnum, int offs,
+ void *buf);
+
+/* log.c */
+void ubifs_add_bud(struct ubifs_info *c, struct ubifs_bud *bud);
+void ubifs_create_buds_lists(struct ubifs_info *c);
+int ubifs_add_bud_to_log(struct ubifs_info *c, int jhead, int lnum, int offs);
+struct ubifs_bud *ubifs_search_bud(struct ubifs_info *c, int lnum);
+int ubifs_log_start_commit(struct ubifs_info *c, int *ltail_lnum);
+int ubifs_log_end_commit(struct ubifs_info *c, int new_ltail_lnum);
+int ubifs_log_post_commit(struct ubifs_info *c, int old_ltail_lnum);
+int ubifs_consolidate_log(struct ubifs_info *c);
+
+/* journal.c */
+int ubifs_jrn_update(struct ubifs_info *c, const struct inode *dir,
+ const struct qstr *nm, const struct inode *inode,
+ int deletion, int sync, int xent);
+int ubifs_jrn_write_data(struct ubifs_info *c, const struct inode *inode,
+ const union ubifs_key *key, const void *buf, int len);
+int ubifs_jrn_write_inode(struct ubifs_info *c, const struct inode *inode,
+ int last_reference, int sync);
+int ubifs_jrn_rename(struct ubifs_info *c, const struct inode *old_dir,
+ const struct dentry *old_dentry,
+ const struct inode *new_dir,
+ const struct dentry *new_dentry, int sync);
+int ubifs_jrn_truncate(struct ubifs_info *c, ino_t inum,
+ loff_t old_size, loff_t new_size);
+int ubifs_jrn_delete_xattr(struct ubifs_info *c, const struct inode *host,
+ const struct inode *inode, const struct qstr *nm,
+ int sync);
+int ubifs_jrn_write_2_inodes(struct ubifs_info *c, const struct inode *inode1,
+ const struct inode *inode2, int sync);
+
+/* budget.c */
+int ubifs_budget_space(struct ubifs_info *c, struct ubifs_budget_req *req);
+void ubifs_release_budget(struct ubifs_info *c, struct ubifs_budget_req *req);
+int ubifs_budget_inode_op(struct ubifs_info *c, struct inode *inode,
+ struct ubifs_budget_req *req);
+void ubifs_release_ino_dirty(struct ubifs_info *c, struct inode *inode,
+ struct ubifs_budget_req *req);
+void ubifs_cancel_ino_op(struct ubifs_info *c, struct inode *inode,
+ struct ubifs_budget_req *req);
+int ubifs_budget_ino_cleaning(struct ubifs_info *c, struct inode *inode,
+ struct ubifs_budget_req *req);
+void ubifs_release_ino_clean(struct ubifs_info *c, struct inode *inode,
+ struct ubifs_budget_req *req);
+long long ubifs_budg_get_free_space(struct ubifs_info *c);
+int ubifs_calc_min_idx_lebs(struct ubifs_info *c);
+void ubifs_convert_page_budget(struct ubifs_info *c);
+void ubifs_release_new_page_budget(struct ubifs_info *c);
+long long ubifs_calc_available(const struct ubifs_info *c);
+
+/* find.c */
+int ubifs_find_free_space(struct ubifs_info *c, int min_space, int *free,
+ int squeeze);
+int ubifs_find_free_leb_for_idx(struct ubifs_info *c);
+int ubifs_find_dirty_leb(struct ubifs_info *c, struct ubifs_lprops *ret_lp,
+ int min_space, int pick_free);
+int ubifs_find_dirty_idx_leb(struct ubifs_info *c);
+int ubifs_save_dirty_idx_lnums(struct ubifs_info *c);
+
+/* tnc.c */
+int ubifs_tnc_lookup(struct ubifs_info *c, const union ubifs_key *key,
+ void *node);
+int ubifs_tnc_locate(struct ubifs_info *c, const union ubifs_key *key,
+ void *node, int *lnum, int *offs);
+int ubifs_tnc_lookup_nm(struct ubifs_info *c, const union ubifs_key *key,
+ void *node, const struct qstr *nm);
+int ubifs_tnc_add(struct ubifs_info *c, const union ubifs_key *key, int lnum,
+ int offs, int len);
+int ubifs_tnc_replace(struct ubifs_info *c, const union ubifs_key *key,
+ int old_lnum, int old_offs, int lnum, int offs, int len);
+int ubifs_tnc_add_nm(struct ubifs_info *c, const union ubifs_key *key,
+ int lnum, int offs, int len, const struct qstr *nm);
+int ubifs_tnc_remove(struct ubifs_info *c, const union ubifs_key *key);
+int ubifs_tnc_remove_nm(struct ubifs_info *c, const union ubifs_key *key,
+ const struct qstr *nm);
+int ubifs_tnc_remove_range(struct ubifs_info *c, union ubifs_key *from_key,
+ union ubifs_key *to_key);
+int ubifs_tnc_remove_ino(struct ubifs_info *c, ino_t inum);
+struct ubifs_dent_node *ubifs_tnc_next_ent(struct ubifs_info *c,
+ union ubifs_key *key,
+ const struct qstr *nm);
+void ubifs_tnc_close(struct ubifs_info *c);
+long ubifs_destroy_tnc_subtree(struct ubifs_znode *zr);
+int ubifs_tnc_has_node(struct ubifs_info *c, union ubifs_key *key, int level,
+ int lnum, int offs, int is_idx);
+int ubifs_dirty_idx_node(struct ubifs_info *c, union ubifs_key *key, int level,
+ int lnum, int offs);
+int ubifs_validate_entry(struct ubifs_info *c,
+ const struct ubifs_dent_node *dent);
+/* Shared by tnc.c for tnc_commit.c */
+void destroy_old_idx(struct ubifs_info *c);
+int is_idx_node_in_tnc(struct ubifs_info *c, union ubifs_key *key, int level,
+ int lnum, int offs);
+int insert_old_idx_znode(struct ubifs_info *c, struct ubifs_znode *znode);
+
+/* tnc_commit.c */
+int ubifs_tnc_start_commit(struct ubifs_info *c, struct ubifs_zbranch *zroot);
+int ubifs_tnc_end_commit(struct ubifs_info *c);
+
+/* shrinker.c */
+int ubifs_shrinker(int nr_to_scan, gfp_t gfp_mask);
+
+/* commit.c */
+int ubifs_bg_thread(void *info);
+void ubifs_commit_required(struct ubifs_info *c);
+void ubifs_request_bg_commit(struct ubifs_info *c);
+int ubifs_run_commit(struct ubifs_info *c);
+void ubifs_recovery_commit(struct ubifs_info *c);
+int ubifs_gc_should_commit(struct ubifs_info *c);
+void ubifs_wait_for_commit(struct ubifs_info *c);
+
+/* build.c */
+void ubifs_umount(struct ubifs_info *c);
+int ubifs_remount_rw(struct ubifs_info *c);
+void ubifs_remount_ro(struct ubifs_info *c);
+int ubifs_parse_options(struct ubifs_info *c, char *options, int is_remount);
+
+/* master.c */
+int ubifs_read_master(struct ubifs_info *c);
+int ubifs_write_master(struct ubifs_info *c);
+
+/* sb.c */
+int ubifs_read_superblock(struct ubifs_info *c);
+struct ubifs_sb_node *ubifs_read_sb_node(struct ubifs_info *c);
+int ubifs_write_sb_node(struct ubifs_info *c, struct ubifs_sb_node *sup);
+
+/* replay.c */
+int ubifs_replay_journal(struct ubifs_info *c);
+
+/* gc.c */
+int ubifs_garbage_collect(struct ubifs_info *c, int anyway);
+int ubifs_gc_start_commit(struct ubifs_info *c);
+int ubifs_gc_end_commit(struct ubifs_info *c);
+void ubifs_destroy_idx_gc(struct ubifs_info *c);
+int ubifs_get_idx_gc_leb(struct ubifs_info *c);
+
+/* orphan.c */
+int ubifs_add_orphan(struct ubifs_info *c, ino_t inum);
+void ubifs_delete_orphan(struct ubifs_info *c, ino_t inum);
+int ubifs_orphan_start_commit(struct ubifs_info *c);
+int ubifs_orphan_end_commit(struct ubifs_info *c);
+int ubifs_mount_orphans(struct ubifs_info *c, int unclean);
+
+/* lpt.c */
+int ubifs_calc_lpt_geom(struct ubifs_info *c);
+int ubifs_create_dflt_lpt(struct ubifs_info *c, int *main_lebs, int lpt_first,
+ int *lpt_lebs, int *big_lpt);
+int ubifs_lpt_init(struct ubifs_info *c, int rd, int wr);
+struct ubifs_lprops *ubifs_lpt_lookup(struct ubifs_info *c, int lnum);
+struct ubifs_lprops *ubifs_lpt_lookup_dirty(struct ubifs_info *c, int lnum);
+int ubifs_lpt_scan_nolock(struct ubifs_info *c, int start_lnum, int end_lnum,
+ ubifs_lpt_scan_callback scan_cb, void *data);
+
+/* Shared by lpt.c for lpt_commit.c */
+void ubifs_pack_lsave(struct ubifs_info *c, void *buf, int *lsave);
+void ubifs_pack_ltab(struct ubifs_info *c, void *buf,
+ struct ubifs_lpt_lprops *ltab);
+void ubifs_pack_pnode(struct ubifs_info *c, void *buf,
+ struct ubifs_pnode *pnode);
+void ubifs_pack_nnode(struct ubifs_info *c, void *buf,
+ struct ubifs_nnode *nnode);
+struct ubifs_pnode *ubifs_get_pnode(struct ubifs_info *c,
+ struct ubifs_nnode *parent, int iip);
+struct ubifs_nnode *ubifs_get_nnode(struct ubifs_info *c,
+ struct ubifs_nnode *parent, int iip);
+int ubifs_read_nnode(struct ubifs_info *c, struct ubifs_nnode *parent, int iip);
+void ubifs_add_lpt_dirt(struct ubifs_info *c, int lnum, int dirty);
+void ubifs_add_nnode_dirt(struct ubifs_info *c, struct ubifs_nnode *nnode);
+uint32_t ubifs_unpack_bits(uint8_t **addr, int *pos, int nrbits);
+struct ubifs_nnode *ubifs_first_nnode(struct ubifs_info *c, int *hght);
+
+/* lpt_commit.c */
+int ubifs_lpt_start_commit(struct ubifs_info *c);
+int ubifs_lpt_end_commit(struct ubifs_info *c);
+int ubifs_lpt_post_commit(struct ubifs_info *c);
+void ubifs_lpt_free(struct ubifs_info *c, int wr_only);
+
+/* lprops.c */
+void ubifs_get_lprops(struct ubifs_info *c);
+const struct ubifs_lprops *ubifs_change_lp(struct ubifs_info *c,
+ const struct ubifs_lprops *lp,
+ int free, int dirty, int flags,
+ int idx_gc_cnt);
+void ubifs_release_lprops(struct ubifs_info *c);
+void ubifs_get_lp_stats(struct ubifs_info *c, struct ubifs_lp_stats *stats);
+void ubifs_add_to_cat(struct ubifs_info *c, struct ubifs_lprops *lprops,
+ int cat);
+void ubifs_replace_cat(struct ubifs_info *c, struct ubifs_lprops *old_lprops,
+ struct ubifs_lprops *new_lprops);
+void ubifs_ensure_cat(struct ubifs_info *c, struct ubifs_lprops *lprops);
+int ubifs_categorize_lprops(const struct ubifs_info *c,
+ const struct ubifs_lprops *lprops);
+int ubifs_change_one_lp(struct ubifs_info *c, int lnum, int free, int dirty,
+ int flags_set, int flags_clean, int idx_gc_cnt);
+int ubifs_update_one_lp(struct ubifs_info *c, int lnum, int free, int dirty,
+ int flags_set, int flags_clean);
+int ubifs_read_one_lp(struct ubifs_info *c, int lnum, struct ubifs_lprops *lp);
+const struct ubifs_lprops *ubifs_fast_find_free(struct ubifs_info *c);
+const struct ubifs_lprops *ubifs_fast_find_empty(struct ubifs_info *c);
+const struct ubifs_lprops *ubifs_fast_find_freeable(struct ubifs_info *c);
+const struct ubifs_lprops *ubifs_fast_find_frdi_idx(struct ubifs_info *c);
+
+/* file.c */
+int ubifs_fsync(struct file *filp, struct dentry *dentry, int datasync);
+int ubifs_setattr(struct dentry *dentry, struct iattr *attr);
+
+/* dir.c */
+struct inode *ubifs_new_inode(struct ubifs_info *c, const struct inode *dir,
+ int mode);
+int ubifs_getattr(struct vfsmount *mnt, struct dentry *dentry,
+ struct kstat *stat);
+
+/* xattr.c */
+int ubifs_setxattr(struct dentry *dentry, const char *name,
+ const void *value, size_t size, int flags);
+ssize_t ubifs_getxattr(struct dentry *dentry, const char *name, void *buf,
+ size_t size);
+ssize_t ubifs_listxattr(struct dentry *dentry, char *buffer, size_t size);
+int ubifs_removexattr(struct dentry *dentry, const char *name);
+
+/* super.c */
+struct inode *ubifs_iget(struct super_block *sb, unsigned long inum);
+
+/* recovery.c */
+int ubifs_recover_master_node(struct ubifs_info *c);
+int ubifs_write_rcvrd_mst_node(struct ubifs_info *c);
+struct ubifs_scan_leb *ubifs_recover_leb(struct ubifs_info *c, int lnum,
+ int offs, void *sbuf, int grouped);
+struct ubifs_scan_leb *ubifs_recover_log_leb(struct ubifs_info *c, int lnum,
+ int offs, void *sbuf);
+int ubifs_recover_inl_heads(const struct ubifs_info *c, void *sbuf);
+int ubifs_clean_lebs(const struct ubifs_info *c, void *sbuf);
+int ubifs_recover_gc_lnum(struct ubifs_info *c);
+int ubifs_recover_size_accum(struct ubifs_info *c, union ubifs_key *key,
+ int deletion, loff_t new_size);
+int ubifs_recover_size(struct ubifs_info *c);
+void ubifs_destroy_size_tree(struct ubifs_info *c);
+
+/* ioctl.c */
+int ubifs_ioctl(struct inode *inode, struct file *filp, unsigned int cmd,
+ unsigned long arg);
+#ifdef CONFIG_COMPAT
+long ubifs_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
+#endif
+void ubifs_set_inode_flags(struct inode *inode);
+
+/* compressor.c */
+int __init ubifs_compressors_init(void);
+void __exit ubifs_compressors_exit(void);
+void ubifs_compress(const void *in_buf, int in_len, void *out_buf, int *out_len,
+ int *compr_type);
+int ubifs_decompress(const void *buf, int len, void *out, int *out_len,
+ int compr_type);
+
+#include "debug.h"
+#include "misc.h"
+#include "key.h"
+
+#endif /* !__UBIFS_H__ */
--
1.5.4.1
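
The kernel-doc for 'struct ubifs_info' above describes @empty_list and
@freeable_list, and gives the freeable condition as
"free + dirty == leb_size". As a reading aid, the following user-space
sketch shows how such a classification might look. The names
'example_lprops', 'example_cat' and 'example_categorize()' are hypothetical
and are not part of the patch; the real categorization is done by
'ubifs_categorize_lprops()', whose body is not shown in this section and
handles more categories.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for the per-LEB properties. */
struct example_lprops {
	int free;	/* bytes of the LEB that are still unwritten */
	int dirty;	/* bytes of the LEB held by obsolete data */
};

/* Hypothetical categories; the patch defines a larger set. */
enum example_cat {
	EX_CAT_EMPTY,		/* nothing has been written to the LEB */
	EX_CAT_FREEABLE,	/* free + dirty == leb_size */
	EX_CAT_OTHER,		/* the LEB still holds live data */
};

static enum example_cat example_categorize(const struct example_lprops *lp,
					   int leb_size)
{
	/* Basic sanity checks on the accounting values */
	assert(leb_size > 0);
	assert(lp->free >= 0 && lp->dirty >= 0);
	assert(lp->free + lp->dirty <= leb_size);

	if (lp->free == leb_size)
		return EX_CAT_EMPTY;
	if (lp->free + lp->dirty == leb_size)
		return EX_CAT_FREEABLE;
	return EX_CAT_OTHER;
}
```

With a LEB size of 1000 bytes, an LEB with free == 300 and dirty == 700
would be classified as freeable in this sketch, because no live data remains
in it, while free == 300 and dirty == 200 would not.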

2008-03-27 13:11:01

by Artem Bityutskiy

Subject: [RFC PATCH 25/26] UBIFS: add debugging stuff

The UBIFS code is large, and it contains plenty of debugging code
that helps to catch bugs. Some of this debugging code will be
deleted later.

Signed-off-by: Artem Bityutskiy <[email protected]>
Signed-off-by: Adrian Hunter <[email protected]>
---
fs/ubifs/debug.c | 1125 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
fs/ubifs/debug.h | 343 +++++++++++++++++
2 files changed, 1468 insertions(+), 0 deletions(-)
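
The node dump helper in this patch first checks the magic number of the
common header and, if it does not match, only hex-dumps the first bytes
instead of decoding them. A minimal user-space sketch of that guard is
given below. It assumes that the magic is a little-endian 32-bit value at
the very start of the buffer; 'EXAMPLE_NODE_MAGIC' is a placeholder value
and 'example_le32()' and 'example_looks_like_node()' are hypothetical names,
none of which are taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder value for illustration only, not the real UBIFS magic. */
#define EXAMPLE_NODE_MAGIC 0x12345678u

/* Decode a little-endian 32-bit value independently of host endianness. */
static uint32_t example_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/*
 * Return 1 if the buffer is long enough and starts with the expected
 * magic, 0 otherwise. A caller would fall back to a raw hex dump when
 * this returns 0.
 */
static int example_looks_like_node(const void *buf, size_t len)
{
	assert(buf != NULL);

	if (len < sizeof(uint32_t))
		return 0;
	return example_le32(buf) == EXAMPLE_NODE_MAGIC;
}
```

Decoding byte by byte keeps the check correct on both little-endian and
big-endian hosts, which is the same purpose 'le32_to_cpu()' serves in the
kernel code.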

diff --git a/fs/ubifs/debug.c b/fs/ubifs/debug.c
new file mode 100644
index 0000000..5ccb5a4
--- /dev/null
+++ b/fs/ubifs/debug.c
@@ -0,0 +1,1125 @@
+/*
+ * This file is part of UBIFS.
+ *
+ * Copyright (C) 2006-2008 Nokia Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 51
+ * Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Authors: Artem Bityutskiy (Битюцкий Артём)
+ * Adrian Hunter
+ */
+
+/*
+ * This file implements most of the debugging code, which is compiled in only
+ * when debugging is enabled. However, some debugging check functions are
+ * implemented in the corresponding subsystems, because they are closely
+ * related to them and use various local functions of those subsystems.
+ */
+
+#define UBIFS_DBG_PRESERVE_KMALLOC
+#define UBIFS_DBG_PRESERVE_UBI
+
+#include "ubifs.h"
+
+#ifdef CONFIG_UBIFS_FS_DEBUG
+
+DEFINE_SPINLOCK(dbg_lock);
+
+static char dbg_get_key_dump_dump_buf[100];
+
+static size_t km_alloc_cnt;
+static size_t vm_alloc_cnt;
+
+static const char *get_key_fmt(int fmt)
+{
+ switch (fmt) {
+ case UBIFS_SIMPLE_KEY_FMT:
+ return "simple";
+ default:
+ return "unknown/invalid format";
+ }
+}
+
+static const char *get_key_hash(int hash)
+{
+ switch (hash) {
+ case UBIFS_KEY_HASH_R5:
+ return "R5";
+ case UBIFS_KEY_HASH_TEST:
+ return "test";
+ default:
+ return "unknown/invalid name hash";
+ }
+}
+
+static const char *get_key_type(int type)
+{
+ switch (type) {
+ case UBIFS_INO_KEY:
+ return "inode";
+ case UBIFS_DENT_KEY:
+ return "direntry";
+ case UBIFS_XENT_KEY:
+ return "xentry";
+ case UBIFS_DATA_KEY:
+ return "data";
+ case UBIFS_TRUN_KEY:
+ return "truncate";
+ default:
+ return "unknown/invalid key";
+ }
+}
+
+const char *dbg_get_key_dump(const struct ubifs_info *c,
+ const union ubifs_key *key)
+{
+ char *p = &dbg_get_key_dump_dump_buf[0];
+ int type = key_type(c, key);
+
+ if (c->key_fmt == UBIFS_SIMPLE_KEY_FMT) {
+ switch (type) {
+ case UBIFS_INO_KEY:
+ sprintf(p, "(%lu, %s)", key_ino(c, key),
+ get_key_type(type));
+ break;
+ case UBIFS_DENT_KEY:
+ case UBIFS_XENT_KEY:
+ sprintf(p, "(%lu, %s, %#08x)", key_ino(c, key),
+ get_key_type(type), key_hash(c, key));
+ break;
+ case UBIFS_DATA_KEY:
+ sprintf(p, "(%lu, %s, %u)", key_ino(c, key),
+ get_key_type(type), key_block(c, key));
+ break;
+ case UBIFS_TRUN_KEY:
+ sprintf(p, "(%lu, %s)",
+ key_ino(c, key), get_key_type(type));
+ break;
+ default:
+ sprintf(p, "(bad key type: %#08x, %#08x)",
+ key->u32[0], key->u32[1]);
+ }
+ } else
+ sprintf(p, "bad key format %d", c->key_fmt);
+
+ return p;
+}
+
+const char *dbg_ntype(int type)
+{
+ switch (type) {
+ case UBIFS_PAD_NODE:
+ return "padding node";
+ case UBIFS_SB_NODE:
+ return "superblock node";
+ case UBIFS_MST_NODE:
+ return "master node";
+ case UBIFS_REF_NODE:
+ return "reference node";
+ case UBIFS_INO_NODE:
+ return "inode node";
+ case UBIFS_DENT_NODE:
+ return "direntry node";
+ case UBIFS_XENT_NODE:
+ return "xentry node";
+ case UBIFS_DATA_NODE:
+ return "data node";
+ case UBIFS_TRUN_NODE:
+ return "truncate node";
+ case UBIFS_IDX_NODE:
+ return "indexing node";
+ case UBIFS_CS_NODE:
+ return "commit start node";
+ case UBIFS_ORPH_NODE:
+ return "orphan node";
+ default:
+ return "unknown node";
+ }
+}
+
+static const char *dbg_gtype(int type)
+{
+ switch (type) {
+ case UBIFS_NO_NODE_GROUP:
+ return "no node group";
+ case UBIFS_IN_NODE_GROUP:
+ return "in node group";
+ case UBIFS_LAST_OF_NODE_GROUP:
+ return "last of node group";
+ default:
+ return "unknown";
+ }
+}
+
+const char *dbg_cstate(int cmt_state)
+{
+ switch (cmt_state) {
+ case COMMIT_RESTING:
+ return "commit resting";
+ case COMMIT_BACKGROUND:
+ return "background commit requested";
+ case COMMIT_REQUIRED:
+ return "commit required";
+ case COMMIT_RUNNING_BACKGROUND:
+ return "background commit running";
+ case COMMIT_RUNNING_REQUIRED:
+ return "commit running and required";
+ case COMMIT_BROKEN:
+ return "broken commit";
+ default:
+ return "unknown commit state";
+ }
+}
+
+static void dump_ch(const struct ubifs_ch *ch)
+{
+ printk(KERN_DEBUG "\tmagic %#x\n", le32_to_cpu(ch->magic));
+ printk(KERN_DEBUG "\tcrc %#x\n", le32_to_cpu(ch->crc));
+ printk(KERN_DEBUG "\tnode_type %d (%s)\n", ch->node_type,
+ dbg_ntype(ch->node_type));
+ printk(KERN_DEBUG "\tgroup_type %d (%s)\n", ch->group_type,
+ dbg_gtype(ch->group_type));
+ printk(KERN_DEBUG "\tsqnum %llu\n", le64_to_cpu(ch->sqnum));
+ printk(KERN_DEBUG "\tlen %u\n", le32_to_cpu(ch->len));
+}
+
+void dbg_dump_node(const struct ubifs_info *c, const void *node)
+{
+ int i, n;
+ union ubifs_key key;
+ const struct ubifs_ch *ch = node;
+
+ if (dbg_failure_mode)
+ return;
+
+ /* If the magic is incorrect, just hexdump the first bytes */
+ if (le32_to_cpu(ch->magic) != UBIFS_NODE_MAGIC) {
+ printk(KERN_DEBUG "Not a node, first %zu bytes:", UBIFS_CH_SZ);
+ print_hex_dump(KERN_DEBUG, "", DUMP_PREFIX_OFFSET, 32, 1,
+ (void *)node, UBIFS_CH_SZ, 1);
+ return;
+ }
+
+ spin_lock(&dbg_lock);
+ dump_ch(node);
+
+ switch (ch->node_type) {
+ case UBIFS_PAD_NODE:
+ {
+ const struct ubifs_pad_node *pad = node;
+
+ printk(KERN_DEBUG "\tpad_len %u\n",
+ le32_to_cpu(pad->pad_len));
+ break;
+ }
+ case UBIFS_SB_NODE:
+ {
+ const struct ubifs_sb_node *sup = node;
+ unsigned int sup_flags = le32_to_cpu(sup->flags);
+
+ printk(KERN_DEBUG "\tkey_hash %d (%s)\n",
+ (int)sup->key_hash, get_key_hash(sup->key_hash));
+ printk(KERN_DEBUG "\tkey_fmt %d (%s)\n",
+ (int)sup->key_fmt, get_key_fmt(sup->key_fmt));
+ printk(KERN_DEBUG "\tflags %#x\n", sup_flags);
+ printk(KERN_DEBUG "\t\tbig_lpt %u\n",
+ !!(sup_flags & UBIFS_FLG_BIGLPT));
+ printk(KERN_DEBUG "\tmin_io_size %u\n",
+ le32_to_cpu(sup->min_io_size));
+ printk(KERN_DEBUG "\tleb_size %u\n",
+ le32_to_cpu(sup->leb_size));
+ printk(KERN_DEBUG "\tleb_cnt %u\n",
+ le32_to_cpu(sup->leb_cnt));
+ printk(KERN_DEBUG "\tmax_leb_cnt %u\n",
+ le32_to_cpu(sup->max_leb_cnt));
+ printk(KERN_DEBUG "\tmax_bud_bytes %llu\n",
+ le64_to_cpu(sup->max_bud_bytes));
+ printk(KERN_DEBUG "\tlog_lebs %u\n",
+ le32_to_cpu(sup->log_lebs));
+ printk(KERN_DEBUG "\tlpt_lebs %u\n",
+ le32_to_cpu(sup->lpt_lebs));
+ printk(KERN_DEBUG "\torph_lebs %u\n",
+ le32_to_cpu(sup->orph_lebs));
+ printk(KERN_DEBUG "\tjhead_cnt %u\n",
+ le32_to_cpu(sup->jhead_cnt));
+ printk(KERN_DEBUG "\tfanout %u\n",
+ le32_to_cpu(sup->fanout));
+ printk(KERN_DEBUG "\tlsave_cnt %u\n",
+ le32_to_cpu(sup->lsave_cnt));
+ printk(KERN_DEBUG "\tdefault_compr %d\n",
+ (int)le16_to_cpu(sup->default_compr));
+ printk(KERN_DEBUG "\trp_size %llu\n",
+ le64_to_cpu(sup->rp_size));
+ printk(KERN_DEBUG "\trp_uid %u\n",
+ le32_to_cpu(sup->rp_uid));
+ printk(KERN_DEBUG "\trp_gid %u\n",
+ le32_to_cpu(sup->rp_gid));
+ break;
+ }
+ case UBIFS_MST_NODE:
+ {
+ const struct ubifs_mst_node *mst = node;
+
+ printk(KERN_DEBUG "\thighest_inum %llu\n",
+ le64_to_cpu(mst->highest_inum));
+ printk(KERN_DEBUG "\tcommit number %llu\n",
+ le64_to_cpu(mst->cmt_no));
+ printk(KERN_DEBUG "\tflags %#x\n",
+ le32_to_cpu(mst->flags));
+ printk(KERN_DEBUG "\tlog_lnum %u\n",
+ le32_to_cpu(mst->log_lnum));
+ printk(KERN_DEBUG "\troot_lnum %u\n",
+ le32_to_cpu(mst->root_lnum));
+ printk(KERN_DEBUG "\troot_offs %u\n",
+ le32_to_cpu(mst->root_offs));
+ printk(KERN_DEBUG "\troot_len %u\n",
+ le32_to_cpu(mst->root_len));
+ printk(KERN_DEBUG "\tgc_lnum %u\n",
+ le32_to_cpu(mst->gc_lnum));
+ printk(KERN_DEBUG "\tihead_lnum %u\n",
+ le32_to_cpu(mst->ihead_lnum));
+ printk(KERN_DEBUG "\tihead_offs %u\n",
+ le32_to_cpu(mst->ihead_offs));
+ printk(KERN_DEBUG "\tindex_size %u\n",
+ le32_to_cpu(mst->index_size));
+ printk(KERN_DEBUG "\tlpt_lnum %u\n",
+ le32_to_cpu(mst->lpt_lnum));
+ printk(KERN_DEBUG "\tlpt_offs %u\n",
+ le32_to_cpu(mst->lpt_offs));
+ printk(KERN_DEBUG "\tnhead_lnum %u\n",
+ le32_to_cpu(mst->nhead_lnum));
+ printk(KERN_DEBUG "\tnhead_offs %u\n",
+ le32_to_cpu(mst->nhead_offs));
+ printk(KERN_DEBUG "\tltab_lnum %u\n",
+ le32_to_cpu(mst->ltab_lnum));
+ printk(KERN_DEBUG "\tltab_offs %u\n",
+ le32_to_cpu(mst->ltab_offs));
+ printk(KERN_DEBUG "\tlsave_lnum %u\n",
+ le32_to_cpu(mst->lsave_lnum));
+ printk(KERN_DEBUG "\tlsave_offs %u\n",
+ le32_to_cpu(mst->lsave_offs));
+ printk(KERN_DEBUG "\tlscan_lnum %u\n",
+ le32_to_cpu(mst->lscan_lnum));
+ printk(KERN_DEBUG "\tleb_cnt %u\n",
+ le32_to_cpu(mst->leb_cnt));
+ printk(KERN_DEBUG "\tempty_lebs %u\n",
+ le32_to_cpu(mst->empty_lebs));
+ printk(KERN_DEBUG "\tidx_lebs %u\n",
+ le32_to_cpu(mst->idx_lebs));
+ printk(KERN_DEBUG "\ttotal_free %llu\n",
+ le64_to_cpu(mst->total_free));
+ printk(KERN_DEBUG "\ttotal_dirty %llu\n",
+ le64_to_cpu(mst->total_dirty));
+ printk(KERN_DEBUG "\ttotal_used %llu\n",
+ le64_to_cpu(mst->total_used));
+ printk(KERN_DEBUG "\ttotal_dead %llu\n",
+ le64_to_cpu(mst->total_dead));
+ printk(KERN_DEBUG "\ttotal_dark %llu\n",
+ le64_to_cpu(mst->total_dark));
+ break;
+ }
+ case UBIFS_REF_NODE:
+ {
+ const struct ubifs_ref_node *ref = node;
+
+ printk(KERN_DEBUG "\tlnum %u\n",
+ le32_to_cpu(ref->lnum));
+ printk(KERN_DEBUG "\toffs %u\n",
+ le32_to_cpu(ref->offs));
+ printk(KERN_DEBUG "\tjhead %u\n",
+ le32_to_cpu(ref->jhead));
+ break;
+ }
+ case UBIFS_INO_NODE:
+ {
+ const struct ubifs_ino_node *ino = node;
+
+ key_read(c, &ino->key, &key);
+ printk(KERN_DEBUG "\tkey %s\n",
+ dbg_get_key_dump(c, &key));
+ printk(KERN_DEBUG "\tsize %llu\n",
+ le64_to_cpu(ino->size));
+ printk(KERN_DEBUG "\tnlink %u\n",
+ le32_to_cpu(ino->nlink));
+ printk(KERN_DEBUG "\tatime %u\n",
+ le32_to_cpu(ino->atime));
+ printk(KERN_DEBUG "\tctime %u\n",
+ le32_to_cpu(ino->ctime));
+ printk(KERN_DEBUG "\tmtime %u\n",
+ le32_to_cpu(ino->mtime));
+ printk(KERN_DEBUG "\tuid %u\n",
+ le32_to_cpu(ino->uid));
+ printk(KERN_DEBUG "\tgid %u\n",
+ le32_to_cpu(ino->gid));
+ printk(KERN_DEBUG "\tmode %u\n",
+ le32_to_cpu(ino->mode));
+ printk(KERN_DEBUG "\tflags %#x\n",
+ le32_to_cpu(ino->flags));
+ printk(KERN_DEBUG "\txattr_cnt %u\n",
+ le32_to_cpu(ino->xattr_cnt));
+ printk(KERN_DEBUG "\txattr_size %llu\n",
+ le64_to_cpu(ino->xattr_size));
+ printk(KERN_DEBUG "\txattr_msize %llu\n",
+ le64_to_cpu(ino->xattr_msize));
+ printk(KERN_DEBUG "\txattr_names %u\n",
+ le32_to_cpu(ino->xattr_names));
+ printk(KERN_DEBUG "\tcompr_type %#x\n",
+ (int)le16_to_cpu(ino->compr_type));
+ printk(KERN_DEBUG "\tdata len %u\n",
+ le32_to_cpu(ino->data_len));
+ break;
+ }
+ case UBIFS_DENT_NODE:
+ case UBIFS_XENT_NODE:
+ {
+ const struct ubifs_dent_node *dent = node;
+ int nlen = le16_to_cpu(dent->nlen);
+
+ key_read(c, &dent->key, &key);
+ printk(KERN_DEBUG "\tkey %s\n",
+ dbg_get_key_dump(c, &key));
+ printk(KERN_DEBUG "\tinum %llu\n",
+ le64_to_cpu(dent->inum));
+ printk(KERN_DEBUG "\ttype %d\n", (int)dent->type);
+ printk(KERN_DEBUG "\tnlen %d\n", nlen);
+ printk(KERN_DEBUG "\tname ");
+
+ if (nlen > UBIFS_MAX_NLEN) {
+ nlen = UBIFS_MAX_NLEN;
+ printk(KERN_DEBUG "\tWarning! Node is corrupted\n");
+ }
+
+ for (i = 0; i < nlen && dent->name[i]; i++)
+ printk("%c", dent->name[i]);
+ printk("\n");
+
+ break;
+ }
+ case UBIFS_DATA_NODE:
+ {
+ const struct ubifs_data_node *dn = node;
+ int dlen = le32_to_cpu(ch->len) - UBIFS_DATA_NODE_SZ;
+
+ key_read(c, &dn->key, &key);
+ printk(KERN_DEBUG "\tkey %s\n",
+ dbg_get_key_dump(c, &key));
+ printk(KERN_DEBUG "\tsize %u\n",
+ le32_to_cpu(dn->size));
+ printk(KERN_DEBUG "\tcompr_typ %d\n",
+ (int)le16_to_cpu(dn->compr_type));
+ printk(KERN_DEBUG "\tdata size %d\n",
+ dlen);
+ printk(KERN_DEBUG "\tdata:\n");
+ print_hex_dump(KERN_DEBUG, "\t", DUMP_PREFIX_OFFSET, 32, 1,
+ (void *)&dn->data, dlen, 0);
+ break;
+ }
+ case UBIFS_TRUN_NODE:
+ {
+ const struct ubifs_trun_node *trun = node;
+
+ key_read(c, &trun->key, &key);
+ printk(KERN_DEBUG "\tkey %s\n",
+ dbg_get_key_dump(c, &key));
+ printk(KERN_DEBUG "\told_size %llu\n",
+ le64_to_cpu(trun->old_size));
+ printk(KERN_DEBUG "\tnew_size %llu\n",
+ le64_to_cpu(trun->new_size));
+ break;
+ }
+ case UBIFS_IDX_NODE:
+ {
+ const struct ubifs_idx_node *idx = node;
+
+ n = le16_to_cpu(idx->child_cnt);
+ printk(KERN_DEBUG "\tchild_cnt %d\n", n);
+ printk(KERN_DEBUG "\tlevel %d\n",
+ (int)le16_to_cpu(idx->level));
+ printk(KERN_DEBUG "Branches:\n");
+
+ for (i = 0; i < n && i < c->fanout - 1; i++) {
+ const struct ubifs_branch *br;
+
+ br = ubifs_idx_branch(c, idx, i);
+ key_read(c, &br->key, &key);
+ printk(KERN_DEBUG "\t %04d: key %s",
+ i, dbg_get_key_dump(c, &key));
+ printk(KERN_DEBUG "\t lnum %6d, offs %6d, "
+ "len %6d\n", le32_to_cpu(br->lnum),
+ le32_to_cpu(br->offs), le32_to_cpu(br->len));
+ }
+ break;
+ }
+ case UBIFS_CS_NODE:
+ break;
+ case UBIFS_ORPH_NODE:
+ {
+ const struct ubifs_orph_node *orph = node;
+
+ printk(KERN_DEBUG "\tcommit number %llu\n",
+ le64_to_cpu(orph->cmt_no) & LLONG_MAX);
+ printk(KERN_DEBUG "\tlast node flag %llu\n",
+ le64_to_cpu(orph->cmt_no) >> 63);
+ n = (le32_to_cpu(ch->len) - UBIFS_ORPH_NODE_SZ) >> 3;
+ printk(KERN_DEBUG "\t%d orphan inode numbers:\n", n);
+ for (i = 0; i < n; i++)
+ printk(KERN_DEBUG "\t ino %llu\n",
+ le64_to_cpu(orph->inos[i]));
+ break;
+ }
+ default:
+ printk(KERN_DEBUG "node type %d was not recognized\n",
+ (int)ch->node_type);
+ }
+ spin_unlock(&dbg_lock);
+}
+
+void dbg_dump_budget_req(const struct ubifs_budget_req *req)
+{
+ spin_lock(&dbg_lock);
+ printk(KERN_DEBUG "Budgeting request: new_ino %d, dirtied_ino %d\n",
+ req->new_ino, req->dirtied_ino);
+ printk(KERN_DEBUG "\tnew_ino_d %d, dirtied_ino_d %d\n",
+ req->new_ino_d, req->dirtied_ino_d);
+ printk(KERN_DEBUG "\tnew_page %d, dirtied_page %d\n",
+ req->new_page, req->dirtied_page);
+ printk(KERN_DEBUG "\tnew_dent %d, mod_dent %d\n",
+ req->new_dent, req->mod_dent);
+ printk(KERN_DEBUG "\tidx_growth %d\n", req->idx_growth);
+ printk(KERN_DEBUG "\tdata_growth %d dd_growth %d\n",
+ req->data_growth, req->dd_growth);
+ spin_unlock(&dbg_lock);
+}
+
+void dbg_dump_lstats(const struct ubifs_lp_stats *lst)
+{
+ spin_lock(&dbg_lock);
+ printk(KERN_DEBUG "Lprops statistics: empty_lebs %d, idx_lebs %d\n",
+ lst->empty_lebs, lst->idx_lebs);
+ printk(KERN_DEBUG "\ttaken_empty_lebs %d, total_free %lld, "
+ "total_dirty %lld\n", lst->taken_empty_lebs, lst->total_free,
+ lst->total_dirty);
+ printk(KERN_DEBUG "\ttotal_used %lld, total_dark %lld, "
+ "total_dead %lld\n", lst->total_used, lst->total_dark,
+ lst->total_dead);
+ spin_unlock(&dbg_lock);
+}
+
+void dbg_dump_budg(struct ubifs_info *c)
+{
+ int i;
+ struct rb_node *rb;
+ struct ubifs_bud *bud;
+ struct ubifs_gced_idx_leb *idx_gc;
+
+ spin_lock(&dbg_lock);
+ printk(KERN_DEBUG "Budgeting info: budg_data_growth %lld, "
+ "budg_dd_growth %lld, budg_idx_growth %lld\n",
+ c->budg_data_growth, c->budg_dd_growth, c->budg_idx_growth);
+ printk(KERN_DEBUG "\tdata budget sum %lld, total budget sum %lld, "
+ "freeable_cnt %d\n", c->budg_data_growth + c->budg_dd_growth,
+ c->budg_data_growth + c->budg_dd_growth + c->budg_idx_growth,
+ c->freeable_cnt);
+ printk(KERN_DEBUG "\tmin_idx_lebs %d, old_idx_sz %lld, "
+ "calc_idx_sz %lld, idx_gc_cnt %d\n", c->min_idx_lebs,
+ c->old_idx_sz, c->calc_idx_sz, c->idx_gc_cnt);
+ printk(KERN_DEBUG "\tdirty_pg_cnt %ld, dirty_ino_cnt %ld, "
+ "dirty_zn_cnt %ld, clean_zn_cnt %ld\n",
+ atomic_long_read(&c->dirty_pg_cnt),
+ atomic_long_read(&c->dirty_ino_cnt),
+ atomic_long_read(&c->dirty_zn_cnt),
+ atomic_long_read(&c->clean_zn_cnt));
+ printk(KERN_DEBUG "\tdark_wm %d, dead_wm %d, max_idx_node_sz %d\n",
+ c->dark_wm, c->dead_wm, c->max_idx_node_sz);
+ printk(KERN_DEBUG "\tgc_lnum %d, ihead_lnum %d\n",
+ c->gc_lnum, c->ihead_lnum);
+ for (i = 0; i < c->jhead_cnt; i++)
+ printk(KERN_DEBUG "\tjhead %d\t LEB %d\n",
+ c->jheads[i].wbuf.jhead, c->jheads[i].wbuf.lnum);
+ for (rb = rb_first(&c->buds); rb; rb = rb_next(rb)) {
+ bud = rb_entry(rb, struct ubifs_bud, rb);
+ printk(KERN_DEBUG "\tbud LEB %d\n", bud->lnum);
+ }
+ list_for_each_entry(bud, &c->old_buds, list)
+ printk(KERN_DEBUG "\told bud LEB %d\n", bud->lnum);
+ list_for_each_entry(idx_gc, &c->idx_gc, list)
+ printk(KERN_DEBUG "\tGC'ed idx LEB %d unmap %d\n",
+ idx_gc->lnum, idx_gc->unmap);
+ printk(KERN_DEBUG "\tcommit state %d\n", c->cmt_state);
+ spin_unlock(&dbg_lock);
+}
+
+void dbg_dump_lprop(const struct ubifs_info *c, const struct ubifs_lprops *lp)
+{
+ printk(KERN_DEBUG "LEB %d lprops: free %d, dirty %d (used %d), "
+ "flags %#x\n", lp->lnum, lp->free, lp->dirty,
+ c->leb_size - lp->free - lp->dirty, lp->flags);
+}
+
+void dbg_dump_lprops(struct ubifs_info *c)
+{
+ int lnum, err;
+ struct ubifs_lprops lp;
+ struct ubifs_lp_stats lst;
+
+ printk(KERN_DEBUG "Dumping LEB properties\n");
+ ubifs_get_lp_stats(c, &lst);
+ dbg_dump_lstats(&lst);
+
+ for (lnum = c->main_first; lnum < c->leb_cnt; lnum++) {
+ err = ubifs_read_one_lp(c, lnum, &lp);
+ if (err) {
+ ubifs_err("cannot read lprops for LEB %d", lnum);
+ continue;
+ }
+
+ dbg_dump_lprop(c, &lp);
+ }
+}
+
+void dbg_dump_leb(const struct ubifs_info *c, int lnum)
+{
+ struct ubifs_scan_leb *sleb;
+ struct ubifs_scan_node *snod;
+
+ if (dbg_failure_mode)
+ return;
+
+ printk(KERN_DEBUG "Dumping LEB %d\n", lnum);
+
+ sleb = ubifs_scan(c, lnum, 0, c->dbg_buf);
+ if (IS_ERR(sleb)) {
+ ubifs_err("scan error %d", (int)PTR_ERR(sleb));
+ return;
+ }
+
+ printk(KERN_DEBUG "LEB %d has %d nodes ending at %d\n", lnum,
+ sleb->nodes_cnt, sleb->endpt);
+
+ list_for_each_entry(snod, &sleb->nodes, list) {
+ cond_resched();
+ printk(KERN_DEBUG "Dumping node at LEB %d:%d len %d\n", lnum,
+ snod->offs, snod->len);
+ dbg_dump_node(c, snod->node);
+ }
+
+ ubifs_scan_destroy(sleb);
+ return;
+}
+
+void dbg_dump_znode(const struct ubifs_info *c, const struct ubifs_znode *znode)
+{
+ int n;
+
+ spin_lock(&dbg_lock);
+ printk(KERN_DEBUG "znode %p, parent %p iip %d level %d child_cnt %d "
+ "flags %lx\n", znode, znode->parent, znode->iip, znode->level,
+ znode->child_cnt, znode->flags);
+
+ if (znode->child_cnt <= 0 || znode->child_cnt > c->fanout) {
+ spin_unlock(&dbg_lock);
+ return;
+ }
+
+ printk(KERN_DEBUG "zbranches:\n");
+ for (n = 0; n < znode->child_cnt; n++) {
+ const struct ubifs_zbranch *zbr = &znode->zbranch[n];
+
+ if (znode->level > 0)
+ printk(KERN_DEBUG "\t%d: znode %p lnum %d offs %d "
+ "len %d key %s\n", n, zbr->znode,
+ zbr->lnum, zbr->offs, zbr->len,
+ dbg_get_key_dump(c, &zbr->key));
+ else
+ printk(KERN_DEBUG "\t%d: LNC %p lnum %d offs %d "
+ "len %d key %s\n", n, zbr->znode,
+ zbr->lnum, zbr->offs, zbr->len,
+ dbg_get_key_dump(c, &zbr->key));
+ }
+ spin_unlock(&dbg_lock);
+}
+
+void dbg_dump_heap(struct ubifs_info *c, struct ubifs_lpt_heap *heap, int cat)
+{
+ int i;
+
+ printk(KERN_DEBUG "Dumping heap cat %d (%d elements)\n",
+ cat, heap->cnt);
+ for (i = 0; i < heap->cnt; i++) {
+ struct ubifs_lprops *lprops = heap->arr[i];
+
+ printk(KERN_DEBUG "\t%d. LEB %d hpos %d free %d dirty %d "
+ "flags %d\n", i, lprops->lnum, lprops->hpos,
+ lprops->free, lprops->dirty, lprops->flags);
+ }
+}
+
+void dbg_dump_pnode(struct ubifs_info *c, struct ubifs_pnode *pnode,
+ struct ubifs_nnode *parent, int iip)
+{
+ int i;
+
+ printk(KERN_DEBUG "Dumping pnode:\n");
+ printk(KERN_DEBUG "\taddress %zx parent %zx cnext %zx\n",
+ (size_t)pnode, (size_t)parent, (size_t)pnode->cnext);
+ printk(KERN_DEBUG "\tflags %lu iip %d level %d num %d\n",
+ pnode->flags, iip, pnode->level, pnode->num);
+ for (i = 0; i < UBIFS_LPT_FANOUT; i++) {
+ struct ubifs_lprops *lp = &pnode->lprops[i];
+
+ printk(KERN_DEBUG "\t%d: free %d dirty %d flags %d lnum %d\n",
+ i, lp->free, lp->dirty, lp->flags, lp->lnum);
+ }
+}
+
+#ifdef CONFIG_UBIFS_FS_DEBUG_CHK_OTHER
+
+/*
+ * dbg_check_dir_size - check directory inode size.
+ * @c: UBIFS file-system description object
+ * @dir: the directory inode to check
+ *
+ * This function sums up the sizes of all directory entries of @dir and makes
+ * sure the result matches the inode size. Returns zero in case of success and
+ * a negative error code in case of failure.
+ *
+ * Note, it is a good idea to make sure the @dir->i_mutex is locked before
+ * calling this function.
+ */
+int dbg_check_dir_size(struct ubifs_info *c, const struct inode *dir)
+{
+ union ubifs_key key;
+ struct ubifs_dent_node *dent, *pdent = NULL;
+ struct qstr nm = { .name = NULL };
+ loff_t size = 0;
+
+ if (!S_ISDIR(dir->i_mode))
+ return 0;
+
+ lowest_dent_key(c, &key, dir->i_ino);
+ while (1) {
+ int err;
+
+ dent = ubifs_tnc_next_ent(c, &key, &nm);
+ if (IS_ERR(dent)) {
+ err = PTR_ERR(dent);
+ if (err == -ENOENT)
+ break;
+ dbg_kfree(pdent); /* do not leak the previous entry on error */
+ return err;
+ }
+
+ nm.name = dent->name;
+ nm.len = le16_to_cpu(dent->nlen);
+ size += CALC_DENT_SIZE(nm.len);
+ dbg_kfree(pdent); /* kfree via debug function */
+ pdent = dent;
+ key_read(c, &dent->key, &key);
+ }
+
+ dbg_kfree(pdent); /* kfree via debug function */
+
+ if (i_size_read(dir) != size) {
+ ubifs_err("bad directory dir %lu size %llu, "
+ "calculated %llu", dir->i_ino,
+ i_size_read(dir), size);
+ dump_stack();
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+#endif /* CONFIG_UBIFS_FS_DEBUG_CHK_OTHER */
+
+void *dbg_kmalloc(size_t size, gfp_t flags)
+{
+ void *addr;
+
+ addr = kmalloc(size, flags);
+ if (addr != NULL) {
+ spin_lock(&dbg_lock);
+ km_alloc_cnt += 1;
+ spin_unlock(&dbg_lock);
+ }
+ return addr;
+}
+
+void *dbg_kzalloc(size_t size, gfp_t flags)
+{
+ void *addr;
+
+ addr = kzalloc(size, flags);
+ if (addr != NULL) {
+ spin_lock(&dbg_lock);
+ km_alloc_cnt += 1;
+ spin_unlock(&dbg_lock);
+ }
+ return addr;
+}
+
+void dbg_kfree(const void *addr)
+{
+ if (addr != NULL) {
+ spin_lock(&dbg_lock);
+ km_alloc_cnt -= 1;
+ spin_unlock(&dbg_lock);
+ kfree(addr);
+ }
+}
+
+void *dbg_vmalloc(size_t size)
+{
+ void *addr;
+
+ addr = vmalloc(size);
+ if (addr != NULL) {
+ spin_lock(&dbg_lock);
+ vm_alloc_cnt += 1;
+ spin_unlock(&dbg_lock);
+ }
+ return addr;
+}
+
+void dbg_vfree(void *addr)
+{
+ if (addr != NULL) {
+ spin_lock(&dbg_lock);
+ vm_alloc_cnt -= 1;
+ spin_unlock(&dbg_lock);
+ vfree(addr);
+ }
+}
+
+void dbg_leak_report(void)
+{
+ spin_lock(&dbg_lock);
+ if (km_alloc_cnt || vm_alloc_cnt) {
+ ubifs_err("kmalloc: leak count %zd", km_alloc_cnt);
+ ubifs_err("vmalloc: leak count %zd", vm_alloc_cnt);
+ }
+ spin_unlock(&dbg_lock);
+}
+
+#ifdef CONFIG_UBIFS_FS_DEBUG_CHK_MEMPRESS
+
+/*
+ * The debugging code below creates artificial memory pressure so that the
+ * UBIFS shrinker gets invoked. Useful for testing.
+ */
+
+/*
+ * struct eaten_memory - memory object eaten by UBIFS to cause memory pressure.
+ * @list: link in the list of eaten memory objects
+ * @pad: just pads to memory page size
+ */
+struct eaten_memory {
+ struct list_head list;
+ uint8_t pad[PAGE_CACHE_SIZE - sizeof(struct list_head)];
+};
+
+/* List of eaten memory pages */
+static LIST_HEAD(eaten_list);
+/* Count of allocated 'struct eaten_memory' objects */
+static unsigned long eaten_cnt;
+/* Protects 'eaten_list' and 'eaten_cnt' */
+static DEFINE_SPINLOCK(eaten_lock);
+
+void dbg_eat_memory(void)
+{
+ struct eaten_memory *em;
+
+ em = kmalloc(sizeof(struct eaten_memory), GFP_NOFS);
+ if (!em) {
+ ubifs_err("cannot allocate eaten memory structure");
+ return;
+ }
+
+ spin_lock(&eaten_lock);
+ list_add_tail(&em->list, &eaten_list);
+ eaten_cnt += 1;
+ spin_unlock(&eaten_lock);
+}
+
+static int return_eaten_memory(int nr)
+{
+ int free_all = 0, freed = 0;
+ struct eaten_memory *em;
+
+ if (nr == 0)
+ return eaten_cnt;
+
+ if (nr == -1)
+ free_all = 1;
+
+ while (nr > 0 || free_all) {
+ spin_lock(&eaten_lock);
+ if (eaten_cnt == 0) {
+ spin_unlock(&eaten_lock);
+ break;
+ }
+
+ em = list_entry(eaten_list.next, struct eaten_memory, list);
+ list_del(&em->list);
+ eaten_cnt -= 1;
+ spin_unlock(&eaten_lock);
+
+ kfree(em);
+ nr -= 1;
+ freed += 1;
+ }
+
+ return freed;
+}
+
+static int dbg_shrinker(int nr, gfp_t gfp_mask)
+{
+ return return_eaten_memory(nr);
+}
+
+static struct shrinker dbg_shrinker_info = {
+ .shrink = dbg_shrinker,
+ .seeks = DEFAULT_SEEKS,
+};
+
+void __init dbg_mempressure_init(void)
+{
+ register_shrinker(&dbg_shrinker_info);
+}
+
+void dbg_mempressure_exit(void)
+{
+ unregister_shrinker(&dbg_shrinker_info);
+ return_eaten_memory(-1);
+}
+
+#endif /* CONFIG_UBIFS_FS_DEBUG_CHK_MEMPRESS */
+
+#ifdef CONFIG_UBIFS_FS_DEBUG_TEST_RCVRY
+
+#define chance(n, d) (simple_rand() <= (n) * 32768LL / (d))
+
+struct failure_mode_info {
+ struct list_head list;
+ struct ubifs_info *c;
+};
+
+static LIST_HEAD(fmi_list);
+static DEFINE_SPINLOCK(fmi_lock);
+
+static unsigned int next;
+
+static int simple_rand(void)
+{
+ if (next == 0)
+ next = current->pid;
+ next = next * 1103515245 + 12345;
+ return (next >> 16) & 32767;
+}
+
+void dbg_failure_mode_registration(struct ubifs_info *c)
+{
+ struct failure_mode_info *fmi;
+
+ fmi = kmalloc(sizeof(struct failure_mode_info), GFP_NOFS);
+ if (!fmi) {
+ dbg_err("Failed to register failure mode - no memory");
+ return;
+ }
+ fmi->c = c;
+ spin_lock(&fmi_lock);
+ list_add_tail(&fmi->list, &fmi_list);
+ spin_unlock(&fmi_lock);
+}
+
+void dbg_failure_mode_deregistration(struct ubifs_info *c)
+{
+ struct failure_mode_info *fmi, *tmp;
+
+ spin_lock(&fmi_lock);
+ list_for_each_entry_safe(fmi, tmp, &fmi_list, list)
+ if (fmi->c == c) {
+ list_del(&fmi->list);
+ kfree(fmi);
+ }
+ spin_unlock(&fmi_lock);
+}
+
+static struct ubifs_info *dbg_find_info(struct ubi_volume_desc *desc)
+{
+ struct failure_mode_info *fmi;
+
+ spin_lock(&fmi_lock);
+ list_for_each_entry(fmi, &fmi_list, list)
+ if (fmi->c->ubi == desc) {
+ struct ubifs_info *c = fmi->c;
+
+ spin_unlock(&fmi_lock);
+ return c;
+ }
+ spin_unlock(&fmi_lock);
+ return NULL;
+}
+
+static int in_failure_mode(struct ubi_volume_desc *desc)
+{
+ struct ubifs_info *c = dbg_find_info(desc);
+
+ if (c)
+ return c->failure_mode;
+ return 0;
+}
+
+static int do_fail(struct ubi_volume_desc *desc, int lnum, int write)
+{
+ struct ubifs_info *c = dbg_find_info(desc);
+
+ if (!c)
+ return 0;
+ if (c->failure_mode)
+ return 1;
+ if (lnum == UBIFS_SB_LNUM)
+ return 0;
+ else if (lnum == UBIFS_MST_LNUM || lnum == UBIFS_MST_LNUM + 1) {
+ if (chance(19, 20))
+ return 0;
+ dbg_mnt("failing in master LEB %d", lnum);
+ } else if (lnum >= UBIFS_LOG_LNUM && lnum <= c->log_last) {
+ if (write && chance(99, 100))
+ return 0;
+ else if (chance(399, 400))
+ return 0;
+ dbg_mnt("failing in log LEB %d", lnum);
+ } else if (lnum >= c->lpt_first && lnum <= c->lpt_last) {
+ if (write && chance(99, 100))
+ return 0;
+ else if (chance(399, 400))
+ return 0;
+ dbg_mnt("failing in LPT LEB %d", lnum);
+ } else if (lnum >= c->orph_first && lnum <= c->orph_last) {
+ if (write && chance(9, 10))
+ return 0;
+ else if (chance(39, 40))
+ return 0;
+ dbg_mnt("failing in orphan LEB %d", lnum);
+ } else if (lnum == c->ihead_lnum) {
+ if (chance(99, 100))
+ return 0;
+ dbg_mnt("failing in index head LEB %d", lnum);
+ } else if (write && !RB_EMPTY_ROOT(&c->buds) &&
+ ubifs_search_bud(c, lnum) == NULL) {
+ if (chance(19, 20))
+ return 0;
+ dbg_mnt("failing in non-bud LEB %d", lnum);
+ } else if (c->cmt_state == COMMIT_RUNNING_BACKGROUND ||
+ c->cmt_state == COMMIT_RUNNING_REQUIRED) {
+ if (chance(999, 1000))
+ return 0;
+ dbg_mnt("failing in bud LEB %d commit running", lnum);
+ } else {
+ if (chance(9999, 10000))
+ return 0;
+ dbg_mnt("failing in bud LEB %d commit not running", lnum);
+ }
+ ubifs_err("*** SETTING FAILURE MODE ON ***");
+ c->failure_mode = 1;
+ dump_stack();
+ return 1;
+}
+
+static void cut_data(const void *buf, int len)
+{
+ int flen, i;
+ unsigned char *p = (void *)buf;
+
+ flen = (len * (long long)simple_rand()) >> 15;
+ for (i = flen; i < len; i++)
+ p[i] = 0xff;
+}
+
+int dbg_leb_read(struct ubi_volume_desc *desc, int lnum, char *buf, int offset,
+ int len, int check)
+{
+ if (in_failure_mode(desc))
+ return -EIO;
+ return ubi_leb_read(desc, lnum, buf, offset, len, check);
+}
+
+int dbg_leb_write(struct ubi_volume_desc *desc, int lnum, const void *buf,
+ int offset, int len, int dtype)
+{
+ int err;
+
+ if (in_failure_mode(desc))
+ return -EIO;
+ if (do_fail(desc, lnum, 1))
+ cut_data(buf, len);
+ err = ubi_leb_write(desc, lnum, buf, offset, len, dtype);
+ if (err)
+ return err;
+ if (in_failure_mode(desc))
+ return -EIO;
+ return 0;
+}
+
+int dbg_leb_change(struct ubi_volume_desc *desc, int lnum, const void *buf,
+ int len, int dtype)
+{
+ int err;
+
+ if (do_fail(desc, lnum, 1))
+ return -EIO;
+ err = ubi_leb_change(desc, lnum, buf, len, dtype);
+ if (err)
+ return err;
+ if (do_fail(desc, lnum, 1))
+ return -EIO;
+ return 0;
+}
+
+int dbg_leb_erase(struct ubi_volume_desc *desc, int lnum)
+{
+ int err;
+
+ if (do_fail(desc, lnum, 0))
+ return -EIO;
+ err = ubi_leb_erase(desc, lnum);
+ if (err)
+ return err;
+ if (do_fail(desc, lnum, 0))
+ return -EIO;
+ return 0;
+}
+
+int dbg_leb_unmap(struct ubi_volume_desc *desc, int lnum)
+{
+ int err;
+
+ if (do_fail(desc, lnum, 0))
+ return -EIO;
+ err = ubi_leb_unmap(desc, lnum);
+ if (err)
+ return err;
+ if (do_fail(desc, lnum, 0))
+ return -EIO;
+ return 0;
+}
+
+int dbg_is_mapped(struct ubi_volume_desc *desc, int lnum)
+{
+ if (in_failure_mode(desc))
+ return -EIO;
+ return ubi_is_mapped(desc, lnum);
+}
+
+#endif /* CONFIG_UBIFS_FS_DEBUG_TEST_RCVRY */
+#endif /* CONFIG_UBIFS_FS_DEBUG */
diff --git a/fs/ubifs/debug.h b/fs/ubifs/debug.h
new file mode 100644
index 0000000..7746ad6
--- /dev/null
+++ b/fs/ubifs/debug.h
@@ -0,0 +1,343 @@
+/*
+ * This file is part of UBIFS.
+ *
+ * Copyright (C) 2006-2008 Nokia Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 51
+ * Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Authors: Artem Bityutskiy (Битюцкий Артём)
+ * Adrian Hunter
+ */
+
+#ifndef __UBIFS_DEBUG_H__
+#define __UBIFS_DEBUG_H__
+
+#ifdef CONFIG_UBIFS_FS_DEBUG
+#define UBIFS_DBG(op) op
+#define ubifs_assert(expr) do { \
+ if (unlikely(!(expr))) { \
+ printk(KERN_CRIT "UBIFS assert failed in %s at %u (pid %d)\n", \
+ __func__, __LINE__, current->pid); \
+ dump_stack(); \
+ } \
+} while (0)
+
+/* Generic debugging message */
+#define dbg_msg(fmt, ...) do { \
+ printk(KERN_DEBUG "UBIFS DBG (pid %d): %s: " fmt "\n", current->pid, \
+ __func__, ##__VA_ARGS__); \
+} while (0)
+
+/* Debugging message which prints UBIFS key */
+#define dbg_key(c, key, fmt, ...) do { \
+ spin_lock(&dbg_lock); \
+ printk(KERN_DEBUG "UBIFS DBG (pid %d): %s: " fmt " %s\n", \
+ current->pid, __func__, ##__VA_ARGS__, \
+ dbg_get_key_dump(c, key)); \
+ spin_unlock(&dbg_lock); \
+} while (0)
+
+#define dbg_err(fmt, ...) ubifs_err(fmt, ##__VA_ARGS__)
+#define dbg_dump_stack() dump_stack()
+
+#define ubifs_assert_cmt_locked(c) do { \
+ if (unlikely(down_write_trylock(&(c)->commit_sem))) { \
+ up_write(&(c)->commit_sem); \
+ printk(KERN_CRIT "commit lock is not locked!\n"); \
+ ubifs_assert(0); \
+ } \
+} while (0)
+
+#ifndef UBIFS_DBG_PRESERVE_KMALLOC
+#define kmalloc dbg_kmalloc
+#define kzalloc dbg_kzalloc
+#define kfree dbg_kfree
+#define vmalloc dbg_vmalloc
+#define vfree dbg_vfree
+#endif
+
+#else
+
+#define UBIFS_DBG(op)
+#define ubifs_assert(expr) ({})
+#define dbg_msg(fmt, ...) ({})
+#define dbg_key(c, key, fmt, ...) ({})
+#define dbg_err(fmt, ...) ({})
+#define dbg_dump_stack()
+#define ubifs_assert_cmt_locked(c)
+
+#endif /* !CONFIG_UBIFS_FS_DEBUG */
+
+#ifdef CONFIG_UBIFS_FS_DEBUG
+
+extern spinlock_t dbg_lock;
+const char *dbg_ntype(int type);
+const char *dbg_cstate(int cmt_state);
+const char *dbg_get_key_dump(const struct ubifs_info *c,
+ const union ubifs_key *key);
+void dbg_dump_node(const struct ubifs_info *c, const void *node);
+void dbg_dump_budget_req(const struct ubifs_budget_req *req);
+void dbg_dump_lstats(const struct ubifs_lp_stats *lst);
+void dbg_dump_budg(struct ubifs_info *c);
+void dbg_dump_lprop(const struct ubifs_info *c, const struct ubifs_lprops *lp);
+void dbg_dump_lprops(struct ubifs_info *c);
+void dbg_dump_leb(const struct ubifs_info *c, int lnum);
+void dbg_dump_znode(const struct ubifs_info *c,
+ const struct ubifs_znode *znode);
+void dbg_dump_heap(struct ubifs_info *c, struct ubifs_lpt_heap *heap, int cat);
+void dbg_dump_pnode(struct ubifs_info *c, struct ubifs_pnode *pnode,
+ struct ubifs_nnode *parent, int iip);
+
+void *dbg_kmalloc(size_t size, gfp_t flags);
+void *dbg_kzalloc(size_t size, gfp_t flags);
+void dbg_kfree(const void *addr);
+void *dbg_vmalloc(size_t size);
+void dbg_vfree(void *addr);
+void dbg_leak_report(void);
+
+typedef int (*dbg_leaf_callback)(struct ubifs_info *c,
+ struct ubifs_zbranch *zbr, void *priv);
+typedef int (*dbg_znode_callback)(struct ubifs_info *c,
+ struct ubifs_znode *znode, void *priv);
+
+int dbg_walk_index(struct ubifs_info *c, dbg_leaf_callback leaf_cb,
+ dbg_znode_callback znode_cb, void *priv);
+int dbg_read_leaf_nolock(struct ubifs_info *c, struct ubifs_zbranch *zbr,
+ void *node);
+#else
+
+#define dbg_ntype(type) ""
+#define dbg_cstate(cmt_state) ""
+#define dbg_get_key_dump(c, key) ({})
+#define dbg_dump_node(c, node) ({})
+#define dbg_dump_budget_req(req) ({})
+#define dbg_dump_lstats(lst) ({})
+#define dbg_dump_budg(c) ({})
+#define dbg_dump_lprop(c, lp) ({})
+#define dbg_dump_lprops(c) ({})
+#define dbg_dump_leb(c, lnum) ({})
+#define dbg_dump_znode(c, znode) ({})
+#define dbg_dump_heap(c, heap, cat) ({})
+#define dbg_dump_pnode(c, pnode, parent, iip) ({})
+
+#define dbg_leak_report() ({})
+#define dbg_walk_index(c, leaf_cb, znode_cb, priv) 0
+#define dbg_read_leaf_nolock(c, zbr, node) 0
+
+#endif /* !CONFIG_UBIFS_FS_DEBUG */
+
+#ifdef CONFIG_UBIFS_FS_DEBUG_CHK_MEMPRESS
+void dbg_eat_memory(void);
+void __init dbg_mempressure_init(void);
+void dbg_mempressure_exit(void);
+#else
+#define dbg_eat_memory() ({})
+#define dbg_mempressure_init() ({})
+#define dbg_mempressure_exit() ({})
+#endif
+
+#ifdef CONFIG_UBIFS_FS_DEBUG_CHK_LPROPS
+int dbg_check_lprops(struct ubifs_info *c);
+#else
+#define dbg_check_lprops(c) 0
+#endif
+
+#ifdef CONFIG_UBIFS_FS_DEBUG_CHK_OLD_IDX
+int dbg_old_index_check_init(struct ubifs_info *c, struct ubifs_zbranch *zroot);
+int dbg_check_old_index(struct ubifs_info *c, struct ubifs_zbranch *zroot);
+#else
+#define dbg_old_index_check_init(c, zroot) 0
+#define dbg_check_old_index(c, zroot) 0
+#endif
+
+#if defined(CONFIG_UBIFS_FS_DEBUG_CHK_LPROPS) || \
+ defined(CONFIG_UBIFS_FS_DEBUG_CHK_OTHER)
+int dbg_check_cats(struct ubifs_info *c);
+#else
+#define dbg_check_cats(c) 0
+#endif
+
+/* General messages */
+#ifdef CONFIG_UBIFS_FS_DEBUG_MSG_GEN
+#define dbg_gen(fmt, ...) dbg_msg(fmt, ##__VA_ARGS__)
+#define dbg_gen_key(c, key, fmt, ...) dbg_key(c, key, fmt, ##__VA_ARGS__)
+#else
+#define dbg_gen(fmt, ...) ({})
+#define dbg_gen_key(c, key, fmt, ...) ({})
+#endif
+
+/* Additional journal messages */
+#ifdef CONFIG_UBIFS_FS_DEBUG_MSG_JRN
+#define dbg_jrn(fmt, ...) dbg_msg(fmt, ##__VA_ARGS__)
+#define dbg_jrn_key(c, key, fmt, ...) dbg_key(c, key, fmt, ##__VA_ARGS__)
+#else
+#define dbg_jrn(fmt, ...) ({})
+#define dbg_jrn_key(c, key, fmt, ...) ({})
+#endif
+
+/* Additional TNC messages */
+#ifdef CONFIG_UBIFS_FS_DEBUG_MSG_TNC
+#define dbg_tnc(fmt, ...) dbg_msg(fmt, ##__VA_ARGS__)
+#define dbg_tnc_key(c, key, fmt, ...) dbg_key(c, key, fmt, ##__VA_ARGS__)
+#else
+#define dbg_tnc(fmt, ...) ({})
+#define dbg_tnc_key(c, key, fmt, ...) ({})
+#endif
+
+/* Additional lprops messages */
+#ifdef CONFIG_UBIFS_FS_DEBUG_MSG_LP
+#define dbg_lp(fmt, ...) dbg_msg(fmt, ##__VA_ARGS__)
+#else
+#define dbg_lp(fmt, ...) ({})
+#endif
+
+/* Additional LEB find messages */
+#ifdef CONFIG_UBIFS_FS_DEBUG_MSG_FIND
+#define dbg_find(fmt, ...) dbg_msg(fmt, ##__VA_ARGS__)
+#else
+#define dbg_find(fmt, ...) ({})
+#endif
+
+/* Additional mount messages */
+#ifdef CONFIG_UBIFS_FS_DEBUG_MSG_MNT
+#define dbg_mnt(fmt, ...) dbg_msg(fmt, ##__VA_ARGS__)
+#define dbg_mnt_key(c, key, fmt, ...) dbg_key(c, key, fmt, ##__VA_ARGS__)
+#else
+#define dbg_mnt(fmt, ...) ({})
+#define dbg_mnt_key(c, key, fmt, ...) ({})
+#endif
+
+/* Additional I/O messages */
+#ifdef CONFIG_UBIFS_FS_DEBUG_MSG_IO
+#define dbg_io(fmt, ...) dbg_msg(fmt, ##__VA_ARGS__)
+#else
+#define dbg_io(fmt, ...) ({})
+#endif
+
+/* Additional commit messages */
+#ifdef CONFIG_UBIFS_FS_DEBUG_MSG_CMT
+#define dbg_cmt(fmt, ...) dbg_msg(fmt, ##__VA_ARGS__)
+#else
+#define dbg_cmt(fmt, ...) ({})
+#endif
+
+/* Additional budgeting messages */
+#ifdef CONFIG_UBIFS_FS_DEBUG_MSG_BUDG
+#define dbg_budg(fmt, ...) dbg_msg(fmt, ##__VA_ARGS__)
+#else
+#define dbg_budg(fmt, ...) ({})
+#endif
+
+/* Additional log messages */
+#ifdef CONFIG_UBIFS_FS_DEBUG_MSG_LOG
+#define dbg_log(fmt, ...) dbg_msg(fmt, ##__VA_ARGS__)
+#else
+#define dbg_log(fmt, ...) ({})
+#endif
+
+/* Additional gc messages */
+#ifdef CONFIG_UBIFS_FS_DEBUG_MSG_GC
+#define dbg_gc(fmt, ...) dbg_msg(fmt, ##__VA_ARGS__)
+#define dbg_gc_key(c, key, fmt, ...) dbg_key(c, key, fmt, ##__VA_ARGS__)
+#else
+#define dbg_gc(fmt, ...) ({})
+#define dbg_gc_key(c, key, fmt, ...) ({})
+#endif
+
+/* Additional scan messages */
+#ifdef CONFIG_UBIFS_FS_DEBUG_MSG_SCAN
+#define dbg_scan(fmt, ...) dbg_msg(fmt, ##__VA_ARGS__)
+#else
+#define dbg_scan(fmt, ...) ({})
+#endif
+
+#ifdef CONFIG_UBIFS_FS_DEBUG_CHK_OTHER
+int dbg_check_dir_size(struct ubifs_info *c, const struct inode *dir);
+#else
+#define dbg_check_dir_size(c, dir) 0
+#endif
+
+#ifdef CONFIG_UBIFS_FS_DEBUG_CHK_TNC
+int dbg_check_tnc(struct ubifs_info *c, int extra);
+#else
+#define dbg_check_tnc(c, x) 0
+#endif
+
+#ifdef CONFIG_UBIFS_FS_DEBUG_CHK_IDX_SZ
+int dbg_check_idx_size(struct ubifs_info *c, long long idx_size);
+#else
+#define dbg_check_idx_size(c, idx_size) 0
+#endif
+
+#ifdef CONFIG_UBIFS_FS_DEBUG_CHK_LPROPS
+int dbg_check_lprops(struct ubifs_info *c);
+int dbg_check_lpt_nodes(struct ubifs_info *c, struct ubifs_cnode *cnode,
+ int row, int col);
+#else
+#define dbg_check_lprops(c) 0
+#define dbg_check_lpt_nodes(c, cnode, row, col) 0
+#endif
+
+#ifdef CONFIG_UBIFS_FS_DEBUG_TEST_RCVRY
+
+void dbg_failure_mode_registration(struct ubifs_info *c);
+void dbg_failure_mode_deregistration(struct ubifs_info *c);
+
+#undef dbg_dump_stack
+#define dbg_dump_stack()
+#define dbg_failure_mode 1
+
+#ifndef UBIFS_DBG_PRESERVE_UBI
+#define ubi_leb_read dbg_leb_read
+#define ubi_leb_write dbg_leb_write
+#define ubi_leb_change dbg_leb_change
+#define ubi_leb_erase dbg_leb_erase
+#define ubi_leb_unmap dbg_leb_unmap
+#define ubi_is_mapped dbg_is_mapped
+
+int dbg_leb_read(struct ubi_volume_desc *desc, int lnum, char *buf, int offset,
+ int len, int check);
+int dbg_leb_write(struct ubi_volume_desc *desc, int lnum, const void *buf,
+ int offset, int len, int dtype);
+int dbg_leb_change(struct ubi_volume_desc *desc, int lnum, const void *buf,
+ int len, int dtype);
+int dbg_leb_erase(struct ubi_volume_desc *desc, int lnum);
+int dbg_leb_unmap(struct ubi_volume_desc *desc, int lnum);
+int dbg_is_mapped(struct ubi_volume_desc *desc, int lnum);
+static inline int dbg_read(struct ubi_volume_desc *desc, int lnum, char *buf,
+ int offset, int len)
+{
+ return dbg_leb_read(desc, lnum, buf, offset, len, 0);
+}
+static inline int dbg_write(struct ubi_volume_desc *desc, int lnum,
+ const void *buf, int offset, int len)
+{
+ return dbg_leb_write(desc, lnum, buf, offset, len, UBI_UNKNOWN);
+}
+static inline int dbg_change(struct ubi_volume_desc *desc, int lnum,
+ const void *buf, int len)
+{
+ return dbg_leb_change(desc, lnum, buf, len, UBI_UNKNOWN);
+}
+#endif /* !UBIFS_DBG_PRESERVE_UBI */
+
+#else
+
+#define dbg_failure_mode_registration(c) ({})
+#define dbg_failure_mode_deregistration(c) ({})
+#define dbg_failure_mode 0
+
+#endif /* !CONFIG_UBIFS_FS_DEBUG_TEST_RCVRY */
+
+#endif /* !__UBIFS_DEBUG_H__ */
--
1.5.4.1
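
The recovery-testing code in debug.c above decides when to inject a failure
with a small pseudo-random generator and a chance(n, d) macro. The following
user-space sketch reproduces that arithmetic so the resulting failure rates
can be checked outside the kernel. It is only an illustration under stated
assumptions: 'seed_state' and count_failures() are names made up for this
example, and the seed is fixed here, whereas the patch seeds the generator
from current->pid.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the patch's 'next' variable (fixed seed here). */
static unsigned int seed_state = 1;

/* Same linear congruential step as simple_rand() in the patch; 15-bit result. */
static int simple_rand(void)
{
	seed_state = seed_state * 1103515245u + 12345u;
	return (seed_state >> 16) & 32767;
}

/* True with a probability of roughly n/d, as in the patch's chance() macro. */
#define chance(n, d) (simple_rand() <= (n) * 32768LL / (d))

/*
 * Hypothetical helper: count how many of 'trials' operations would be failed
 * when each operation is let through with chance(n, d).
 */
static long count_failures(long trials, int n, int d)
{
	long i, failures = 0;

	for (i = 0; i < trials; i++)
		if (!chance(n, d))
			failures++;
	return failures;
}
```

For instance, count_failures(1000000, 19, 20) should come out close to 5% of
the trials, which corresponds to the master LEB case in do_fail().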

2008-03-27 13:11:44

by Artem Bityutskiy

Subject: [RFC PATCH 05/26] UBIFS: add file-system build

The file-system build code implements most of the UBIFS initialization
and mount-related functionality.
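
As an illustration of the kind of initialization done in build.c, the
user-space sketch below mirrors how init_constants_early() derives
min_io_shift from the minimum I/O unit size, bumps units smaller than 8 bytes
up to 8, and rounds sizes up to the I/O unit. The names fls_sketch(),
is_pow2(), align_up(), derive_io_consts() and struct io_consts are made up
for this example and are not part of the patch; they only approximate the
kernel's fls(), is_power_of_2() and ALIGN().

```c
#include <assert.h>

/* Hypothetical fls() look-alike: position of the highest set bit, 1-based. */
static int fls_sketch(unsigned int x)
{
	int r = 0;

	while (x) {
		r++;
		x >>= 1;
	}
	return r;
}

/* Non-zero if x is a power of two (approximates is_power_of_2()). */
static int is_pow2(unsigned int x)
{
	return x != 0 && (x & (x - 1)) == 0;
}

/* Round x up to a multiple of a; a must be a power of two (like ALIGN()). */
static int align_up(int x, int a)
{
	return (x + a - 1) & ~(a - 1);
}

/* Hypothetical container for the two derived constants. */
struct io_consts {
	int min_io_size;
	int min_io_shift;
};

/* Returns 0 and fills *out, or -1 if min_io_size is not a power of two. */
static int derive_io_consts(int min_io_size, struct io_consts *out)
{
	if (!is_pow2((unsigned int)min_io_size))
		return -1;

	out->min_io_size = min_io_size;
	out->min_io_shift = fls_sketch((unsigned int)min_io_size) - 1;

	/* Nodes are 8-byte aligned, so never go below an 8-byte I/O unit. */
	if (out->min_io_size < 8) {
		out->min_io_size = 8;
		out->min_io_shift = 3;
	}
	return 0;
}
```

With a 2048-byte I/O unit this gives a shift of 11, and an arbitrary 20-byte
object would be rounded up to 2048 bytes by align_up(20, 2048).
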

Signed-off-by: Artem Bityutskiy <[email protected]>
Signed-off-by: Adrian Hunter <[email protected]>
---
fs/ubifs/build.c | 1351 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
fs/ubifs/super.c | 531 +++++++++++++++++++++
2 files changed, 1882 insertions(+), 0 deletions(-)

diff --git a/fs/ubifs/build.c b/fs/ubifs/build.c
new file mode 100644
index 0000000..1142020
--- /dev/null
+++ b/fs/ubifs/build.c
@@ -0,0 +1,1351 @@
+/*
+ * This file is part of UBIFS.
+ *
+ * Copyright (C) 2006-2008 Nokia Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 51
+ * Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Authors: Artem Bityutskiy (Битюцкий Артём)
+ * Adrian Hunter
+ */
+
+/*
+ * This file implements UBIFS initialization, mount and un-mount. Some
+ * initialization code which is rather large and complex lives in the
+ * corresponding subsystems, but most of it is here.
+ */
+
+#include <linux/init.h>
+#include <linux/slab.h>
+#include <linux/module.h>
+#include <linux/ctype.h>
+#include <linux/random.h>
+#include <linux/kthread.h>
+#include <linux/parser.h>
+#include "ubifs.h"
+
+/* Slab cache for UBIFS inodes */
+struct kmem_cache *ubifs_inode_slab;
+
+/* UBIFS TNC shrinker description */
+static struct shrinker ubifs_shrinker_info = {
+ .shrink = ubifs_shrinker,
+ .seeks = DEFAULT_SEEKS,
+};
+
+/**
+ * init_constants_early - initialize UBIFS constants.
+ * @c: UBIFS file-system description object
+ *
+ * This function initializes UBIFS constants which do not need the superblock
+ * to be read. It also checks that the UBI volume satisfies basic UBIFS
+ * requirements. Returns zero in case of success and a negative error code in
+ * case of failure.
+ */
+static int init_constants_early(struct ubifs_info *c)
+{
+ if (c->vi.corrupted) {
+ ubifs_warn("UBI volume is corrupted - read-only mode");
+ c->ro_media = 1;
+ }
+
+ if (c->di.ro_mode) {
+ ubifs_msg("read-only UBI device");
+ c->ro_media = 1;
+ }
+
+ if (c->vi.vol_type == UBI_STATIC_VOLUME) {
+ ubifs_msg("static UBI volume - read-only mode");
+ c->ro_media = 1;
+ }
+
+ c->leb_cnt = c->vi.size;
+ c->leb_size = c->vi.usable_leb_size;
+ c->half_leb_size = c->leb_size / 2;
+ c->min_io_size = c->di.min_io_size;
+ c->min_io_shift = fls(c->min_io_size) - 1;
+
+ if (c->leb_size < UBIFS_MIN_LEB_SZ) {
+ ubifs_err("too small LEBs (%d bytes), min. is %d bytes",
+ c->leb_size, UBIFS_MIN_LEB_SZ);
+ return -EINVAL;
+ }
+
+ if (c->leb_cnt < UBIFS_MIN_LEB_CNT) {
+ ubifs_err("too few LEBs (%d), min. is %d",
+ c->leb_cnt, UBIFS_MIN_LEB_CNT);
+ return -EINVAL;
+ }
+
+ if (!is_power_of_2(c->min_io_size)) {
+ ubifs_err("bad min. I/O size %d", c->min_io_size);
+ return -EINVAL;
+ }
+
+ /*
+ * UBIFS aligns all nodes to 8-byte boundaries, so to make the functions in
+ * io.c simpler, assume the minimum I/O unit size to be 8 bytes if it is
+ * less than 8.
+ */
+ if (c->min_io_size < 8) {
+ c->min_io_size = 8;
+ c->min_io_shift = 3;
+ }
+
+ c->ref_node_alsz = ALIGN(UBIFS_REF_NODE_SZ, c->min_io_size);
+ c->mst_node_alsz = ALIGN(UBIFS_MST_NODE_SZ, c->min_io_size);
+
+ /*
+ * Initialize node length ranges which are mostly needed for node
+ * length validation.
+ */
+ c->ranges[UBIFS_PAD_NODE].len = UBIFS_PAD_NODE_SZ;
+ c->ranges[UBIFS_SB_NODE].len = UBIFS_SB_NODE_SZ;
+ c->ranges[UBIFS_MST_NODE].len = UBIFS_MST_NODE_SZ;
+ c->ranges[UBIFS_REF_NODE].len = UBIFS_REF_NODE_SZ;
+ c->ranges[UBIFS_TRUN_NODE].len = UBIFS_TRUN_NODE_SZ;
+ c->ranges[UBIFS_CS_NODE].len = UBIFS_CS_NODE_SZ;
+
+ c->ranges[UBIFS_INO_NODE].min_len = UBIFS_INO_NODE_SZ;
+ c->ranges[UBIFS_INO_NODE].max_len = UBIFS_MAX_INO_NODE_SZ;
+ c->ranges[UBIFS_ORPH_NODE].min_len =
+ UBIFS_ORPH_NODE_SZ + sizeof(__le64);
+ c->ranges[UBIFS_ORPH_NODE].max_len = c->leb_size;
+ c->ranges[UBIFS_DENT_NODE].min_len = UBIFS_DENT_NODE_SZ;
+ c->ranges[UBIFS_DENT_NODE].max_len = UBIFS_MAX_DENT_NODE_SZ;
+ c->ranges[UBIFS_XENT_NODE].min_len = UBIFS_XENT_NODE_SZ;
+ c->ranges[UBIFS_XENT_NODE].max_len = UBIFS_MAX_XENT_NODE_SZ;
+ c->ranges[UBIFS_DATA_NODE].min_len = UBIFS_DATA_NODE_SZ;
+ c->ranges[UBIFS_DATA_NODE].max_len = UBIFS_MAX_DATA_NODE_SZ;
+ /*
+ * Minimum indexing node size is amended later when superblock is
+ * read and the key length is known.
+ */
+ c->ranges[UBIFS_IDX_NODE].min_len = UBIFS_IDX_NODE_SZ + UBIFS_BRANCH_SZ;
+ /*
+ * Maximum indexing node size is amended later when superblock is
+ * read and the fanout is known.
+ */
+ c->ranges[UBIFS_IDX_NODE].max_len = INT_MAX;
+
+ /*
+ * Initialize dead and dark LEB space watermarks.
+ *
+ * Dead space is the space which cannot be used. Its watermark is
+ * equivalent to min. I/O unit or minimum node size if it is greater
+ * than min. I/O unit.
+ *
+ * Dark space is the space which might be used, or might not, depending
+ * on which node should be written to the LEB. Its watermark is
+ * equivalent to maximum UBIFS node size.
+ */
+ c->dead_wm = ALIGN(MIN_WRITE_SZ, c->min_io_size);
+ c->dark_wm = ALIGN(UBIFS_MAX_NODE_SZ, c->min_io_size);
+
+ return 0;
+}
+
+/**
+ * bud_wbuf_callback - bud LEB write-buffer synchronization call-back.
+ * @c: UBIFS file-system description object
+ * @lnum: LEB the write-buffer was synchronized to
+ * @free: how many free bytes left in this LEB
+ * @pad: how many bytes were padded
+ *
+ * This is a callback function which is called by the I/O unit when the
+ * write-buffer is synchronized. We need this to correctly maintain space
+ * accounting in bud logical eraseblocks. This function returns zero in case of
+ * success and a negative error code in case of failure.
+ *
+ * This function actually belongs to the journal, but we keep it here because
+ * we want to keep it static.
+ */
+static int bud_wbuf_callback(struct ubifs_info *c, int lnum, int free, int pad)
+{
+ return ubifs_update_one_lp(c, lnum, free, pad, 0, 0);
+}
+
+/**
+ * init_constants_late - initialize UBIFS constants.
+ * @c: UBIFS file-system description object
+ *
+ * This is a helper function which initializes various UBIFS constants after
+ * the superblock has been read. It also checks various UBIFS parameters and
+ * makes sure they are all right. Returns zero in case of success and a
+ * negative error code in case of failure.
+ */
+static int init_constants_late(struct ubifs_info *c)
+{
+ int tmp, err;
+ long long tmp64;
+
+ c->main_bytes = (long long)c->main_lebs * c->leb_size;
+
+ c->max_znode_sz = sizeof(struct ubifs_znode) +
+ c->fanout * sizeof(struct ubifs_zbranch);
+
+ tmp = ubifs_idx_node_sz(c, 1);
+ c->ranges[UBIFS_IDX_NODE].min_len = tmp;
+ c->min_idx_node_sz = ALIGN(tmp, 8);
+
+ tmp = ubifs_idx_node_sz(c, c->fanout);
+ c->ranges[UBIFS_IDX_NODE].max_len = tmp;
+ c->max_idx_node_sz = ALIGN(tmp, 8);
+
+ /* Make sure LEB size is large enough to fit full commit */
+ tmp = UBIFS_CS_NODE_SZ + UBIFS_REF_NODE_SZ * c->jhead_cnt;
+ tmp = ALIGN(tmp, c->min_io_size);
+ if (tmp > c->leb_size) {
+ dbg_err("too small LEB size %d, at least %d needed",
+ c->leb_size, tmp);
+ return -EINVAL;
+ }
+
+ /*
+ * Make sure that the log is large enough to fit reference nodes for
+ * all buds plus one reserved LEB.
+ */
+ tmp64 = c->max_bud_bytes;
+ tmp = do_div(tmp64, c->leb_size);
+ c->max_bud_cnt = tmp64 + !!tmp;
+ tmp = (c->ref_node_alsz * c->max_bud_cnt + c->leb_size - 1);
+ tmp /= c->leb_size;
+ tmp += 1;
+ if (c->log_lebs < tmp) {
+ dbg_err("too small log %d LEBs, required min. %d LEBs",
+ c->log_lebs, tmp);
+ return -EINVAL;
+ }
+
+ /*
+ * When budgeting we assume the worst-case scenario, in which pages are not
+ * compressed and direntries are of the maximum size.
+ *
+ * Note, data, which may be stored in inodes is budgeted separately, so
+ * it is not included into 'c->inode_budget'.
+ *
+ * c->page_budget is PAGE_CACHE_SIZE + UBIFS_CH_SZ * blocks_per_page
+ */
+ c->page_budget = PAGE_CACHE_SIZE + UBIFS_CH_SZ;
+ c->inode_budget = UBIFS_INO_NODE_SZ;
+ c->dent_budget = UBIFS_MAX_DENT_NODE_SZ;
+
+ /*
+ * When the amount of flash space used by buds becomes
+ * 'c->max_bud_bytes', UBIFS just blocks all writers and starts commit.
+ * The writers are unblocked when the commit is finished. To avoid
+ * blocking writers, UBIFS initiates a background commit in advance,
+ * when the number of bud bytes rises above the limit defined below.
+ */
+ c->bg_bud_bytes = (c->max_bud_bytes * 13) >> 4;
+
+ err = ubifs_calc_lpt_geom(c);
+ if (err)
+ return err;
+
+ c->min_idx_lebs = ubifs_calc_min_idx_lebs(c);
+
+ /*
+ * Calculate total amount of FS blocks. This number is not used
+ * internally because it does not make much sense for UBIFS, but it is
+ * necessary to report something for the 'statfs()' call.
+ */
+ c->block_cnt = (long long)c->main_lebs * (c->leb_size - c->dark_wm);
+ c->block_cnt >>= UBIFS_BLOCK_SHIFT;
+
+ return 0;
+}
+
+/**
+ * care_about_gc_lnum - take care of the reserved GC LEB.
+ * @c: UBIFS file-system description object
+ *
+ * This function ensures that the LEB reserved for garbage collection is
+ * unmapped and is marked as "taken" in lprops. We also have to set free space
+ * to LEB size and dirty space to zero, because lprops may contain out-of-date
+ * information if the file-system was un-mounted before it has been committed.
+ * This function returns zero in case of success and a negative error code in
+ * case of failure.
+ */
+static int care_about_gc_lnum(struct ubifs_info *c)
+{
+ int err;
+
+ if (c->gc_lnum == -1) {
+ ubifs_err("no LEB for GC");
+ return -EINVAL;
+ }
+
+ err = ubifs_leb_unmap(c, c->gc_lnum);
+ if (err)
+ return err;
+
+ /* And we have to tell lprops that this LEB is taken */
+ err = ubifs_change_one_lp(c, c->gc_lnum, c->leb_size, 0,
+ LPROPS_TAKEN, 0, 0);
+ return err;
+}
+
+/**
+ * alloc_wbufs - allocate write-buffers.
+ * @c: UBIFS file-system description object
+ *
+ * This helper function allocates and initializes UBIFS write-buffers. Returns
+ * zero in case of success and %-ENOMEM in case of failure.
+ */
+static int alloc_wbufs(struct ubifs_info *c)
+{
+ int i, err;
+
+ c->jheads = kzalloc(c->jhead_cnt * sizeof(struct ubifs_jhead),
+ GFP_KERNEL);
+ if (!c->jheads)
+ return -ENOMEM;
+
+ /* Initialize journal heads */
+ for (i = 0; i < c->jhead_cnt; i++) {
+ INIT_LIST_HEAD(&c->jheads[i].buds_list);
+ err = ubifs_wbuf_init(c, &c->jheads[i].wbuf);
+ if (err)
+ return err;
+
+ c->jheads[i].wbuf.sync_callback = &bud_wbuf_callback;
+ c->jheads[i].wbuf.jhead = i;
+ }
+
+ c->jheads[BASEHD].wbuf.dtype = UBI_SHORTTERM;
+ /*
+ * Garbage Collector head likely contains long-term data and
+ * does not need to be synchronized by timer.
+ */
+ c->jheads[GCHD].wbuf.dtype = UBI_LONGTERM;
+ c->jheads[GCHD].wbuf.timeout = 0;
+
+ sprintf(c->bgt_name, "%s%d_%d", SYNCER_BG_NAME,
+ c->vi.ubi_num, c->vi.vol_id);
+
+ return 0;
+}
+
+/**
+ * free_wbufs - free write-buffers.
+ * @c: UBIFS file-system description object
+ */
+static void free_wbufs(struct ubifs_info *c)
+{
+ int i;
+
+ if (c->jheads) {
+ for (i = 0; i < c->jhead_cnt; i++) {
+ kfree(c->jheads[i].wbuf.buf);
+ kfree(c->jheads[i].wbuf.inodes);
+ }
+ kfree(c->jheads);
+ c->jheads = NULL;
+ }
+}
+
+/**
+ * free_orphans - free orphans.
+ * @c: UBIFS file-system description object
+ */
+static void free_orphans(struct ubifs_info *c)
+{
+ struct ubifs_orphan *orph;
+
+ while (c->orph_dnext) {
+ orph = c->orph_dnext;
+ c->orph_dnext = orph->dnext;
+ list_del(&orph->list);
+ kfree(orph);
+ }
+
+ while (!list_empty(&c->orph_list)) {
+ orph = list_entry(c->orph_list.next, struct ubifs_orphan, list);
+ list_del(&orph->list);
+ kfree(orph);
+ dbg_err("orphan list not empty at unmount");
+ }
+
+ vfree(c->orph_buf);
+ c->orph_buf = NULL;
+}
+
+/**
+ * free_buds - free per-bud objects.
+ * @c: UBIFS file-system description object
+ */
+static void free_buds(struct ubifs_info *c)
+{
+ struct rb_node *this = c->buds.rb_node;
+ struct ubifs_bud *bud;
+
+ while (this) {
+ if (this->rb_left)
+ this = this->rb_left;
+ else if (this->rb_right)
+ this = this->rb_right;
+ else {
+ bud = rb_entry(this, struct ubifs_bud, rb);
+ this = rb_parent(this);
+ if (this) {
+ if (this->rb_left == &bud->rb)
+ this->rb_left = NULL;
+ else
+ this->rb_right = NULL;
+ }
+ kfree(bud);
+ }
+ }
+}
+
+/**
+ * check_volume_empty - check if the UBI volume is empty.
+ * @c: UBIFS file-system description object
+ *
+ * This function checks whether the UBI volume is empty by looking at whether
+ * any of its LEBs is mapped. The result is stored in the @c->empty variable.
+ * Returns zero in case of success and a negative error code in case of
+ * failure.
+ */
+static int check_volume_empty(struct ubifs_info *c)
+{
+ int lnum, err;
+
+ c->empty = 1;
+ for (lnum = 0; lnum < c->leb_cnt; lnum++) {
+ err = ubi_is_mapped(c->ubi, lnum);
+ if (unlikely(err < 0))
+ return err;
+ if (err == 1) {
+ c->empty = 0;
+ break;
+ }
+
+ cond_resched();
+ }
+
+ return 0;
+}
+
+/**
+ * mount_ubifs - mount UBIFS file-system.
+ * @c: UBIFS file-system description object
+ *
+ * This function mounts UBIFS file system. Returns zero in case of success and
+ * a negative error code in case of failure.
+ *
+ * Note, the function does not de-allocate resources if it fails half way
+ * through, and the caller has to do this instead.
+ */
+static int mount_ubifs(struct ubifs_info *c)
+{
+ struct super_block *sb = c->vfs_sb;
+ int err, mounted_read_only = (sb->s_flags & MS_RDONLY);
+ unsigned long long x;
+ size_t sz;
+
+ err = init_constants_early(c);
+ if (err)
+ return err;
+
+#ifdef CONFIG_UBIFS_FS_DEBUG
+ c->dbg_buf = vmalloc(c->leb_size);
+ if (!c->dbg_buf)
+ return -ENOMEM;
+#endif
+
+ err = check_volume_empty(c);
+ if (err)
+ return err;
+
+ if (c->empty && (mounted_read_only || c->ro_media)) {
+ /*
+ * This UBI volume is empty, and read-only, or the file system
+ * is mounted read-only - we cannot format it.
+ */
+ ubifs_err("can't format empty UBI volume: read-only %s",
+ c->ro_media ? "UBI volume" : "mount");
+ return -EROFS;
+ }
+
+ if (c->ro_media && !mounted_read_only) {
+ ubifs_err("cannot mount read-write - read-only media");
+ return -EROFS;
+ }
+
+ c->sbuf = vmalloc(c->leb_size);
+ if (!c->sbuf)
+ return -ENOMEM;
+
+ if (!mounted_read_only) {
+ c->ileb_buf = vmalloc(c->leb_size);
+ if (!c->ileb_buf)
+ return -ENOMEM;
+ }
+
+ err = ubifs_read_superblock(c);
+ if (err)
+ return err;
+
+ /*
+ * Make sure the compressor which is set as the default one in the
+ * superblock was actually compiled in.
+ */
+ if (!ubifs_compr_present(c->default_compr)) {
+ ubifs_warn("'%s' compressor is set by superblock, but not "
+ "compiled in", ubifs_compr_name(c->default_compr));
+ c->default_compr = UBIFS_COMPR_NONE;
+ }
+
+ dbg_failure_mode_registration(c);
+
+ err = init_constants_late(c);
+ if (err)
+ return err;
+
+ sz = ALIGN(c->max_idx_node_sz, c->min_io_size);
+ sz = ALIGN(sz + c->max_idx_node_sz, c->min_io_size);
+ c->cbuf = kmalloc(sz, GFP_NOFS);
+ if (!c->cbuf)
+ return -ENOMEM;
+
+ if (!mounted_read_only) {
+ err = alloc_wbufs(c);
+ if (err)
+ return err;
+
+ /* Create background thread */
+ c->bgt = kthread_create(ubifs_bg_thread, c, c->bgt_name);
+ if (!c->bgt)
+ c->bgt = ERR_PTR(-EINVAL);
+ if (IS_ERR(c->bgt)) {
+ err = PTR_ERR(c->bgt);
+ c->bgt = NULL;
+ ubifs_err("cannot spawn \"%s\", error %d",
+ c->bgt_name, err);
+ return err;
+ }
+ }
+
+ err = ubifs_read_master(c);
+ if (err)
+ return err;
+
+ if ((c->mst_node->flags & cpu_to_le32(UBIFS_MST_DIRTY)) != 0) {
+ ubifs_msg("recovery needed");
+ c->need_recovery = 1;
+ if (!mounted_read_only) {
+ err = ubifs_recover_inl_heads(c, c->sbuf);
+ if (err)
+ return err;
+ }
+ } else if (!mounted_read_only) {
+ /*
+ * Set the "dirty" flag so that if we reboot uncleanly we
+ * will notice this immediately on the next mount.
+ */
+ c->mst_node->flags |= cpu_to_le32(UBIFS_MST_DIRTY);
+ err = ubifs_write_master(c);
+ if (err)
+ return err;
+ }
+
+ err = ubifs_lpt_init(c, 1, !mounted_read_only);
+ if (err)
+ return err;
+
+ err = dbg_check_idx_size(c, c->old_idx_sz);
+ if (err)
+ return err;
+
+ err = ubifs_replay_journal(c);
+ if (err)
+ return err;
+
+ if (!mounted_read_only) {
+ int lnum;
+
+ if (c->need_recovery)
+ err = ubifs_recover_gc_lnum(c);
+ else
+ err = care_about_gc_lnum(c);
+ if (err)
+ return err;
+ err = ubifs_mount_orphans(c, c->need_recovery);
+ if (err)
+ return err;
+
+ /* Check for enough log space */
+ lnum = c->lhead_lnum + 1;
+ if (lnum >= UBIFS_LOG_LNUM + c->log_lebs)
+ lnum = UBIFS_LOG_LNUM;
+ if (lnum == c->ltail_lnum) {
+ err = ubifs_consolidate_log(c);
+ if (err)
+ return err;
+ }
+
+ /* Check for enough free space */
+ if (ubifs_calc_available(c) <= 0) {
+ ubifs_err("insufficient available space");
+ return -EINVAL;
+ }
+
+ err = dbg_check_lprops(c);
+ if (err)
+ return err;
+ }
+
+ if (c->need_recovery) {
+ err = ubifs_recover_size(c);
+ if (err)
+ return err;
+ }
+
+ spin_lock(&ubifs_infos_lock);
+ list_add_tail(&c->infos_list, &ubifs_infos);
+ spin_unlock(&ubifs_infos_lock);
+
+ if (c->need_recovery) {
+ if (mounted_read_only)
+ ubifs_msg("recovery deferred");
+ else {
+ c->need_recovery = 0;
+ ubifs_msg("recovery completed");
+ }
+ }
+
+ ubifs_msg("mounted UBI device %d, volume %d", c->vi.ubi_num,
+ c->vi.vol_id);
+ if (mounted_read_only)
+ ubifs_msg("mounted read-only");
+ ubifs_msg("minimal I/O unit size: %d bytes", c->min_io_size);
+ ubifs_msg("logical eraseblock size: %d bytes (%d KiB)",
+ c->leb_size, c->leb_size / 1024);
+ x = (unsigned long long)c->main_lebs * c->leb_size;
+ ubifs_msg("file system size: %lld bytes (%lld KiB, %lld MiB, "
+ "%d LEBs)", x, x >> 10, x >> 20, c->main_lebs);
+ x = (unsigned long long)c->log_lebs * c->leb_size + c->max_bud_bytes;
+ ubifs_msg("journal size: %lld bytes (%lld KiB, %lld MiB, "
+ "%d LEBs)", x, x >> 10, x >> 20,
+ c->log_lebs + c->max_bud_cnt);
+ ubifs_msg("data journal heads: %d",
+ c->jhead_cnt - NONDATA_JHEADS_CNT);
+ ubifs_msg("default compressor: %s",
+ ubifs_compr_name(c->default_compr));
+
+ dbg_msg("compiled on: " __DATE__ " at " __TIME__);
+ dbg_msg("fast unmount: %d", c->fast_unmount);
+ dbg_msg("big_lpt %d", c->big_lpt);
+ dbg_msg("log LEBs: %d (%d - %d)",
+ c->log_lebs, UBIFS_LOG_LNUM, c->log_last);
+ dbg_msg("LPT area LEBs: %d (%d - %d)",
+ c->lpt_lebs, c->lpt_first, c->lpt_last);
+ dbg_msg("orphan area LEBs: %d (%d - %d)",
+ c->orph_lebs, c->orph_first, c->orph_last);
+ dbg_msg("main area LEBs: %d (%d - %d)",
+ c->main_lebs, c->main_first, c->leb_cnt - 1);
+ dbg_msg("index LEBs: %d", c->lst.idx_lebs);
+ dbg_msg("total index bytes: %lld (%lld KiB, %lld MiB)",
+ c->old_idx_sz, c->old_idx_sz >> 10, c->old_idx_sz >> 20);
+ dbg_msg("key hash type: %d", c->key_hash_type);
+ dbg_msg("tree fanout: %d", c->fanout);
+ dbg_msg("reserved GC LEB: %d", c->gc_lnum);
+ dbg_msg("first main LEB: %d", c->main_first);
+ dbg_msg("dead watermark: %d", c->dead_wm);
+ dbg_msg("dark watermark: %d", c->dark_wm);
+ x = (unsigned long long)c->main_lebs * c->dark_wm;
+ dbg_msg("max. dark space: %lld (%lld KiB, %lld MiB)",
+ x, x >> 10, x >> 20);
+ dbg_msg("maximum bud bytes: %lld (%lld KiB, %lld MiB)",
+ c->max_bud_bytes, c->max_bud_bytes >> 10,
+ c->max_bud_bytes >> 20);
+ dbg_msg("BG commit bud bytes: %lld (%lld KiB, %lld MiB)",
+ c->bg_bud_bytes, c->bg_bud_bytes >> 10,
+ c->bg_bud_bytes >> 20);
+ dbg_msg("current bud bytes %lld (%lld KiB, %lld MiB)",
+ c->bud_bytes, c->bud_bytes >> 10, c->bud_bytes >> 20);
+ dbg_msg("max. seq. number: %llu", c->max_sqnum);
+ dbg_msg("commit number: %llu", c->cmt_no);
+
+ return 0;
+}
+
+/**
+ * ubifs_umount - un-mount UBIFS file-system.
+ * @c: UBIFS file-system description object
+ *
+ * Note, this function is called to free allocated resources when un-mounting,
+ * as well as free resources when an error occurred while we were half way
+ * through mounting (error path cleanup function). So it has to make sure the
+ * resource was actually allocated before freeing it.
+ */
+void ubifs_umount(struct ubifs_info *c)
+{
+ dbg_gen("un-mounting UBI device %d, volume %d", c->vi.ubi_num,
+ c->vi.vol_id);
+
+ ubifs_destroy_size_tree(c);
+
+ if (c->bgt)
+ kthread_stop(c->bgt);
+
+ free_buds(c);
+ ubifs_destroy_idx_gc(c);
+ ubifs_tnc_close(c);
+
+ free_wbufs(c);
+ free_orphans(c);
+ ubifs_lpt_free(c, 0);
+
+ while (!list_empty(&c->unclean_leb_list)) {
+ struct ubifs_unclean_leb *ucleb;
+
+ ucleb = list_entry(c->unclean_leb_list.next,
+ struct ubifs_unclean_leb, list);
+ list_del(&ucleb->list);
+ kfree(ucleb);
+ }
+
+ while (!list_empty(&c->old_buds)) {
+ struct ubifs_bud *bud;
+
+ bud = list_entry(c->old_buds.next, struct ubifs_bud, list);
+ list_del(&bud->list);
+ kfree(bud);
+ }
+
+ kfree(c->rcvrd_mst_node);
+ kfree(c->mst_node);
+ vfree(c->sbuf);
+ UBIFS_DBG(vfree(c->dbg_buf));
+ vfree(c->ileb_buf);
+ dbg_failure_mode_deregistration(c);
+}
+
+/**
+ * ubifs_remount_rw - re-mount in read-write mode.
+ * @c: UBIFS file-system description object
+ *
+ * UBIFS avoids allocating many unnecessary resources when mounted in read-only
+ * mode. This function allocates the needed resources and re-mounts UBIFS in
+ * read-write mode.
+ */
+int ubifs_remount_rw(struct ubifs_info *c)
+{
+ int err, lnum;
+
+ if (c->ro_media)
+ return -EINVAL;
+
+ mutex_lock(&c->umount_mutex);
+ c->remounting_rw = 1;
+
+ /* Check for enough free space */
+ if (ubifs_calc_available(c) <= 0) {
+ ubifs_err("insufficient available space");
+ err = -EINVAL;
+ goto out;
+ }
+
+ if (c->old_leb_cnt != c->leb_cnt) {
+ struct ubifs_sb_node *sup;
+
+ sup = ubifs_read_sb_node(c);
+ if (IS_ERR(sup)) {
+ err = PTR_ERR(sup);
+ goto out;
+ }
+ sup->leb_cnt = cpu_to_le32(c->leb_cnt);
+ err = ubifs_write_sb_node(c, sup);
+ if (err)
+ goto out;
+ }
+
+ if (c->need_recovery) {
+ ubifs_msg("completing deferred recovery");
+ err = ubifs_write_rcvrd_mst_node(c);
+ if (err)
+ goto out;
+ err = ubifs_recover_size(c);
+ if (err)
+ goto out;
+ err = ubifs_clean_lebs(c, c->sbuf);
+ if (err)
+ goto out;
+ err = ubifs_recover_inl_heads(c, c->sbuf);
+ if (err)
+ goto out;
+ }
+
+ if (!(c->mst_node->flags & cpu_to_le32(UBIFS_MST_DIRTY))) {
+ c->mst_node->flags |= cpu_to_le32(UBIFS_MST_DIRTY);
+ err = ubifs_write_master(c);
+ if (err)
+ goto out;
+ }
+
+ c->ileb_buf = vmalloc(c->leb_size);
+ if (!c->ileb_buf) {
+ err = -ENOMEM;
+ goto out;
+ }
+
+ err = ubifs_lpt_init(c, 0, 1);
+ if (err)
+ goto out;
+
+ err = alloc_wbufs(c);
+ if (err)
+ goto out;
+
+ ubifs_create_buds_lists(c);
+
+ /* Create background thread */
+ c->bgt = kthread_create(ubifs_bg_thread, c, c->bgt_name);
+ if (!c->bgt)
+ c->bgt = ERR_PTR(-EINVAL);
+ if (IS_ERR(c->bgt)) {
+ err = PTR_ERR(c->bgt);
+ c->bgt = NULL;
+ ubifs_err("cannot spawn \"%s\", error %d",
+ c->bgt_name, err);
+ goto out;
+ }
+
+ if (c->need_recovery)
+ err = ubifs_recover_gc_lnum(c);
+ else
+ err = care_about_gc_lnum(c);
+ if (err)
+ goto out;
+
+ err = ubifs_mount_orphans(c, c->need_recovery);
+ if (err)
+ goto out;
+ /* Check for enough log space */
+ lnum = c->lhead_lnum + 1;
+ if (lnum >= UBIFS_LOG_LNUM + c->log_lebs)
+ lnum = UBIFS_LOG_LNUM;
+ if (lnum == c->ltail_lnum) {
+ err = ubifs_consolidate_log(c);
+ if (err)
+ goto out;
+ }
+
+ if (c->need_recovery) {
+ c->need_recovery = 0;
+ ubifs_msg("deferred recovery completed");
+ }
+
+ dbg_gen("re-mounted read-write");
+ c->vfs_sb->s_flags &= ~MS_RDONLY;
+ c->remounting_rw = 0;
+ mutex_unlock(&c->umount_mutex);
+ return 0;
+
+out:
+ free_orphans(c);
+ if (c->bgt) {
+ kthread_stop(c->bgt);
+ c->bgt = NULL;
+ }
+ free_wbufs(c);
+ vfree(c->ileb_buf);
+ c->ileb_buf = NULL;
+ ubifs_lpt_free(c, 1);
+ c->remounting_rw = 0;
+ mutex_unlock(&c->umount_mutex);
+ return err;
+}
+
+/**
+ * commit_on_unmount - commit the journal when un-mounting.
+ * @c: UBIFS file-system description object
+ *
+ * This function is called during un-mounting and it commits the journal unless
+ * the "fast unmount" mode is enabled. It also avoids committing the journal if
+ * it contains too little data.
+ *
+ * Sometimes recovery requires the journal to be committed at least once, and
+ * this function takes care of this.
+ */
+static void commit_on_unmount(struct ubifs_info *c)
+{
+ if (!c->fast_unmount) {
+ long long bud_bytes;
+
+ spin_lock(&c->buds_lock);
+ bud_bytes = c->bud_bytes;
+ spin_unlock(&c->buds_lock);
+ if (bud_bytes > c->leb_size)
+ ubifs_run_commit(c);
+ }
+
+ if (c->recovery_needs_commit)
+ ubifs_recovery_commit(c);
+}
+
+/**
+ * ubifs_remount_ro - re-mount in read-only mode.
+ * @c: UBIFS file-system description object
+ *
+ * We rely on VFS to have stopped writing. Possibly the background thread could
+ * be running a commit, however kthread_stop will wait in that case.
+ */
+void ubifs_remount_ro(struct ubifs_info *c)
+{
+ int i, err;
+
+ ubifs_assert(!c->need_recovery);
+
+ commit_on_unmount(c);
+
+ mutex_lock(&c->umount_mutex);
+ if (c->bgt) {
+ kthread_stop(c->bgt);
+ c->bgt = NULL;
+ }
+
+ for (i = 0; i < c->jhead_cnt; i++) {
+ ubifs_wbuf_sync(&c->jheads[i].wbuf);
+ del_timer_sync(&c->jheads[i].wbuf.timer);
+ }
+
+ if (!c->ro_media) {
+ c->mst_node->flags &= ~cpu_to_le32(UBIFS_MST_DIRTY);
+ c->mst_node->flags |= cpu_to_le32(UBIFS_MST_NO_ORPHS);
+ c->mst_node->gc_lnum = cpu_to_le32(c->gc_lnum);
+ err = ubifs_write_master(c);
+ if (err)
+ ubifs_ro_mode(c);
+ }
+
+ ubifs_destroy_idx_gc(c);
+ free_wbufs(c);
+ free_orphans(c);
+ vfree(c->ileb_buf);
+ c->ileb_buf = NULL;
+ ubifs_lpt_free(c, 1);
+ mutex_unlock(&c->umount_mutex);
+}
+
+/**
+ * open_ubi - parse UBI device name string and open the UBI device.
+ * @c: UBIFS file-system description object
+ * @name: UBI volume name
+ * @mode: UBI volume open mode
+ *
+ * There are several ways to specify UBI volumes when mounting UBIFS:
+ * o ubiX_Y - UBI device number X, volume Y;
+ * o ubiY - UBI device number 0, volume Y;
+ * o ubiX:NAME - mount UBI device X, volume with name NAME;
+ * o ubi:NAME - mount UBI device 0, volume with name NAME.
+ *
+ * Alternative '!' separator may be used instead of ':' (because some shells
+ * like busybox may interpret ':' as an NFS host name separator). This function
+ * returns zero in case of success and a negative error code in case of
+ * failure.
+ */
+static int open_ubi(struct ubifs_info *c, const char *name, int mode)
+{
+ int dev, vol;
+ char *endptr;
+
+ if (name[0] != 'u' || name[1] != 'b' || name[2] != 'i')
+ return -EINVAL;
+
+ if ((name[3] == ':' || name[3] == '!') && name[4] != '\0') {
+ /* ubi:NAME method */
+ c->ubi = ubi_open_volume_nm(0, name + 4, mode);
+ if (IS_ERR(c->ubi))
+ return PTR_ERR(c->ubi);
+ } else if (isdigit(name[3])) {
+ dev = simple_strtoul(name + 3, &endptr, 0);
+ if (*endptr == '\0') {
+ /* ubiY method */
+ c->ubi = ubi_open_volume(0, dev, mode);
+ if (IS_ERR(c->ubi))
+ return PTR_ERR(c->ubi);
+ } else if (*endptr == '_' && isdigit(endptr[1])) {
+ /* ubiX_Y method */
+ vol = simple_strtoul(endptr + 1, &endptr, 0);
+ if (*endptr != '\0')
+ return -EINVAL;
+ c->ubi = ubi_open_volume(dev, vol, mode);
+ if (IS_ERR(c->ubi))
+ return PTR_ERR(c->ubi);
+ } else if ((*endptr == ':' || *endptr == '!') &&
+ endptr[1] != '\0') {
+ /* ubiX:NAME method */
+ c->ubi = ubi_open_volume_nm(dev, ++endptr, mode);
+ if (IS_ERR(c->ubi))
+ return PTR_ERR(c->ubi);
+ }
+ }
+
+ if (!c->ubi)
+ return -EINVAL;
+
+ ubi_get_volume_info(c->ubi, &c->vi);
+ ubi_get_device_info(c->vi.ubi_num, &c->di);
+ return 0;
+}
+
+static int sb_test(struct super_block *sb, void *data)
+{
+ dev_t *dev = data;
+
+ return sb->s_dev == *dev;
+}
+
+static int sb_set(struct super_block *sb, void *data)
+{
+ return 0;
+}
+
+/*
+ * UBIFS mount options.
+ *
+ * Opt_fast_unmount: do not run a journal commit before un-mounting
+ * Opt_norm_unmount: run a journal commit before un-mounting
+ * Opt_err: just end of array marker
+ */
+enum {
+ Opt_fast_unmount,
+ Opt_norm_unmount,
+ Opt_err,
+};
+
+static match_table_t tokens = {
+ {Opt_fast_unmount, "fast_unmount"},
+ {Opt_norm_unmount, "norm_unmount"},
+ {Opt_err, NULL},
+};
+
+/**
+ * ubifs_parse_options - parse mount parameters.
+ * @c: UBIFS file-system description object
+ * @options: parameters to parse
+ * @is_remount: non-zero if this is FS re-mount
+ *
+ * This function parses UBIFS mount options and returns zero in case of success
+ * and a negative error code in case of failure.
+ */
+int ubifs_parse_options(struct ubifs_info *c, char *options, int is_remount)
+{
+ char *p;
+ substring_t args[MAX_OPT_ARGS];
+
+ if (!options)
+ return 0;
+
+ while ((p = strsep(&options, ",")) != NULL) {
+ int token;
+
+ if (!*p)
+ continue;
+
+ token = match_token(p, tokens, args);
+ switch (token) {
+ case Opt_fast_unmount:
+ c->mount_opts.unmount_mode = 2;
+ c->fast_unmount = 1;
+ break;
+ case Opt_norm_unmount:
+ c->mount_opts.unmount_mode = 1;
+ c->fast_unmount = 0;
+ break;
+ default:
+ ubifs_err("unrecognized mount option \"%s\" "
+ "or missing value", p);
+ return -EINVAL;
+ }
+ }
+
+ return 0;
+}
+
+static int ubifs_get_sb(struct file_system_type *fs_type, int flags,
+ const char *name, void *data, struct vfsmount *mnt)
+{
+ int err;
+ struct super_block *sb;
+ struct ubifs_info *c;
+ struct inode *root;
+
+ dbg_gen("name %s, flags %#x", name, flags);
+
+ c = kzalloc(sizeof(struct ubifs_info), GFP_KERNEL);
+ if (!c)
+ return -ENOMEM;
+
+ spin_lock_init(&c->cnt_lock);
+ spin_lock_init(&c->cs_lock);
+ spin_lock_init(&c->buds_lock);
+ spin_lock_init(&c->space_lock);
+ spin_lock_init(&c->orphan_lock);
+ init_rwsem(&c->commit_sem);
+ mutex_init(&c->lp_mutex);
+ mutex_init(&c->tnc_mutex);
+ mutex_init(&c->log_mutex);
+ mutex_init(&c->mst_mutex);
+ mutex_init(&c->umount_mutex);
+ init_waitqueue_head(&c->cmt_wq);
+ c->buds = RB_ROOT;
+ c->old_idx = RB_ROOT;
+ c->size_tree = RB_ROOT;
+ c->orph_tree = RB_ROOT;
+ INIT_LIST_HEAD(&c->infos_list);
+ INIT_LIST_HEAD(&c->idx_gc);
+ INIT_LIST_HEAD(&c->replay_list);
+ INIT_LIST_HEAD(&c->replay_buds);
+ INIT_LIST_HEAD(&c->uncat_list);
+ INIT_LIST_HEAD(&c->empty_list);
+ INIT_LIST_HEAD(&c->freeable_list);
+ INIT_LIST_HEAD(&c->frdi_idx_list);
+ INIT_LIST_HEAD(&c->unclean_leb_list);
+ INIT_LIST_HEAD(&c->old_buds);
+ INIT_LIST_HEAD(&c->orph_list);
+ INIT_LIST_HEAD(&c->orph_new);
+
+ c->highest_inum = UBIFS_FIRST_INO;
+ get_random_bytes(&c->vfs_gen, sizeof(int));
+ c->lhead_lnum = c->ltail_lnum = UBIFS_LOG_LNUM;
+
+ err = ubifs_parse_options(c, data, 0);
+ if (err)
+ goto out_free;
+
+ /*
+ * Get UBI device number and volume ID. Open it read-only for now
+ * because this might be a new mount point, and UBI allows only one
+ * read-write user at a time.
+ */
+ err = open_ubi(c, name, UBI_READONLY);
+ if (err) {
+ ubifs_err("cannot open \"%s\", error %d", name, err);
+ goto out_free;
+ }
+
+ dbg_gen("opened ubi%d_%d", c->vi.ubi_num, c->vi.vol_id);
+
+ sb = sget(fs_type, &sb_test, &sb_set, &c->vi.cdev);
+ if (IS_ERR(sb)) {
+ err = PTR_ERR(sb);
+ goto out_close;
+ }
+
+ if (sb->s_root) {
+ /* A new mount point for already mounted UBIFS */
+ dbg_gen("this ubi volume is already mounted");
+ err = simple_set_mnt(mnt, sb);
+ goto out_close;
+ }
+
+ /* Re-open the UBI device in read-write mode */
+ ubi_close_volume(c->ubi);
+ c->ubi = ubi_open_volume(c->vi.ubi_num, c->vi.vol_id, UBI_READWRITE);
+ if (IS_ERR(c->ubi)) {
+ err = PTR_ERR(c->ubi);
+ goto out_free;
+ }
+
+ c->vfs_sb = sb;
+ sb->s_fs_info = c;
+ sb->s_magic = UBIFS_SUPER_MAGIC;
+ sb->s_blocksize = UBIFS_BLOCK_SIZE;
+ sb->s_blocksize_bits = UBIFS_BLOCK_SHIFT;
+ sb->s_dev = c->vi.cdev;
+ sb->s_maxbytes = c->max_inode_sz =
+ min_t(uint64_t, MAX_LFS_FILESIZE, UBIFS_MAX_INODE_SZ);
+ sb->s_op = &ubifs_super_operations;
+ sb->s_flags = flags;
+
+ mutex_lock(&c->umount_mutex);
+ err = mount_ubifs(c);
+ if (err) {
+ ubifs_assert(err < 0);
+ goto out_umount;
+ }
+
+ /* Read the root inode */
+ root = ubifs_iget(sb, UBIFS_ROOT_INO);
+ if (IS_ERR(root)) {
+ err = PTR_ERR(root);
+ goto out_umount;
+ }
+ err = -ENOMEM;
+ sb->s_root = d_alloc_root(root);
+ if (!sb->s_root)
+ goto out_iput;
+
+ mutex_unlock(&c->umount_mutex);
+
+ /* We do not support atime */
+ sb->s_flags |= MS_ACTIVE | MS_NOATIME;
+ return simple_set_mnt(mnt, sb);
+
+out_iput:
+ iput(root);
+out_umount:
+ spin_lock(&ubifs_infos_lock);
+ if (c->infos_list.next)
+ list_del(&c->infos_list);
+ spin_unlock(&ubifs_infos_lock);
+ ubifs_umount(c);
+ mutex_unlock(&c->umount_mutex);
+ up_write(&sb->s_umount);
+ sb->s_root = NULL;
+ deactivate_super(sb);
+out_close:
+ ubi_close_volume(c->ubi);
+out_free:
+ kfree(c);
+ return err;
+}
+
+static void ubifs_kill_sb(struct super_block *sb)
+{
+ struct ubifs_info *c = sb->s_fs_info;
+
+ if (sb->s_root != NULL && !(sb->s_flags & MS_RDONLY))
+ commit_on_unmount(c);
+ /* The un-mount routine is actually done in put_super() */
+ generic_shutdown_super(sb);
+}
+
+static struct file_system_type ubifs_fs_type = {
+ .name = "ubifs",
+ .owner = THIS_MODULE,
+ .get_sb = ubifs_get_sb,
+ .kill_sb = ubifs_kill_sb
+};
+
+/*
+ * Inode slab cache constructor.
+ */
+static void inode_slab_ctor(struct kmem_cache *cachep, void *obj)
+{
+ struct ubifs_inode *inode = obj;
+ inode_init_once(&inode->vfs_inode);
+}
+
+static int __init ubifs_init(void)
+{
+ int err;
+
+ BUILD_BUG_ON(sizeof(struct ubifs_ch) != 24);
+
+ /* Make sure node sizes are 8-byte aligned */
+ BUILD_BUG_ON(UBIFS_CH_SZ & 7);
+ BUILD_BUG_ON(UBIFS_INO_NODE_SZ & 7);
+ BUILD_BUG_ON(UBIFS_DENT_NODE_SZ & 7);
+ BUILD_BUG_ON(UBIFS_XENT_NODE_SZ & 7);
+ BUILD_BUG_ON(UBIFS_DATA_NODE_SZ & 7);
+ BUILD_BUG_ON(UBIFS_TRUN_NODE_SZ & 7);
+ BUILD_BUG_ON(UBIFS_SB_NODE_SZ & 7);
+ BUILD_BUG_ON(UBIFS_MST_NODE_SZ & 7);
+ BUILD_BUG_ON(UBIFS_REF_NODE_SZ & 7);
+ BUILD_BUG_ON(UBIFS_IDX_NODE_SZ & 7);
+ BUILD_BUG_ON(UBIFS_CS_NODE_SZ & 7);
+ BUILD_BUG_ON(UBIFS_ORPH_NODE_SZ & 7);
+
+ BUILD_BUG_ON(UBIFS_MAX_DENT_NODE_SZ & 7);
+ BUILD_BUG_ON(UBIFS_MAX_XENT_NODE_SZ & 7);
+ BUILD_BUG_ON(UBIFS_MAX_DATA_NODE_SZ & 7);
+ BUILD_BUG_ON(UBIFS_MAX_INO_NODE_SZ & 7);
+ BUILD_BUG_ON(UBIFS_MAX_NODE_SZ & 7);
+ BUILD_BUG_ON(MIN_WRITE_SZ & 7);
+
+ /* Check min. node size */
+ BUILD_BUG_ON(UBIFS_INO_NODE_SZ < MIN_WRITE_SZ);
+ BUILD_BUG_ON(UBIFS_DENT_NODE_SZ < MIN_WRITE_SZ);
+ BUILD_BUG_ON(UBIFS_XENT_NODE_SZ < MIN_WRITE_SZ);
+ BUILD_BUG_ON(UBIFS_TRUN_NODE_SZ < MIN_WRITE_SZ);
+
+ BUILD_BUG_ON(UBIFS_MAX_DENT_NODE_SZ > UBIFS_MAX_NODE_SZ);
+ BUILD_BUG_ON(UBIFS_MAX_XENT_NODE_SZ > UBIFS_MAX_NODE_SZ);
+ BUILD_BUG_ON(UBIFS_MAX_DATA_NODE_SZ > UBIFS_MAX_NODE_SZ);
+ BUILD_BUG_ON(UBIFS_MAX_INO_NODE_SZ > UBIFS_MAX_NODE_SZ);
+
+ /* We do not support multiple pages per block ATM */
+ BUILD_BUG_ON(UBIFS_BLOCK_SIZE != PAGE_CACHE_SIZE);
+
+ /* Defined node sizes */
+ BUILD_BUG_ON(UBIFS_SB_NODE_SZ != 4096);
+ BUILD_BUG_ON(UBIFS_MST_NODE_SZ != 512);
+ BUILD_BUG_ON(UBIFS_INO_NODE_SZ != 160);
+ BUILD_BUG_ON(UBIFS_REF_NODE_SZ != 64);
+
+ err = bdi_init(&ubifs_backing_dev_info);
+ if (err)
+ return err;
+
+ err = register_filesystem(&ubifs_fs_type);
+ if (err) {
+ ubifs_err("cannot register file system, error %d", err);
+ goto out;
+ }
+
+ err = -ENOMEM;
+ ubifs_inode_slab = kmem_cache_create("ubifs_inode_slab",
+ sizeof(struct ubifs_inode), 0,
+ SLAB_MEM_SPREAD | SLAB_RECLAIM_ACCOUNT,
+ &inode_slab_ctor);
+ if (!ubifs_inode_slab)
+ goto out_reg;
+
+ register_shrinker(&ubifs_shrinker_info);
+ dbg_mempressure_init();
+
+ err = ubifs_compressors_init();
+ if (err)
+ goto out_compr;
+
+ return 0;
+
+out_compr:
+ dbg_mempressure_exit();
+ unregister_shrinker(&ubifs_shrinker_info);
+ kmem_cache_destroy(ubifs_inode_slab);
+out_reg:
+ unregister_filesystem(&ubifs_fs_type);
+out:
+ bdi_destroy(&ubifs_backing_dev_info);
+ return err;
+}
+/* late_initcall to let compressors initialize first */
+late_initcall(ubifs_init);
+
+static void __exit ubifs_exit(void)
+{
+ ubifs_assert(list_empty(&ubifs_infos));
+ ubifs_assert(atomic_long_read(&ubifs_clean_zn_cnt) == 0);
+
+ ubifs_compressors_exit();
+ dbg_mempressure_exit();
+ unregister_shrinker(&ubifs_shrinker_info);
+ kmem_cache_destroy(ubifs_inode_slab);
+ unregister_filesystem(&ubifs_fs_type);
+ bdi_destroy(&ubifs_backing_dev_info);
+ dbg_leak_report();
+}
+module_exit(ubifs_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_VERSION(__stringify(UBIFS_VERSION));
+MODULE_AUTHOR("Artem Bityutskiy, Adrian Hunter");
+MODULE_DESCRIPTION("UBIFS - UBI File System");
diff --git a/fs/ubifs/super.c b/fs/ubifs/super.c
new file mode 100644
index 0000000..a17783e
--- /dev/null
+++ b/fs/ubifs/super.c
@@ -0,0 +1,531 @@
+/*
+ * This file is part of UBIFS.
+ *
+ * Copyright (C) 2006-2008 Nokia Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 51
+ * Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Authors: Artem Bityutskiy (Битюцкий Артём)
+ * Adrian Hunter
+ */
+
+/* This file implements VFS superblock operations */
+
+#include <linux/kthread.h>
+#include <linux/seq_file.h>
+#include <linux/mount.h>
+#include "ubifs.h"
+
+/**
+ * validate_inode - validate inode.
+ * @c: UBIFS file-system description object
+ * @inode: the inode to validate
+ *
+ * This is a helper function for 'ubifs_iget()' which validates various fields
+ * of a newly built inode to make sure they contain sane values and prevent
+ * possible vulnerabilities. Returns zero if the inode is all right and
+ * %-EINVAL if not.
+ */
+static int validate_inode(struct ubifs_info *c, const struct inode *inode)
+{
+ int err;
+ const struct ubifs_inode *ui = ubifs_inode(inode);
+
+ if (inode->i_size > c->max_inode_sz) {
+ ubifs_err("inode is too large (%lld)",
+ (long long)inode->i_size);
+ return -EINVAL;
+ }
+
+ if (ui->compr_type < 0 || ui->compr_type >= UBIFS_COMPR_TYPES_CNT) {
+ ubifs_err("unknown compression type %d", ui->compr_type);
+ return -EINVAL;
+ }
+
+ if (ui->xattr_cnt < 0) {
+ dbg_err("bad xattr_cnt %d", ui->xattr_cnt);
+ return -EINVAL;
+ }
+
+ if (ui->xattr_size < 0) {
+ dbg_err("bad xattr_size %lld", ui->xattr_size);
+ return -EINVAL;
+ }
+
+ if (ui->xattr_msize < 0 || ui->xattr_msize > ui->xattr_size) {
+ dbg_err("bad xattr_msize %lld", ui->xattr_msize);
+ return -EINVAL;
+ }
+
+ if (ui->xattr_names < 0 ||
+ ui->xattr_names + ui->xattr_cnt > XATTR_LIST_MAX) {
+ dbg_err("bad xattr_names %d or xattr_cnt %d",
+ ui->xattr_names, ui->xattr_cnt);
+ return -EINVAL;
+ }
+
+ if (ui->data_len < 0 || ui->data_len > UBIFS_MAX_INO_DATA) {
+ ubifs_err("invalid inode data length %d", ui->data_len);
+ return -EINVAL;
+ }
+
+ if (!ubifs_compr_present(ui->compr_type)) {
+ ubifs_warn("inode %lu uses '%s' compression, but it was not "
+ "compiled in", inode->i_ino,
+ ubifs_compr_name(ui->compr_type));
+ }
+
+ err = dbg_check_dir_size(c, inode);
+ return err;
+}
+
+struct inode *ubifs_iget(struct super_block *sb, unsigned long inum)
+{
+ int err;
+ union ubifs_key key;
+ struct ubifs_ino_node *ino;
+ struct ubifs_info *c = sb->s_fs_info;
+ struct inode *inode;
+ struct ubifs_inode *ui;
+
+ dbg_gen("inode %lu", inum);
+
+ inode = iget_locked(sb, inum);
+ if (!inode)
+ return ERR_PTR(-ENOMEM);
+ if (!(inode->i_state & I_NEW))
+ return inode;
+ ui = ubifs_inode(inode);
+
+ ino = kmalloc(UBIFS_MAX_INO_NODE_SZ, GFP_NOFS);
+ if (!ino) {
+ err = -ENOMEM;
+ goto out;
+ }
+
+ ino_key_init(c, &key, inode->i_ino);
+
+ err = ubifs_tnc_lookup(c, &key, ino);
+ if (err)
+ goto out_ino;
+
+ inode->i_flags |= (S_NOCMTIME | S_NOATIME);
+ inode->i_nlink = le32_to_cpu(ino->nlink);
+ inode->i_uid = le32_to_cpu(ino->uid);
+ inode->i_gid = le32_to_cpu(ino->gid);
+ inode->i_atime.tv_sec = le32_to_cpu(ino->atime);
+ inode->i_mtime.tv_sec = le32_to_cpu(ino->mtime);
+ inode->i_ctime.tv_sec = le32_to_cpu(ino->ctime);
+ inode->i_atime.tv_nsec = inode->i_mtime.tv_nsec =
+ inode->i_ctime.tv_nsec = 0;
+ inode->i_mode = le32_to_cpu(ino->mode);
+ inode->i_size = le64_to_cpu(ino->size);
+
+ ui->data_len = le32_to_cpu(ino->data_len);
+ ui->flags = le32_to_cpu(ino->flags);
+ ui->compr_type = le16_to_cpu(ino->compr_type);
+ ui->creat_sqnum = le64_to_cpu(ino->creat_sqnum);
+ ui->xattr_cnt = le32_to_cpu(ino->xattr_cnt);
+ ui->xattr_size = le64_to_cpu(ino->xattr_size);
+ ui->xattr_msize = le64_to_cpu(ino->xattr_msize);
+ ui->xattr_names = le32_to_cpu(ino->xattr_names);
+
+ err = validate_inode(c, inode);
+ if (err)
+ goto out_invalid;
+
+ switch (inode->i_mode & S_IFMT) {
+ case S_IFREG:
+ inode->i_mapping->a_ops = &ubifs_file_address_operations;
+ inode->i_op = &ubifs_file_inode_operations;
+ inode->i_fop = &ubifs_file_operations;
+ if (ui->data_len != 0)
+ goto out_invalid;
+ break;
+ case S_IFDIR:
+ inode->i_op = &ubifs_dir_inode_operations;
+ inode->i_fop = &ubifs_dir_operations;
+ if (ui->data_len != 0)
+ goto out_invalid;
+ break;
+ case S_IFLNK:
+ inode->i_op = &ubifs_symlink_inode_operations;
+ if (ui->data_len <= 0 || ui->data_len > UBIFS_MAX_INO_DATA) {
+ ubifs_err("invalid inode size");
+ goto out_invalid;
+ }
+ ui->data = kmalloc(ui->data_len + 1, GFP_KERNEL);
+ if (!ui->data) {
+ err = -ENOMEM;
+ goto out_ino;
+ }
+ memcpy(ui->data, ino->data, ui->data_len);
+ ((char *)ui->data)[ui->data_len] = '\0';
+ break;
+ case S_IFBLK:
+ case S_IFCHR:
+ {
+ dev_t rdev;
+ union ubifs_dev_desc *dev;
+
+ ui->data = kmalloc(sizeof(union ubifs_dev_desc), GFP_NOFS);
+ if (!ui->data) {
+ err = -ENOMEM;
+ goto out_ino;
+ }
+
+ dev = (union ubifs_dev_desc *)ino->data;
+ if (ui->data_len == sizeof(dev->new)) {
+ rdev = new_decode_dev(__le32_to_cpu(dev->new));
+ } else if (ui->data_len == sizeof(dev->huge)) {
+ rdev = huge_decode_dev(__le64_to_cpu(dev->huge));
+ } else {
+ ubifs_err("invalid inode size");
+ goto out_invalid;
+ }
+ inode->i_op = &ubifs_file_inode_operations;
+ init_special_inode(inode, inode->i_mode, rdev);
+ break;
+ }
+ case S_IFSOCK:
+ case S_IFIFO:
+ inode->i_op = &ubifs_file_inode_operations;
+ init_special_inode(inode, inode->i_mode, 0);
+ if (ui->data_len != 0)
+ goto out_invalid;
+ break;
+ default:
+ goto out_invalid;
+ }
+
+ kfree(ino);
+ ubifs_set_inode_flags(inode);
+ unlock_new_inode(inode);
+ return inode;
+
+out_invalid:
+ ubifs_err("inode %lu validation failed", inode->i_ino);
+ dbg_dump_node(c, ino);
+ err = -EINVAL;
+out_ino:
+ kfree(ino);
+out:
+ ubifs_err("failed to read inode %lu, error %d", inode->i_ino, err);
+ iget_failed(inode);
+ return ERR_PTR(err);
+}
+
+static struct inode *ubifs_alloc_inode(struct super_block *sb)
+{
+ struct ubifs_inode *ui;
+
+ ui = kmem_cache_alloc(ubifs_inode_slab, GFP_NOFS);
+ if (!ui)
+ return NULL;
+
+ memset((void *)ui + sizeof(struct inode), 0,
+ sizeof(struct ubifs_inode) - sizeof(struct inode));
+ mutex_init(&ui->budg_mutex);
+ return &ui->vfs_inode;
+}
+
+static void ubifs_destroy_inode(struct inode *inode)
+{
+ struct ubifs_inode *ui = ubifs_inode(inode);
+
+ kfree(ui->data);
+ kmem_cache_free(ubifs_inode_slab, inode);
+}
+
+static void ubifs_put_super(struct super_block *sb)
+{
+ int i;
+ struct ubifs_info *c = sb->s_fs_info;
+
+ ubifs_msg("un-mount UBI device %d, volume %d", c->vi.ubi_num,
+ c->vi.vol_id);
+ /*
+ * The following asserts are only valid if there has not been a failure
+ * of the media. For example, there will be dirty inodes if we failed
+ * to write them back because of I/O errors.
+ */
+ ubifs_assert(atomic_long_read(&c->dirty_pg_cnt) == 0);
+ ubifs_assert(atomic_long_read(&c->dirty_ino_cnt) == 0);
+ ubifs_assert(c->budg_idx_growth == 0);
+ ubifs_assert(c->budg_data_growth == 0);
+
+ /*
+ * The 'c->umount_mutex' prevents races between UBIFS memory shrinker
+ * and file system un-mount. Namely, it prevents the shrinker from
+ * picking this superblock for shrinking - it will be just skipped if
+ * the mutex is locked.
+ */
+ mutex_lock(&c->umount_mutex);
+
+ spin_lock(&ubifs_infos_lock);
+ list_del(&c->infos_list);
+ spin_unlock(&ubifs_infos_lock);
+
+ if (!(c->vfs_sb->s_flags & MS_RDONLY)) {
+ /*
+ * First of all kill the background thread to make sure it does
+ * not interfere with un-mounting and freeing resources.
+ */
+ if (c->bgt) {
+ kthread_stop(c->bgt);
+ c->bgt = NULL;
+ }
+
+ /* Synchronize write-buffers */
+ if (c->jheads)
+ for (i = 0; i < c->jhead_cnt; i++) {
+ ubifs_wbuf_sync(&c->jheads[i].wbuf);
+ del_timer_sync(&c->jheads[i].wbuf.timer);
+ }
+
+ /*
+ * On fatal errors c->ro_media is set to 1, in which case we do
+ * not write the master node.
+ */
+ if (!c->ro_media) {
+ /*
+ * We are being cleanly unmounted which means the
+ * orphans were killed - indicate this in the master
+ * node. Also save the reserved GC LEB number.
+ */
+ int err;
+
+ c->mst_node->flags &= ~cpu_to_le32(UBIFS_MST_DIRTY);
+ c->mst_node->flags |= cpu_to_le32(UBIFS_MST_NO_ORPHS);
+ c->mst_node->gc_lnum = cpu_to_le32(c->gc_lnum);
+ err = ubifs_write_master(c);
+ if (err)
+ /*
+ * Recovery will attempt to fix the master area
+ * next mount, so we just print a message and
+ * continue to unmount normally.
+ */
+ ubifs_err("failed to write master node, "
+ "error %d", err);
+ }
+ }
+
+ ubifs_umount(c);
+ ubi_close_volume(c->ubi);
+ mutex_unlock(&c->umount_mutex);
+ kfree(c);
+}
+
+/*
+ * Note, Linux write-back code calls this without 'i_mutex'.
+ */
+static int ubifs_write_inode(struct inode *inode, int wait)
+{
+ int err;
+ struct ubifs_info *c = inode->i_sb->s_fs_info;
+ struct ubifs_inode *ui = ubifs_inode(inode);
+ struct ubifs_budget_req req = {.dd_growth = c->inode_budget,
+ .dirtied_ino_d = ui->data_len};
+
+ ubifs_assert(!(c->vfs_sb->s_flags & MS_RDONLY));
+ ubifs_assert(!ui->xattr);
+
+ if (is_bad_inode(inode))
+ return 0;
+
+ mutex_lock(&ui->budg_mutex);
+
+ /*
+ * Due to races between write-back forced by budgeting
+ * (see 'sync_some_inodes()') and pdflush write-back, the inode may
+ * have already been synchronized, do not do this again.
+ *
+ * This might also happen if it was synchronized in e.g. 'ubifs_link()',
+ * etc.
+ */
+ if (!ui->dirty) {
+ mutex_unlock(&ui->budg_mutex);
+ return 0;
+ }
+
+ ubifs_assert(ui->budgeted);
+ dbg_gen("inode %lu", inode->i_ino);
+
+ err = ubifs_jrn_write_inode(c, inode, 0, IS_SYNC(inode));
+ if (err)
+ ubifs_err("can't write inode %lu, error %d", inode->i_ino, err);
+
+ ui->dirty = 0;
+ UBIFS_DBG(ui->budgeted = 0);
+ atomic_long_dec(&c->dirty_ino_cnt);
+
+ ubifs_release_budget(c, &req);
+ mutex_unlock(&ui->budg_mutex);
+
+ return err;
+}
+
+static void ubifs_delete_inode(struct inode *inode)
+{
+ struct ubifs_info *c = inode->i_sb->s_fs_info;
+ struct ubifs_inode *ui = ubifs_inode(inode);
+ struct ubifs_budget_req req = {.dd_growth = c->inode_budget,
+ .dirtied_ino_d = ui->data_len};
+ int err;
+
+ if (ui->xattr) {
+ /*
+ * Extended attribute inode deletions are fully handled in
+ * 'ubifs_removexattr()'. These inodes are special and have
+ * limited usage, so there is nothing to do here.
+ */
+ ubifs_assert(!ui->dirty);
+ goto out;
+ }
+
+ dbg_gen("inode %lu", inode->i_ino);
+ ubifs_assert(!atomic_read(&inode->i_count));
+ ubifs_assert(inode->i_nlink == 0);
+
+ truncate_inode_pages(&inode->i_data, 0);
+ if (is_bad_inode(inode))
+ goto out;
+
+ mutex_lock(&ui->budg_mutex);
+
+ inode->i_size = 0;
+
+ err = ubifs_jrn_write_inode(c, inode, 1, IS_SYNC(inode));
+ if (err)
+ /*
+ * Worst case we have a lost orphan inode wasting space, so a
+ * simple error message is ok here.
+ */
+ ubifs_err("can't write inode %lu, error %d", inode->i_ino, err);
+
+ if (ui->dirty) {
+ ubifs_assert(ui->budgeted);
+ atomic_long_dec(&c->dirty_ino_cnt);
+ ui->dirty = 0;
+ UBIFS_DBG(ui->budgeted = 0);
+ ubifs_release_budget(c, &req);
+ }
+
+ mutex_unlock(&ui->budg_mutex);
+out:
+ clear_inode(inode);
+}
+
+static void ubifs_dirty_inode(struct inode *inode)
+{
+ struct ubifs_inode *ui = ubifs_inode(inode);
+
+ ubifs_assert(!(inode->i_sb->s_flags & MS_RDONLY));
+ ubifs_assert(mutex_is_locked(&ui->budg_mutex));
+
+ if (!ui->dirty) {
+ struct ubifs_info *c = inode->i_sb->s_fs_info;
+
+ ui->dirty = 1;
+ atomic_long_inc(&c->dirty_ino_cnt);
+ dbg_gen("inode %lu", inode->i_ino);
+ }
+}
+
+static int ubifs_statfs(struct dentry *dentry, struct kstatfs *buf)
+{
+ struct ubifs_info *c = dentry->d_sb->s_fs_info;
+ unsigned long long free;
+
+ free = ubifs_budg_get_free_space(c);
+ dbg_gen("free space %lld bytes (%lld blocks)",
+ free, free >> UBIFS_BLOCK_SHIFT);
+
+ buf->f_type = UBIFS_SUPER_MAGIC;
+ buf->f_bsize = UBIFS_BLOCK_SIZE;
+ buf->f_blocks = c->block_cnt;
+ buf->f_bfree = free >> UBIFS_BLOCK_SHIFT;
+ buf->f_bavail = buf->f_bfree;
+ buf->f_files = 0;
+ buf->f_ffree = 0;
+ buf->f_namelen = UBIFS_MAX_NLEN;
+
+ return 0;
+}
+
+static int ubifs_remount_fs(struct super_block *sb, int *flags, char *data)
+{
+ int err;
+ struct ubifs_info *c = sb->s_fs_info;
+
+ dbg_gen("old flags %#lx, new flags %#x", sb->s_flags, *flags);
+
+ err = ubifs_parse_options(c, data, 1);
+ if (err) {
+ ubifs_err("invalid or unknown remount parameter");
+ return err;
+ }
+ if ((sb->s_flags & MS_RDONLY) && !(*flags & MS_RDONLY)) {
+ err = ubifs_remount_rw(c);
+ if (err)
+ return err;
+ } else if (!(sb->s_flags & MS_RDONLY) && (*flags & MS_RDONLY))
+ ubifs_remount_ro(c);
+
+ return 0;
+}
+
+static int ubifs_show_options(struct seq_file *s, struct vfsmount *mnt)
+{
+ struct ubifs_info *c = mnt->mnt_sb->s_fs_info;
+
+ if (c->mount_opts.unmount_mode == 2)
+ seq_printf(s, ",fast_unmount");
+ else if (c->mount_opts.unmount_mode == 1)
+ seq_printf(s, ",norm_unmount");
+
+ return 0;
+}
+
+static int ubifs_sync_fs(struct super_block *sb, int wait)
+{
+ struct ubifs_info *c = sb->s_fs_info;
+ int i, ret = 0, err;
+
+ if (c->jheads)
+ for (i = 0; i < c->jhead_cnt; i++) {
+ err = ubifs_wbuf_sync(&c->jheads[i].wbuf);
+ if (err && !ret)
+ ret = err;
+ }
+ /*
+ * We ought to call sync for c->ubi but it does not have one. If it had
+ * it would in turn call mtd->sync, however mtd operations are
+ * synchronous anyway, so we don't lose any sleep here.
+ */
+ return ret;
+}
+
+struct super_operations ubifs_super_operations = {
+ .alloc_inode = ubifs_alloc_inode,
+ .destroy_inode = ubifs_destroy_inode,
+ .put_super = ubifs_put_super,
+ .write_inode = ubifs_write_inode,
+ .delete_inode = ubifs_delete_inode,
+ .statfs = ubifs_statfs,
+ .dirty_inode = ubifs_dirty_inode,
+ .remount_fs = ubifs_remount_fs,
+ .show_options = ubifs_show_options,
+ .sync_fs = ubifs_sync_fs,
+};
--
1.5.4.1

2008-03-27 13:11:29

by Artem Bityutskiy

Subject: [RFC PATCH 01/26] VFS: introduce writeback_inodes_sb()

Let file systems write back their pages and inodes when needed. This
is needed for the UBIFS budgeting sub-system, which has to force
write-back from time to time.

Note, it must not be called while the caller holds one of the dirty
pages locked, otherwise it will deadlock.

Signed-off-by: Artem Bityutskiy <[email protected]>
---
fs/fs-writeback.c | 8 ++++++++
include/linux/writeback.h | 1 +
2 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index c007607..062aa4a 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -573,6 +573,14 @@ void sync_inodes_sb(struct super_block *sb, int wait)
spin_unlock(&inode_lock);
}

+void writeback_inodes_sb(struct super_block *sb, struct writeback_control *wbc)
+{
+ spin_lock(&inode_lock);
+ sync_sb_inodes(sb, wbc);
+ spin_unlock(&inode_lock);
+}
+EXPORT_SYMBOL_GPL(writeback_inodes_sb);
+
/*
* Rather lame livelock avoidance.
*/
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index b7b3362..0083a0a 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -71,6 +71,7 @@ struct writeback_control {
void writeback_inodes(struct writeback_control *wbc);
int inode_wait(void *);
void sync_inodes_sb(struct super_block *, int wait);
+void writeback_inodes_sb(struct super_block *sb, struct writeback_control *wbc);
void sync_inodes(int wait);

/* writeback.h requires fs.h; it, too, is not included from here. */
--
1.5.4.1

2008-03-27 13:12:36

by Artem Bityutskiy

Subject: [RFC PATCH 11/26] UBIFS: add commit functionality

This is the UBIFS journal commit implementation. The journal commit does not
mean the data is physically moved anywhere - we just update the indexing
information and find new eraseblocks for the journal.

Signed-off-by: Artem Bityutskiy <[email protected]>
Signed-off-by: Adrian Hunter <[email protected]>
---
fs/ubifs/commit.c | 708 +++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 708 insertions(+), 0 deletions(-)
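
For reviewers, the commit state transitions implemented in this patch can be
sketched in user space roughly as follows. This is only an illustrative model,
not code from the patch: the state names are taken from fs/ubifs/commit.c, but
the helper functions (on_commit_required() and friends) and run_example() are
hypothetical, and all locking ('cs_lock', 'commit_sem') is left out.

```c
#include <assert.h>

/* Commit states, named after the constants used in fs/ubifs/commit.c. */
enum cmt_state {
	COMMIT_RESTING,
	COMMIT_BACKGROUND,
	COMMIT_REQUIRED,
	COMMIT_RUNNING_BACKGROUND,
	COMMIT_RUNNING_REQUIRED,
	COMMIT_BROKEN,
};

/* Models the switch in ubifs_commit_required(): escalate to "required". */
enum cmt_state on_commit_required(enum cmt_state s)
{
	switch (s) {
	case COMMIT_RESTING:
	case COMMIT_BACKGROUND:
		return COMMIT_REQUIRED;
	case COMMIT_RUNNING_BACKGROUND:
		return COMMIT_RUNNING_REQUIRED;
	default:
		/* Already required, already running as required, or broken. */
		return s;
	}
}

/* Models ubifs_request_bg_commit(): only a resting state is changed. */
enum cmt_state on_request_bg_commit(enum cmt_state s)
{
	return s == COMMIT_RESTING ? COMMIT_BACKGROUND : s;
}

/* Models run_bg_commit(): the background thread picks up a pending request. */
enum cmt_state on_bg_thread_start(enum cmt_state s)
{
	if (s == COMMIT_REQUIRED)
		return COMMIT_RUNNING_REQUIRED;
	if (s == COMMIT_BACKGROUND)
		return COMMIT_RUNNING_BACKGROUND;
	return s;
}

/* Models the tail of do_commit(): success goes to rest, failure is final. */
enum cmt_state on_commit_done(int err)
{
	return err ? COMMIT_BROKEN : COMMIT_RESTING;
}

/*
 * Hypothetical walk-through: the journal fills up, the background thread
 * starts a commit, a writer then needs the commit urgently, and the commit
 * finishes successfully.
 */
enum cmt_state run_example(void)
{
	enum cmt_state s = COMMIT_RESTING;

	s = on_request_bg_commit(s);
	assert(s == COMMIT_BACKGROUND);
	s = on_bg_thread_start(s);
	assert(s == COMMIT_RUNNING_BACKGROUND);
	s = on_commit_required(s);
	assert(s == COMMIT_RUNNING_REQUIRED);
	return on_commit_done(0);
}
```

Keeping the transitions as pure functions is just a convenience for the
sketch; in the patch itself every state change happens under 'c->cs_lock'.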

diff --git a/fs/ubifs/commit.c b/fs/ubifs/commit.c
new file mode 100644
index 0000000..0b199d4
--- /dev/null
+++ b/fs/ubifs/commit.c
@@ -0,0 +1,708 @@
+/*
+ * This file is part of UBIFS.
+ *
+ * Copyright (C) 2006-2008 Nokia Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 51
+ * Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Authors: Adrian Hunter
+ * Artem Bityutskiy (Битюцкий Артём)
+ */
+
+/*
+ * This file implements functions that manage the running of the commit process.
+ * Each affected module has its own functions to accomplish their part in the
+ * commit and those functions are called here.
+ *
+ * The commit is the process whereby all updates to the index and LEB properties
+ * are written out together and the journal becomes empty. This keeps the
+ * file system consistent - at all times the state can be recreated by reading
+ * the index and LEB properties and then replaying the journal.
+ *
+ * The commit is split into two parts named "commit start" and "commit end".
+ * During commit start, the commit process has exclusive access to the journal
+ * by holding the commit semaphore down for writing. As few I/O operations as
+ * possible are performed during commit start, instead the nodes that are to be
+ * written are merely identified. During commit end, the commit semaphore is no
+ * longer held and the journal is again in operation, allowing users to continue
+ * to use the file system while the bulk of the commit I/O is performed. The
+ * purpose of this two-step approach is to prevent the commit from causing any
+ * latency blips. Note that in any case, the commit does not prevent lookups
+ * (as permitted by the TNC mutex), or access to VFS data structures e.g. page
+ * cache.
+ */
+
+#include <linux/freezer.h>
+#include <linux/kthread.h>
+#include "ubifs.h"
+
+/**
+ * do_commit - commit the journal.
+ * @c: UBIFS file-system description object
+ *
+ * This function implements UBIFS commit. It has to be called with commit lock
+ * locked. Returns zero in case of success and a negative error code in case of
+ * failure.
+ */
+static int do_commit(struct ubifs_info *c)
+{
+ int err, new_ltail_lnum, old_ltail_lnum, i;
+ struct ubifs_zbranch zroot;
+ struct ubifs_lp_stats lst;
+
+ dbg_cmt("start");
+ ubifs_assert(!(c->vfs_sb->s_flags & MS_RDONLY));
+
+ if (c->ro_media) {
+ err = -EROFS;
+ goto out_up;
+ }
+
+ c->recovery_needs_commit = 0;
+
+ /* Sync all write buffers (necessary for recovery) */
+ for (i = 0; i < c->jhead_cnt; i++) {
+ err = ubifs_wbuf_sync(&c->jheads[i].wbuf);
+ if (err)
+ goto out_up;
+ }
+
+ err = ubifs_gc_start_commit(c);
+ if (err)
+ goto out_up;
+ err = dbg_check_lprops(c);
+ if (err)
+ goto out_up;
+ err = ubifs_log_start_commit(c, &new_ltail_lnum);
+ if (err)
+ goto out_up;
+ err = ubifs_tnc_start_commit(c, &zroot);
+ if (err)
+ goto out_up;
+ err = ubifs_lpt_start_commit(c);
+ if (err)
+ goto out_up;
+ err = ubifs_orphan_start_commit(c);
+ if (err)
+ goto out_up;
+
+ ubifs_get_lp_stats(c, &lst);
+
+ up_write(&c->commit_sem);
+
+ err = ubifs_tnc_end_commit(c);
+ if (err)
+ goto out;
+ err = ubifs_lpt_end_commit(c);
+ if (err)
+ goto out;
+ err = ubifs_orphan_end_commit(c);
+ if (err)
+ goto out;
+ old_ltail_lnum = c->ltail_lnum;
+ err = ubifs_log_end_commit(c, new_ltail_lnum);
+ if (err)
+ goto out;
+ err = dbg_check_old_index(c, &zroot);
+ if (err)
+ goto out;
+
+ mutex_lock(&c->mst_mutex);
+ c->mst_node->cmt_no = cpu_to_le64(++c->cmt_no);
+ c->mst_node->log_lnum = cpu_to_le32(new_ltail_lnum);
+ c->mst_node->root_lnum = cpu_to_le32(zroot.lnum);
+ c->mst_node->root_offs = cpu_to_le32(zroot.offs);
+ c->mst_node->root_len = cpu_to_le32(zroot.len);
+ c->mst_node->ihead_lnum = cpu_to_le32(c->ihead_lnum);
+ c->mst_node->ihead_offs = cpu_to_le32(c->ihead_offs);
+ c->mst_node->index_size = cpu_to_le64(c->old_idx_sz);
+ c->mst_node->lpt_lnum = cpu_to_le32(c->lpt_lnum);
+ c->mst_node->lpt_offs = cpu_to_le32(c->lpt_offs);
+ c->mst_node->nhead_lnum = cpu_to_le32(c->nhead_lnum);
+ c->mst_node->nhead_offs = cpu_to_le32(c->nhead_offs);
+ c->mst_node->ltab_lnum = cpu_to_le32(c->ltab_lnum);
+ c->mst_node->ltab_offs = cpu_to_le32(c->ltab_offs);
+ c->mst_node->lsave_lnum = cpu_to_le32(c->lsave_lnum);
+ c->mst_node->lsave_offs = cpu_to_le32(c->lsave_offs);
+ c->mst_node->lscan_lnum = cpu_to_le32(c->lscan_lnum);
+ c->mst_node->empty_lebs = cpu_to_le32(lst.empty_lebs);
+ c->mst_node->idx_lebs = cpu_to_le32(lst.idx_lebs);
+ c->mst_node->total_free = cpu_to_le64(lst.total_free);
+ c->mst_node->total_dirty = cpu_to_le64(lst.total_dirty);
+ c->mst_node->total_used = cpu_to_le64(lst.total_used);
+ c->mst_node->total_dead = cpu_to_le64(lst.total_dead);
+ c->mst_node->total_dark = cpu_to_le64(lst.total_dark);
+ if (c->no_orphs)
+ c->mst_node->flags |= cpu_to_le32(UBIFS_MST_NO_ORPHS);
+ else
+ c->mst_node->flags &= ~cpu_to_le32(UBIFS_MST_NO_ORPHS);
+ err = ubifs_write_master(c);
+ mutex_unlock(&c->mst_mutex);
+ if (err)
+ goto out;
+
+ err = ubifs_log_post_commit(c, old_ltail_lnum);
+ if (err)
+ goto out;
+ err = ubifs_gc_end_commit(c);
+ if (err)
+ goto out;
+ err = ubifs_lpt_post_commit(c);
+ if (err)
+ goto out;
+
+ spin_lock(&c->cs_lock);
+ c->cmt_state = COMMIT_RESTING;
+ wake_up(&c->cmt_wq);
+ dbg_cmt("commit end");
+ spin_unlock(&c->cs_lock);
+
+ return 0;
+
+out_up:
+ up_write(&c->commit_sem);
+out:
+ ubifs_err("commit failed, error %d", err);
+ spin_lock(&c->cs_lock);
+ c->cmt_state = COMMIT_BROKEN;
+ wake_up(&c->cmt_wq);
+ spin_unlock(&c->cs_lock);
+ ubifs_ro_mode(c);
+ return err;
+}
+
+/**
+ * run_bg_commit - run background commit if it is needed.
+ * @c: UBIFS file-system description object
+ *
+ * This function runs background commit if it is needed. Returns zero in case
+ * of success and a negative error code in case of failure.
+ */
+static int run_bg_commit(struct ubifs_info *c)
+{
+ spin_lock(&c->cs_lock);
+ /*
+ * Run background commit only if background commit was requested or if
+ * commit is required.
+ */
+ if (c->cmt_state != COMMIT_BACKGROUND &&
+ c->cmt_state != COMMIT_REQUIRED)
+ goto out;
+ spin_unlock(&c->cs_lock);
+
+ down_write(&c->commit_sem);
+ spin_lock(&c->cs_lock);
+ if (c->cmt_state == COMMIT_REQUIRED)
+ c->cmt_state = COMMIT_RUNNING_REQUIRED;
+ else if (c->cmt_state == COMMIT_BACKGROUND)
+ c->cmt_state = COMMIT_RUNNING_BACKGROUND;
+ else
+ goto out_cmt_unlock;
+ spin_unlock(&c->cs_lock);
+
+ return do_commit(c);
+
+out_cmt_unlock:
+ up_write(&c->commit_sem);
+out:
+ spin_unlock(&c->cs_lock);
+ return 0;
+}
+
+/**
+ * ubifs_bg_thread - UBIFS background thread function.
+ * @info: points to the file-system description object
+ *
+ * This function implements various file-system background activities:
+ * o when a write-buffer timer expires it synchronizes the appropriate
+ * write-buffer;
+ * o when the journal is nearly full, it starts a commit in advance.
+ */
+int ubifs_bg_thread(void *info)
+{
+ int err;
+ struct ubifs_info *c = info;
+
+ ubifs_msg("background thread \"%s\" started, PID %d",
+ c->bgt_name, current->pid);
+
+ set_user_nice(current, 0);
+ set_freezable();
+
+ while (1) {
+ if (kthread_should_stop())
+ break;
+
+ if (try_to_freeze())
+ continue;
+
+ c->need_bgt = 0;
+
+ err = ubifs_bg_wbufs_sync(c);
+ if (err)
+ ubifs_ro_mode(c);
+
+ run_bg_commit(c);
+
+ set_current_state(TASK_INTERRUPTIBLE);
+ if (!c->need_bgt && !kthread_should_stop())
+ schedule();
+ __set_current_state(TASK_RUNNING);
+
+ cond_resched();
+ }
+
+ dbg_msg("background thread \"%s\" stops", c->bgt_name);
+ return 0;
+}
+
+/**
+ * ubifs_commit_required - set commit state to "required".
+ * @c: UBIFS file-system description object
+ *
+ * This function is called if a commit is required but cannot be done from the
+ * calling function, so it is just flagged instead.
+ */
+void ubifs_commit_required(struct ubifs_info *c)
+{
+ spin_lock(&c->cs_lock);
+ switch (c->cmt_state) {
+ case COMMIT_RESTING:
+ case COMMIT_BACKGROUND:
+ dbg_cmt("old: %s, new: %s", dbg_cstate(c->cmt_state),
+ dbg_cstate(COMMIT_REQUIRED));
+ c->cmt_state = COMMIT_REQUIRED;
+ break;
+ case COMMIT_RUNNING_BACKGROUND:
+ dbg_cmt("old: %s, new: %s", dbg_cstate(c->cmt_state),
+ dbg_cstate(COMMIT_RUNNING_REQUIRED));
+ c->cmt_state = COMMIT_RUNNING_REQUIRED;
+ break;
+ case COMMIT_REQUIRED:
+ case COMMIT_RUNNING_REQUIRED:
+ case COMMIT_BROKEN:
+ break;
+ }
+ spin_unlock(&c->cs_lock);
+}
+
+/**
+ * ubifs_request_bg_commit - notify the background thread to do a commit.
+ * @c: UBIFS file-system description object
+ *
+ * This function is called if the journal is full enough to make a commit
+ * worthwhile, so background thread is kicked to start it.
+ */
+void ubifs_request_bg_commit(struct ubifs_info *c)
+{
+ spin_lock(&c->cs_lock);
+ if (c->cmt_state == COMMIT_RESTING) {
+ dbg_cmt("old: %s, new: %s", dbg_cstate(c->cmt_state),
+ dbg_cstate(COMMIT_BACKGROUND));
+ c->cmt_state = COMMIT_BACKGROUND;
+ spin_unlock(&c->cs_lock);
+ ubifs_wake_up_bgt(c);
+ } else
+ spin_unlock(&c->cs_lock);
+}
+
+/**
+ * wait_for_commit - wait for commit.
+ * @c: UBIFS file-system description object
+ *
+ * This function sleeps until the commit operation is no longer running.
+ */
+static int wait_for_commit(struct ubifs_info *c)
+{
+ dbg_cmt("pid %d goes to sleep", current->pid);
+
+ /*
+ * The following sleeps if the condition is false, and will be woken
+ * when the commit ends. It is possible, although very unlikely, that we
+ * will wake up and see the subsequent commit running, rather than the
+ * one we were waiting for, and go back to sleep. However, we will be
+ * woken again, so there is no danger of sleeping forever.
+ */
+ wait_event(c->cmt_wq, c->cmt_state != COMMIT_RUNNING_BACKGROUND &&
+ c->cmt_state != COMMIT_RUNNING_REQUIRED);
+ dbg_cmt("commit finished, pid %d woke up", current->pid);
+ return 0;
+}
+
+/**
+ * ubifs_run_commit - run or wait for commit.
+ * @c: UBIFS file-system description object
+ *
+ * This function runs commit and returns zero in case of success and a negative
+ * error code in case of failure.
+ */
+int ubifs_run_commit(struct ubifs_info *c)
+{
+ int err = 0;
+
+ spin_lock(&c->cs_lock);
+ if (c->cmt_state == COMMIT_BROKEN) {
+ err = -EINVAL;
+ goto out;
+ }
+
+ if (c->cmt_state == COMMIT_RUNNING_BACKGROUND)
+ /*
+ * We set the commit state to 'running required' to indicate
+ * that we want it to complete as quickly as possible.
+ */
+ c->cmt_state = COMMIT_RUNNING_REQUIRED;
+
+ if (c->cmt_state == COMMIT_RUNNING_REQUIRED) {
+ spin_unlock(&c->cs_lock);
+ return wait_for_commit(c);
+ }
+ spin_unlock(&c->cs_lock);
+
+ /* Ok, the commit is indeed needed */
+
+ down_write(&c->commit_sem);
+ spin_lock(&c->cs_lock);
+ /*
+ * Since we unlocked 'c->cs_lock', the state may have changed, so
+ * re-check it.
+ */
+ if (c->cmt_state == COMMIT_BROKEN) {
+ err = -EINVAL;
+ goto out_cmt_unlock;
+ }
+
+ if (c->cmt_state == COMMIT_RUNNING_BACKGROUND)
+ c->cmt_state = COMMIT_RUNNING_REQUIRED;
+
+ if (c->cmt_state == COMMIT_RUNNING_REQUIRED) {
+ up_write(&c->commit_sem);
+ spin_unlock(&c->cs_lock);
+ return wait_for_commit(c);
+ }
+ c->cmt_state = COMMIT_RUNNING_REQUIRED;
+ spin_unlock(&c->cs_lock);
+
+ err = do_commit(c);
+ return err;
+
+out_cmt_unlock:
+ up_write(&c->commit_sem);
+out:
+ spin_unlock(&c->cs_lock);
+ return err;
+}
+
+/**
+ * ubifs_recovery_commit - if needed, ensure a commit has run since recovery.
+ * @c: UBIFS file-system description object
+ *
+ * This function ensures that a commit has been run since recovery and before
+ * unmounting cleanly. Errors are ignored because in that case a subsequent
+ * unmount will not be clean.
+ *
+ * The recovery needs a commit when it updates TNC directly without there being
+ * a corresponding record of the change in the journal. In that case, if UBIFS
+ * were to unmount cleanly without having run a commit, the TNC changes would
+ * be lost.
+ */
+void ubifs_recovery_commit(struct ubifs_info *c)
+{
+ spin_lock(&c->cs_lock);
+ if (!c->recovery_needs_commit ||
+ c->cmt_state == COMMIT_BROKEN ||
+ c->cmt_state == COMMIT_RUNNING_BACKGROUND ||
+ c->cmt_state == COMMIT_RUNNING_REQUIRED) {
+ spin_unlock(&c->cs_lock);
+ return;
+ }
+ spin_unlock(&c->cs_lock);
+ down_write(&c->commit_sem);
+ spin_lock(&c->cs_lock);
+ if (!c->recovery_needs_commit ||
+ c->cmt_state == COMMIT_BROKEN ||
+ c->cmt_state == COMMIT_RUNNING_BACKGROUND ||
+ c->cmt_state == COMMIT_RUNNING_REQUIRED) {
+ spin_unlock(&c->cs_lock);
+ up_write(&c->commit_sem);
+ return;
+ }
+ c->cmt_state = COMMIT_RUNNING_REQUIRED;
+ spin_unlock(&c->cs_lock);
+ do_commit(c);
+}
+
+/**
+ * ubifs_gc_should_commit - determine if it is time for GC to run commit.
+ * @c: UBIFS file-system description object
+ *
+ * This function is called by garbage collection to determine if commit should
+ * be run. If commit state is @COMMIT_BACKGROUND, which means that the journal
+ * is full enough to start commit, this function returns true. It is not
+ * absolutely necessary to commit yet, but it feels like this should be better
+ * than to keep doing GC. This function returns %1 if GC has to initiate commit
+ * and %0 if not.
+ */
+int ubifs_gc_should_commit(struct ubifs_info *c)
+{
+ int ret = 0;
+
+ spin_lock(&c->cs_lock);
+ if (c->cmt_state == COMMIT_BACKGROUND) {
+ dbg_cmt("commit required now");
+ c->cmt_state = COMMIT_REQUIRED;
+ } else
+ dbg_cmt("commit not requested");
+ if (c->cmt_state == COMMIT_REQUIRED)
+ ret = 1;
+ spin_unlock(&c->cs_lock);
+ return ret;
+}
+
+#ifdef CONFIG_UBIFS_FS_DEBUG_CHK_OLD_IDX
+
+/**
+ * struct idx_node - hold index nodes during index tree traversal.
+ * @list: list
+ * @iip: index in parent (slot number of this indexing node in the parent
+ * indexing node)
+ * @upper_key: all keys in this indexing node have to be less than or equal
+ * to this key
+ * @idx: index node (8-byte aligned because all node structures must be 8-byte
+ * aligned)
+ */
+struct idx_node {
+ struct list_head list;
+ int iip;
+ union ubifs_key upper_key;
+ struct ubifs_idx_node idx __attribute__((aligned(8)));
+};
+
+/**
+ * dbg_old_index_check_init - get information for the next old index check.
+ * @c: UBIFS file-system description object
+ * @zroot: root of the index
+ *
+ * This function records information about the index that will be needed for the
+ * next old index check i.e. 'dbg_check_old_index()'.
+ *
+ * This function returns %0 on success and a negative error code on failure.
+ */
+int dbg_old_index_check_init(struct ubifs_info *c, struct ubifs_zbranch *zroot)
+{
+ struct ubifs_idx_node *idx;
+ int lnum, offs, len, err = 0;
+
+ c->old_zroot = *zroot;
+
+ lnum = c->old_zroot.lnum;
+ offs = c->old_zroot.offs;
+ len = c->old_zroot.len;
+
+ idx = kmalloc(c->max_idx_node_sz, GFP_KERNEL);
+ if (!idx)
+ return -ENOMEM;
+
+ err = ubifs_read_node(c, idx, UBIFS_IDX_NODE, len, lnum, offs);
+ if (err)
+ goto out;
+
+ c->old_zroot_level = le16_to_cpu(idx->level);
+ c->old_zroot_sqnum = le64_to_cpu(idx->ch.sqnum);
+out:
+ kfree(idx);
+ return err;
+}
+
+/**
+ * dbg_check_old_index - check the old copy of the index.
+ * @c: UBIFS file-system description object
+ * @zroot: root of the new index
+ *
+ * In order to be able to recover from an unclean unmount, a complete copy of
+ * the index must exist on flash. This is the "old" index. The commit process
+ * must write the "new" index to flash without overwriting or destroying any
+ * part of the old index. This function is run at commit end in order to check
+ * that the old index does indeed exist completely intact.
+ *
+ * This function returns %0 on success and a negative error code on failure.
+ */
+int dbg_check_old_index(struct ubifs_info *c, struct ubifs_zbranch *zroot)
+{
+ int lnum, offs, len, err = 0, uninitialized_var(last_level), child_cnt;
+ int first = 1, iip;
+ union ubifs_key lower_key, upper_key, l_key, u_key;
+ unsigned long long uninitialized_var(last_sqnum);
+ struct ubifs_idx_node *idx;
+ struct list_head list;
+ struct idx_node *i;
+ size_t sz;
+
+ INIT_LIST_HEAD(&list);
+
+ sz = sizeof(struct idx_node) + ubifs_idx_node_sz(c, c->fanout) -
+ UBIFS_IDX_NODE_SZ;
+
+ /* Start at the old zroot */
+ lnum = c->old_zroot.lnum;
+ offs = c->old_zroot.offs;
+ len = c->old_zroot.len;
+ iip = 0;
+
+ /*
+ * Traverse the index tree preorder depth-first i.e. do a node and then
+ * its subtrees from left to right.
+ */
+ while (1) {
+ struct ubifs_branch *br;
+
+ /* Get the next index node */
+ i = kmalloc(sz, GFP_NOFS);
+ if (!i) {
+ err = -ENOMEM;
+ goto out_free;
+ }
+ i->iip = iip;
+ /* Keep the index nodes on our path in a linked list */
+ list_add_tail(&i->list, &list);
+ /* Read the index node */
+ idx = &i->idx;
+ err = ubifs_read_node(c, idx, UBIFS_IDX_NODE, len, lnum, offs);
+ if (err)
+ goto out_free;
+ /* Validate index node */
+ child_cnt = le16_to_cpu(idx->child_cnt);
+ if (child_cnt < 1 || child_cnt > c->fanout) {
+ err = 1;
+ goto out_dump;
+ }
+ if (first) {
+ first = 0;
+ /* Check root level and sqnum */
+ if (le16_to_cpu(idx->level) != c->old_zroot_level) {
+ err = 2;
+ goto out_dump;
+ }
+ if (le64_to_cpu(idx->ch.sqnum) != c->old_zroot_sqnum) {
+ err = 3;
+ goto out_dump;
+ }
+ /* Set last values as though root had a parent */
+ last_level = le16_to_cpu(idx->level) + 1;
+ last_sqnum = le64_to_cpu(idx->ch.sqnum) + 1;
+ key_read(c, ubifs_idx_key(c, idx), &lower_key);
+ highest_ino_key(c, &upper_key, INUM_WATERMARK);
+ }
+ key_copy(c, &upper_key, &i->upper_key);
+ if (le16_to_cpu(idx->level) != last_level - 1) {
+ err = 3;
+ goto out_dump;
+ }
+ /*
+ * The index is always written bottom up hence a child's sqnum
+ * is always less than the parent's.
+ */
+ if (le64_to_cpu(idx->ch.sqnum) >= last_sqnum) {
+ err = 4;
+ goto out_dump;
+ }
+ /* Check key range */
+ key_read(c, ubifs_idx_key(c, idx), &l_key);
+ br = ubifs_idx_branch(c, idx, child_cnt - 1);
+ key_read(c, &br->key, &u_key);
+ if (keys_cmp(c, &lower_key, &l_key) > 0) {
+ err = 5;
+ goto out_dump;
+ }
+ if (keys_cmp(c, &upper_key, &u_key) < 0) {
+ err = 6;
+ goto out_dump;
+ }
+ if (keys_cmp(c, &upper_key, &u_key) == 0)
+ if (!is_hash_key(c, &u_key)) {
+ err = 7;
+ goto out_dump;
+ }
+ /* Go to next index node */
+ if (le16_to_cpu(idx->level) == 0) {
+ /* At the bottom, so go up until can go right */
+ while (1) {
+ /* Drop the bottom of the list */
+ list_del(&i->list);
+ kfree(i);
+ /* No more list means we are done */
+ if (list_empty(&list))
+ goto out;
+ /* Look at the new bottom */
+ i = list_entry(list.prev, struct idx_node,
+ list);
+ idx = &i->idx;
+ /* Can we go right */
+ if (iip + 1 < le16_to_cpu(idx->child_cnt)) {
+ iip = iip + 1;
+ break;
+ } else
+ /* Nope, so go up again */
+ iip = i->iip;
+ }
+ } else
+ /* Go down left */
+ iip = 0;
+ /*
+ * We have the parent in 'idx' and now we set up for reading the
+ * child pointed to by slot 'iip'.
+ */
+ last_level = le16_to_cpu(idx->level);
+ last_sqnum = le64_to_cpu(idx->ch.sqnum);
+ br = ubifs_idx_branch(c, idx, iip);
+ lnum = le32_to_cpu(br->lnum);
+ offs = le32_to_cpu(br->offs);
+ len = le32_to_cpu(br->len);
+ key_read(c, &br->key, &lower_key);
+ if (iip + 1 < le16_to_cpu(idx->child_cnt)) {
+ br = ubifs_idx_branch(c, idx, iip + 1);
+ key_read(c, &br->key, &upper_key);
+ } else
+ key_copy(c, &i->upper_key, &upper_key);
+ }
+out:
+ err = dbg_old_index_check_init(c, zroot);
+ if (err)
+ goto out_free;
+
+ return 0;
+
+out_dump:
+ dbg_err("dumping index node (iip=%d)", i->iip);
+ dbg_dump_node(c, idx);
+ list_del(&i->list);
+ kfree(i);
+ if (!list_empty(&list)) {
+ i = list_entry(list.prev, struct idx_node, list);
+ dbg_err("dumping parent index node");
+ dbg_dump_node(c, &i->idx);
+ }
+out_free:
+ while (!list_empty(&list)) {
+ i = list_entry(list.next, struct idx_node, list);
+ list_del(&i->list);
+ kfree(i);
+ }
+ ubifs_err("failed, error %d", err);
+ if (err > 0)
+ err = -EINVAL;
+ return err;
+}
+
+#endif /* CONFIG_UBIFS_FS_DEBUG_CHK_OLD_IDX */
--
1.5.4.1
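
For readers following the commit-state handling in ubifs_gc_should_commit() above, here is a minimal user-space sketch of the same decision. It is only an illustration: the enum and function names are hypothetical stand-ins, the state SK_COMMIT_RESTING does not appear in the hunk above and is assumed here, and the cs_lock spinlock is left out because the sketch is single-threaded.

```c
#include <assert.h>

/* Hypothetical stand-ins for the commit states; not the UBIFS definitions. */
enum cmt_state_sketch {
	SK_COMMIT_RESTING,		/* assumed idle state, not shown above */
	SK_COMMIT_BACKGROUND,
	SK_COMMIT_REQUIRED,
	SK_COMMIT_RUNNING_BACKGROUND,
	SK_COMMIT_RUNNING_REQUIRED,
	SK_COMMIT_BROKEN,
};

/*
 * Same decision as ubifs_gc_should_commit(): a pending background commit is
 * promoted to a required one, and the caller is told to commit only when the
 * state is "required". Returns 1 if GC should initiate commit, 0 otherwise.
 */
static int gc_should_commit_sketch(enum cmt_state_sketch *state)
{
	assert(state);

	if (*state == SK_COMMIT_BACKGROUND)
		*state = SK_COMMIT_REQUIRED;
	return *state == SK_COMMIT_REQUIRED;
}
```

Note that a commit which is already running is left alone, so GC keeps going in that case.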

2008-03-27 13:12:51

by Artem Bityutskiy

Subject: [RFC PATCH 08/26] UBIFS: add compression support

UBIFS supports on-the-fly compression, and this patch adds
compression helper functions which make it possible to use the
same API irrespective of the compression type. At the moment
UBIFS supports only LZO and zlib. It uses cryptoapi to access
the compressors.
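
To illustrate the fallback policy of the helper, here is a rough user-space sketch. It is not the code from this patch: a toy run-length encoder stands in for the cryptoapi compressors, and all sk_* and SK_* names are made up for the example. Only the 128-byte threshold and the "aligned result must be smaller" rule are taken from ubifs_compress().

```c
#include <assert.h>
#include <string.h>

#define SK_MIN_COMPR_LEN 128	/* same threshold as MIN_COMPR_LEN */
#define SK_ALIGN8(x) (((x) + 7) & ~7)

enum { SK_COMPR_NONE = 0, SK_COMPR_RLE = 1 };	/* hypothetical type codes */

/*
 * Toy run-length encoder standing in for LZO/zlib: emits (count, byte)
 * pairs. Returns 0 on success, -1 if the output buffer is too small.
 */
static int sk_rle_compress(const unsigned char *in, int in_len,
			   unsigned char *out, int out_cap, int *out_len)
{
	int i = 0, o = 0;

	while (i < in_len) {
		int run = 1;

		while (i + run < in_len && run < 255 && in[i + run] == in[i])
			run++;
		if (o + 2 > out_cap)
			return -1;
		out[o++] = (unsigned char)run;
		out[o++] = in[i];
		i += run;
	}
	*out_len = o;
	return 0;
}

/*
 * Store the data uncompressed when compression is disabled, the input is
 * short, the compressor fails, or the 8-byte-aligned result is not smaller
 * than the aligned input. @out must hold at least @in_len bytes.
 */
static void sk_compress(const void *in, int in_len, void *out, int out_cap,
			int *out_len, int *compr_type)
{
	assert(in_len > 0 && out_cap >= in_len);

	if (*compr_type == SK_COMPR_NONE || in_len < SK_MIN_COMPR_LEN)
		goto no_compr;
	if (sk_rle_compress(in, in_len, out, out_cap, out_len))
		goto no_compr;
	if (SK_ALIGN8(*out_len) >= SK_ALIGN8(in_len))
		goto no_compr;
	return;

no_compr:
	memcpy(out, in, in_len);
	*out_len = in_len;
	*compr_type = SK_COMPR_NONE;
}
```

The caller passes the wanted compression type in and reads the type actually used back out, so it never has to handle a compression error itself.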

Signed-off-by: Artem Bityutskiy <[email protected]>
Signed-off-by: Adrian Hunter <[email protected]>
---
fs/ubifs/compress.c | 264 +++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 264 insertions(+), 0 deletions(-)

diff --git a/fs/ubifs/compress.c b/fs/ubifs/compress.c
new file mode 100644
index 0000000..74389f5
--- /dev/null
+++ b/fs/ubifs/compress.c
@@ -0,0 +1,264 @@
+/*
+ * This file is part of UBIFS.
+ *
+ * Copyright (C) 2006-2008 Nokia Corporation.
+ * Copyright (C) 2006, 2007 University of Szeged, Hungary
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 51
+ * Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Authors: Adrian Hunter
+ * Artem Bityutskiy (Битюцкий Артём)
+ * Zoltan Sogor
+ */
+
+/*
+ * This file provides a single place to access compression and
+ * decompression.
+ */
+
+#include <linux/crypto.h>
+#include "ubifs.h"
+
+/*
+ * UBIFS does not try to compress data if its length is less than the below
+ * constant.
+ */
+#define MIN_COMPR_LEN 128
+
+/* Fake description object for the "none" compressor */
+static struct ubifs_compressor none_compr = {
+ .compr_type = UBIFS_COMPR_NONE,
+ .name = "no compression",
+ .capi_name = "",
+};
+
+#ifdef CONFIG_UBIFS_FS_LZO
+static DEFINE_MUTEX(lzo_mutex);
+
+static struct ubifs_compressor lzo_compr = {
+ .compr_type = UBIFS_COMPR_LZO,
+ .comp_mutex = &lzo_mutex,
+ .name = "LZO",
+ .capi_name = "lzo",
+};
+#else
+static struct ubifs_compressor lzo_compr = {
+ .compr_type = UBIFS_COMPR_LZO,
+ .name = "LZO",
+};
+#endif
+
+#ifdef CONFIG_UBIFS_FS_ZLIB
+static DEFINE_MUTEX(deflate_mutex);
+static DEFINE_MUTEX(inflate_mutex);
+
+static struct ubifs_compressor zlib_compr = {
+ .compr_type = UBIFS_COMPR_ZLIB,
+ .comp_mutex = &deflate_mutex,
+ .decomp_mutex = &inflate_mutex,
+ .name = "zlib",
+ .capi_name = "deflate",
+};
+#else
+static struct ubifs_compressor zlib_compr = {
+ .compr_type = UBIFS_COMPR_ZLIB,
+ .name = "zlib",
+};
+#endif
+
+/* All UBIFS compressors */
+struct ubifs_compressor *ubifs_compressors[UBIFS_COMPR_TYPES_CNT];
+
+/**
+ * ubifs_compress - compress data.
+ * @in_buf: data to compress
+ * @in_len: length of the data to compress
+ * @out_buf: output buffer where compressed data should be stored
+ * @out_len: output buffer length is returned here
+ * @compr_type: type of compression to use on enter, actually used compression
+ * type on exit
+ *
+ * This function compresses input buffer @in_buf of length @in_len and stores
+ * the result in the output buffer @out_buf and the resulting length in
+ * @out_len. If the input buffer does not compress, it is just copied to the
+ * @out_buf. The same happens if @compr_type is %UBIFS_COMPR_NONE or if
+ * compression error occurred.
+ *
+ * Note, if the input buffer was not compressed, it is copied to the output
+ * buffer and %UBIFS_COMPR_NONE is returned in @compr_type.
+ *
+ * This function returns nothing; on failure the data is left uncompressed.
+ */
+void ubifs_compress(const void *in_buf, int in_len, void *out_buf, int *out_len,
+ int *compr_type)
+{
+ int err;
+ struct ubifs_compressor *compr = ubifs_compressors[*compr_type];
+
+ if (*compr_type == UBIFS_COMPR_NONE)
+ goto no_compr;
+
+ /* If the input data is small, do not even try to compress it */
+ if (in_len < MIN_COMPR_LEN)
+ goto no_compr;
+
+ ubifs_assert(compr->capi_name);
+ ubifs_assert(in_len > 0);
+
+ if (compr->comp_mutex)
+ mutex_lock(compr->comp_mutex);
+ err = crypto_comp_compress(compr->cc, in_buf, in_len, out_buf,
+ out_len);
+ if (compr->comp_mutex)
+ mutex_unlock(compr->comp_mutex);
+ if (unlikely(err)) {
+ ubifs_warn("cannot compress %d bytes, compressor %s, "
+ "error %d, leave data uncompressed",
+ in_len, compr->name, err);
+ goto no_compr;
+ }
+
+ ubifs_assert(*out_len > 0);
+
+ /*
+ * Presently, we just require that compression results in less data,
+ * rather than any defined minimum compression ratio or amount.
+ */
+ if (ALIGN(*out_len, 8) >= ALIGN(in_len, 8))
+ goto no_compr;
+
+ return;
+
+no_compr:
+ memcpy(out_buf, in_buf, in_len);
+ *out_len = in_len;
+ *compr_type = UBIFS_COMPR_NONE;
+}
+
+/**
+ * ubifs_decompress - decompress data.
+ * @in_buf: data to decompress
+ * @in_len: length of the data to decompress
+ * @out_buf: output buffer where decompressed data should be stored
+ * @out_len: output length is returned here
+ * @compr_type: type of compression
+ *
+ * This function decompresses data from buffer @in_buf into buffer @out_buf.
+ * The length of the uncompressed data is returned in @out_len. This function
+ * returns %0 on success or a negative error code on failure.
+ */
+int ubifs_decompress(const void *in_buf, int in_len, void *out_buf,
+ int *out_len, int compr_type)
+{
+ int err;
+ struct ubifs_compressor *compr;
+
+ if (unlikely(compr_type < 0 || compr_type >= UBIFS_COMPR_TYPES_CNT)) {
+ ubifs_err("invalid compression type %d", compr_type);
+ return -EINVAL;
+ }
+
+ compr = ubifs_compressors[compr_type];
+
+ if (unlikely(!compr->capi_name)) {
+ ubifs_err("%s compression is not compiled in", compr->name);
+ return -EINVAL;
+ }
+
+ if (compr_type == UBIFS_COMPR_NONE) {
+ memcpy(out_buf, in_buf, in_len);
+ *out_len = in_len;
+ return 0;
+ }
+
+ if (compr->decomp_mutex)
+ mutex_lock(compr->decomp_mutex);
+ err = crypto_comp_decompress(compr->cc, in_buf, in_len, out_buf,
+ out_len);
+ if (compr->decomp_mutex)
+ mutex_unlock(compr->decomp_mutex);
+ if (err)
+ ubifs_err("cannot decompress %d bytes, compressor %s, "
+ "error %d", in_len, compr->name, err);
+
+ return err;
+}
+
+/**
+ * compr_init - initialize a compressor.
+ * @compr: compressor description object
+ *
+ * This function initializes the requested compressor and returns zero in case
+ * of success or a negative error code in case of failure.
+ */
+static int __init compr_init(struct ubifs_compressor *compr)
+{
+ if (compr->capi_name) {
+ compr->cc = crypto_alloc_comp(compr->capi_name, 0, 0);
+ if (IS_ERR(compr->cc)) {
+ ubifs_err("cannot initialize compressor %s, error %ld",
+ compr->name, PTR_ERR(compr->cc));
+ return PTR_ERR(compr->cc);
+ }
+ }
+
+ ubifs_compressors[compr->compr_type] = compr;
+ return 0;
+}
+
+/**
+ * compr_exit - de-initialize a compressor.
+ * @compr: compressor description object
+ */
+static void compr_exit(struct ubifs_compressor *compr)
+{
+ if (compr->capi_name)
+ crypto_free_comp(compr->cc);
+ return;
+}
+
+/**
+ * ubifs_compressors_init - initialize UBIFS compressors.
+ *
+ * This function initializes the compressors which were compiled in. Returns
+ * zero in case of success and a negative error code in case of failure.
+ */
+int __init ubifs_compressors_init(void)
+{
+ int err;
+
+ err = compr_init(&lzo_compr);
+ if (err)
+ return err;
+
+ err = compr_init(&zlib_compr);
+ if (err)
+ goto out_lzo;
+
+ ubifs_compressors[UBIFS_COMPR_NONE] = &none_compr;
+ return 0;
+
+out_lzo:
+ compr_exit(&lzo_compr);
+ return err;
+}
+
+/**
+ * ubifs_compressors_exit - de-initialize UBIFS compressors.
+ */
+void __exit ubifs_compressors_exit(void)
+{
+ compr_exit(&lzo_compr);
+ compr_exit(&zlib_compr);
+}
--
1.5.4.1

2008-03-27 13:13:13

by Artem Bityutskiy

Subject: [RFC PATCH 07/26] UBIFS: add file-system recovery

The recovery sub-system is responsible for recovering from unclean
reboots. It makes sure everything is consistent, rolls back the
last broken and unfinished FS operation, and so on.
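
One test the recovery code leans on is whether damage in a LEB can be blamed on an interrupted final write (see is_last_write() in the patch below). The following user-space sketch shows the idea for the case where the minimal I/O unit is larger than one byte. The sk_* names are hypothetical, and unlike the patch the sketch takes a buffer holding the whole LEB rather than a pointer at the damaged offset.

```c
#include <assert.h>
#include <stdint.h>

/* Round @x up to a multiple of @a (assumed to be a power of two). */
static int sk_align(int x, int a)
{
	return (x + a - 1) & ~(a - 1);
}

/*
 * Returns 1 if the damage at @offs is consistent with an interrupted final
 * write: everything from the next write-unit boundary to the end of the
 * erase block must still be erased (0xff). Returns 0 otherwise.
 * @leb holds the whole erase block, @min_io is the write-unit size.
 */
static int sk_is_last_write(const uint8_t *leb, int leb_size, int offs,
			    int min_io)
{
	int i;

	assert(leb && offs >= 0 && offs < leb_size && min_io > 1);

	for (i = sk_align(offs + 1, min_io); i < leb_size; i++)
		if (leb[i] != 0xff)
			return 0;
	return 1;
}
```

If the check fails, there is data after the damaged area, so the damage cannot have come from the last write before the unclean reboot and is treated as real corruption.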

Signed-off-by: Artem Bityutskiy <[email protected]>
Signed-off-by: Adrian Hunter <[email protected]>
---
fs/ubifs/recovery.c | 1437 +++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 1437 insertions(+), 0 deletions(-)

diff --git a/fs/ubifs/recovery.c b/fs/ubifs/recovery.c
new file mode 100644
index 0000000..e1e8916
--- /dev/null
+++ b/fs/ubifs/recovery.c
@@ -0,0 +1,1437 @@
+/*
+ * This file is part of UBIFS.
+ *
+ * Copyright (C) 2006-2008 Nokia Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 51
+ * Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Authors: Adrian Hunter
+ * Artem Bityutskiy (Битюцкий Артём)
+ */
+
+/*
+ * This file implements functions needed to recover from unclean un-mounts.
+ * When UBIFS is mounted, it checks a flag on the master node to determine if
+ * an un-mount was completed successfully. If not, the process of mounting
+ * incorporates additional checking and fixing of on-flash data structures.
+ * UBIFS always cleans away all remnants of an unclean un-mount, so that
+ * errors do not accumulate. However UBIFS defers recovery if it is mounted
+ * read-only, and the flash is not modified in that case.
+ */
+
+#include <linux/crc32.h>
+#include "ubifs.h"
+
+/**
+ * is_empty - determine whether a buffer is empty (contains all 0xff).
+ * @buf: buffer to check
+ * @len: length of buffer
+ *
+ * This function returns %1 if the buffer is empty (contains all 0xff) otherwise
+ * %0 is returned.
+ */
+static int is_empty(void *buf, int len)
+{
+ uint8_t *p = buf;
+ int i;
+
+ for (i = 0; i < len; i++)
+ if (*p++ != 0xff)
+ return 0;
+ return 1;
+}
+
+/**
+ * get_master_node - get the last valid master node allowing for corruption.
+ * @c: UBIFS file-system description object
+ * @lnum: LEB number
+ * @pbuf: buffer containing the LEB read, is returned here
+ * @mst: master node, if found, is returned here
+ * @cor: corruption, if found, is returned here
+ *
+ * This function allocates a buffer, reads the LEB into it, and finds and
+ * returns the last valid master node allowing for one area of corruption.
+ * The corrupt area, if there is one, must be consistent with the assumption
+ * that it is the result of an unclean unmount while the master node was being
+ * written. Under those circumstances, it is valid to use the previously written
+ * master node.
+ *
+ * This function returns %0 on success and a negative error code on failure.
+ */
+static int get_master_node(const struct ubifs_info *c, int lnum, void **pbuf,
+ struct ubifs_mst_node **mst, void **cor)
+{
+ const int sz = c->mst_node_alsz;
+ int err, offs, len;
+ void *sbuf, *buf;
+
+ sbuf = vmalloc(c->leb_size);
+ if (!sbuf)
+ return -ENOMEM;
+
+ err = ubi_read(c->ubi, lnum, sbuf, 0, c->leb_size);
+ if (err && err != -EBADMSG)
+ goto out_free;
+
+ /* Find the first position that is definitely not a node */
+ offs = 0;
+ buf = sbuf;
+ len = c->leb_size;
+ while (offs + UBIFS_MST_NODE_SZ <= c->leb_size) {
+ struct ubifs_ch *ch = buf;
+
+ if (le32_to_cpu(ch->magic) != UBIFS_NODE_MAGIC)
+ break;
+ offs += sz;
+ buf += sz;
+ len -= sz;
+ }
+ /* See if there was a valid master node before that */
+ if (offs) {
+ int ret;
+
+ offs -= sz;
+ buf -= sz;
+ len += sz;
+ ret = ubifs_scan_a_node(c, buf, len, lnum, offs, 1);
+ if (ret != SCANNED_A_NODE && offs) {
+ /* Could have been corruption so check one place back */
+ offs -= sz;
+ buf -= sz;
+ len += sz;
+ ret = ubifs_scan_a_node(c, buf, len, lnum, offs, 1);
+ if (ret != SCANNED_A_NODE)
+ /*
+ * We accept only one area of corruption because
+ * we are assuming that it was caused while
+ * trying to write a master node.
+ */
+ goto out_err;
+ }
+ if (ret == SCANNED_A_NODE) {
+ struct ubifs_ch *ch = buf;
+
+ if (ch->node_type != UBIFS_MST_NODE)
+ goto out_err;
+ dbg_mnt("found a master node at %d:%d", lnum, offs);
+ *mst = buf;
+ offs += sz;
+ buf += sz;
+ len -= sz;
+ }
+ }
+ /* Check for corruption */
+ if (offs < c->leb_size) {
+ if (!is_empty(buf, min_t(int, len, sz))) {
+ *cor = buf;
+ dbg_mnt("found corruption at %d:%d", lnum, offs);
+ }
+ offs += sz;
+ buf += sz;
+ len -= sz;
+ }
+ /* Check remaining empty space */
+ if (offs < c->leb_size)
+ if (!is_empty(buf, len))
+ goto out_err;
+ *pbuf = sbuf;
+ return 0;
+
+out_err:
+ err = -EINVAL;
+out_free:
+ vfree(sbuf);
+ return err;
+}
+
+/**
+ * write_rcvrd_mst_node - write recovered master node.
+ * @c: UBIFS file-system description object
+ * @mst: master node
+ *
+ * This function returns %0 on success and a negative error code on failure.
+ */
+static int write_rcvrd_mst_node(struct ubifs_info *c,
+ struct ubifs_mst_node *mst)
+{
+ int err = 0, lnum = UBIFS_MST_LNUM, sz = c->mst_node_alsz;
+ uint32_t save_flags;
+
+ dbg_mnt("recovery");
+
+ save_flags = mst->flags;
+ mst->flags = cpu_to_le32(le32_to_cpu(mst->flags) | UBIFS_MST_RCVRY);
+
+ ubifs_prepare_node(c, mst, UBIFS_MST_NODE_SZ, 1);
+ err = ubi_leb_change(c->ubi, lnum, mst, sz, UBI_SHORTTERM);
+ if (err)
+ goto out;
+ err = ubi_leb_change(c->ubi, lnum + 1, mst, sz, UBI_SHORTTERM);
+ if (err)
+ goto out;
+out:
+ mst->flags = save_flags;
+ return err;
+}
+
+/**
+ * ubifs_recover_master_node - recover the master node.
+ * @c: UBIFS file-system description object
+ *
+ * This function recovers the master node from corruption that may occur due to
+ * an unclean unmount.
+ *
+ * This function returns %0 on success and a negative error code on failure.
+ */
+int ubifs_recover_master_node(struct ubifs_info *c)
+{
+ void *buf1 = NULL, *buf2 = NULL, *cor1 = NULL, *cor2 = NULL;
+ struct ubifs_mst_node *mst1 = NULL, *mst2 = NULL, *mst;
+ const int sz = c->mst_node_alsz;
+ int err, offs1, offs2;
+
+ dbg_mnt("recovery");
+
+ err = get_master_node(c, UBIFS_MST_LNUM, &buf1, &mst1, &cor1);
+ if (err)
+ goto out_free;
+
+ err = get_master_node(c, UBIFS_MST_LNUM + 1, &buf2, &mst2, &cor2);
+ if (err)
+ goto out_free;
+
+ if (mst1) {
+ offs1 = (void *)mst1 - buf1;
+ if ((le32_to_cpu(mst1->flags) & UBIFS_MST_RCVRY) &&
+ (offs1 == 0 && !cor1)) {
+ /*
+ * mst1 was written by recovery at offset 0 with no
+ * corruption.
+ */
+ dbg_mnt("recovery recovery");
+ mst = mst1;
+ } else if (mst2) {
+ offs2 = (void *)mst2 - buf2;
+ if (offs1 == offs2) {
+ /* Same offset, so must be the same */
+ if (memcmp((void *)mst1 + UBIFS_CH_SZ,
+ (void *)mst2 + UBIFS_CH_SZ,
+ UBIFS_MST_NODE_SZ - UBIFS_CH_SZ))
+ goto out_err;
+ mst = mst1;
+ } else if (offs2 + sz == offs1) {
+ /* 1st LEB was written, 2nd was not */
+ if (cor1)
+ goto out_err;
+ mst = mst1;
+ } else if (offs1 == 0 && offs2 + sz >= c->leb_size) {
+ /* 1st LEB was unmapped and written, 2nd not */
+ if (cor1)
+ goto out_err;
+ mst = mst1;
+ } else
+ goto out_err;
+ } else {
+ /*
+ * 2nd LEB was unmapped and about to be written, so
+ * there must be only one master node in the first LEB
+ * and no corruption.
+ */
+ if (offs1 != 0 || cor1)
+ goto out_err;
+ mst = mst1;
+ }
+ } else {
+ if (!mst2)
+ goto out_err;
+ /*
+ * 1st LEB was unmapped and about to be written, so there must
+ * be no room left in 2nd LEB.
+ */
+ offs2 = (void *)mst2 - buf2;
+ if (offs2 + sz + sz <= c->leb_size)
+ goto out_err;
+ mst = mst2;
+ }
+
+ dbg_mnt("recovered master node from LEB %d",
+ (mst == mst1 ? UBIFS_MST_LNUM : UBIFS_MST_LNUM + 1));
+
+ memcpy(c->mst_node, mst, UBIFS_MST_NODE_SZ);
+
+ if ((c->vfs_sb->s_flags & MS_RDONLY)) {
+ /* Read-only mode. Keep a copy for switching to rw mode */
+ c->rcvrd_mst_node = kmalloc(sz, GFP_KERNEL);
+ if (!c->rcvrd_mst_node) {
+ err = -ENOMEM;
+ goto out_free;
+ }
+ memcpy(c->rcvrd_mst_node, c->mst_node, UBIFS_MST_NODE_SZ);
+ } else {
+ /* Write the recovered master node */
+ c->max_sqnum = le64_to_cpu(mst->ch.sqnum) - 1;
+ err = write_rcvrd_mst_node(c, c->mst_node);
+ if (err)
+ goto out_free;
+ }
+
+ vfree(buf2);
+ vfree(buf1);
+
+ return 0;
+
+out_err:
+ err = -EINVAL;
+out_free:
+ ubifs_err("failed to recover master node");
+ if (mst1) {
+ dbg_err("dumping first master node");
+ dbg_dump_node(c, mst1);
+ }
+ if (mst2) {
+ dbg_err("dumping second master node");
+ dbg_dump_node(c, mst2);
+ }
+ vfree(buf2);
+ vfree(buf1);
+ return err;
+}
+
+/**
+ * ubifs_write_rcvrd_mst_node - write the recovered master node.
+ * @c: UBIFS file-system description object
+ *
+ * This function writes the master node that was recovered during mounting in
+ * read-only mode and must now be written because we are remounting rw.
+ *
+ * This function returns %0 on success and a negative error code on failure.
+ */
+int ubifs_write_rcvrd_mst_node(struct ubifs_info *c)
+{
+ int err;
+
+ if (!c->rcvrd_mst_node)
+ return 0;
+ c->rcvrd_mst_node->flags |= cpu_to_le32(UBIFS_MST_DIRTY);
+ c->mst_node->flags |= cpu_to_le32(UBIFS_MST_DIRTY);
+ err = write_rcvrd_mst_node(c, c->rcvrd_mst_node);
+ if (err)
+ return err;
+ kfree(c->rcvrd_mst_node);
+ c->rcvrd_mst_node = NULL;
+ return 0;
+}
+
+/**
+ * is_last_write - determine if an offset was in the last write to a LEB.
+ * @c: UBIFS file-system description object
+ * @buf: buffer to check
+ * @offs: offset to check
+ *
+ * This function returns %1 if @offs was in the last write to the LEB whose data
+ * is in @buf, otherwise %0 is returned. The determination is made by checking
+ * for subsequent empty space starting from the next min_io_size boundary (or a
+ * bit less than the common header size if min_io_size is one).
+ */
+static int is_last_write(const struct ubifs_info *c, void *buf, int offs)
+{
+ int empty_offs;
+ int check_len;
+ uint8_t *p;
+
+ if (c->min_io_size == 1) {
+ check_len = c->leb_size - offs;
+ p = buf + check_len;
+ for (; check_len > 0; check_len--)
+ if (*--p != 0xff)
+ break;
+ /*
+ * 'check_len' is the size of the corruption which cannot be
+ * more than the size of 1 node if it was caused by an unclean
+ * unmount.
+ */
+ if (check_len > UBIFS_MAX_NODE_SZ)
+ return 0;
+ return 1;
+ }
+
+ /*
+ * Round up to the next c->min_io_size boundary i.e. 'offs' is in the
+ * last wbuf written. After that should be empty space.
+ */
+ empty_offs = ALIGN(offs + 1, c->min_io_size);
+ check_len = c->leb_size - empty_offs;
+ p = buf + empty_offs - offs;
+
+ for (; check_len > 0; check_len--)
+ if (*p++ != 0xff)
+ return 0;
+ return 1;
+}
+
+/**
+ * clean_buf - clean the data from an LEB sitting in a buffer.
+ * @c: UBIFS file-system description object
+ * @buf: buffer to clean
+ * @lnum: LEB number to clean
+ * @offs: offset from which to clean
+ * @len: length of buffer
+ *
+ * This function pads up to the next min_io_size boundary (if there is one) and
+ * sets empty space to all 0xff. @buf, @offs and @len are updated to the next
+ * min_io_size boundary (if there is one).
+ */
+static void clean_buf(const struct ubifs_info *c, void **buf, int lnum,
+ int *offs, int *len)
+{
+ int empty_offs, pad_len;
+
+ lnum = lnum;
+ dbg_mnt("cleaning corruption at %d:%d", lnum, *offs);
+
+ if (c->min_io_size == 1) {
+ memset(*buf, 0xff, c->leb_size - *offs);
+ return;
+ }
+
+ ubifs_assert(!(*offs & 7));
+
+ empty_offs = ALIGN(*offs, c->min_io_size);
+ pad_len = empty_offs - *offs;
+ ubifs_pad(c, *buf, pad_len);
+ *offs += pad_len;
+ *buf += pad_len;
+ *len -= pad_len;
+ memset(*buf, 0xff, c->leb_size - empty_offs);
+}
+
+/**
+ * no_more_nodes - determine if there are no more nodes in a buffer.
+ * @c: UBIFS file-system description object
+ * @buf: buffer to check
+ * @len: length of buffer
+ * @lnum: LEB number of the LEB from which @buf was read
+ * @offs: offset from which @buf was read
+ *
+ * This function scans @buf for more nodes and returns %0 if a node is found and
+ * %1 if no more nodes are found.
+ */
+static int no_more_nodes(const struct ubifs_info *c, void *buf, int len,
+ int lnum, int offs)
+{
+ int skip, next_offs = 0;
+
+ if (len > UBIFS_DATA_NODE_SZ) {
+ struct ubifs_ch *ch = buf;
+ int dlen = le32_to_cpu(ch->len);
+
+ if (ch->node_type == UBIFS_DATA_NODE && dlen >= UBIFS_CH_SZ &&
+ dlen <= UBIFS_MAX_DATA_NODE_SZ)
+ /* The corrupt node looks like a data node */
+ next_offs = ALIGN(offs + dlen, 8);
+ }
+
+ if (c->min_io_size == 1)
+ skip = 8;
+ else
+ skip = ALIGN(offs + 1, c->min_io_size) - offs;
+
+ offs += skip;
+ buf += skip;
+ len -= skip;
+ while (len > 8) {
+ struct ubifs_ch *ch = buf;
+ uint32_t magic = le32_to_cpu(ch->magic);
+ int ret;
+
+ if (magic == UBIFS_NODE_MAGIC) {
+ ret = ubifs_scan_a_node(c, buf, len, lnum, offs, 1);
+ if (ret == SCANNED_A_NODE || ret > 0) {
+ /*
+ * There is a small chance this is just data in
+ * a data node, so check that possibility. e.g.
+ * this is part of a file that itself contains
+ * a UBIFS image.
+ */
+ if (!next_offs || offs + le32_to_cpu(ch->len) >
+ next_offs) {
+ dbg_mnt("unexpected node at %d:%d", lnum, offs);
+ return 0;
+ }
+ }
+ }
+ offs += 8;
+ buf += 8;
+ len -= 8;
+ }
+ return 1;
+}
+
+/**
+ * fix_unclean_leb - fix an unclean LEB.
+ * @c: UBIFS file-system description object
+ * @sleb: scanned LEB information
+ * @start: offset where scan started
+ */
+static int fix_unclean_leb(struct ubifs_info *c, struct ubifs_scan_leb *sleb,
+ int start)
+{
+ int lnum = sleb->lnum, endpt = start;
+
+ /* Get the end offset of the last node we are keeping */
+ if (!list_empty(&sleb->nodes)) {
+ struct ubifs_scan_node *snod;
+
+ snod = list_entry(sleb->nodes.prev,
+ struct ubifs_scan_node, list);
+ endpt = snod->offs + snod->len;
+ }
+
+ if ((c->vfs_sb->s_flags & MS_RDONLY) && !c->remounting_rw) {
+ /* Add to recovery list */
+ struct ubifs_unclean_leb *ucleb;
+
+ dbg_mnt("need to fix LEB %d start %d endpt %d",
+ lnum, start, sleb->endpt);
+ ucleb = kzalloc(sizeof(struct ubifs_unclean_leb), GFP_NOFS);
+ if (!ucleb)
+ return -ENOMEM;
+ ucleb->lnum = lnum;
+ ucleb->endpt = endpt;
+ list_add_tail(&ucleb->list, &c->unclean_leb_list);
+ } else {
+ /* Write the fixed LEB back to flash */
+ int err;
+
+ dbg_mnt("fixing LEB %d start %d endpt %d",
+ lnum, start, sleb->endpt);
+ if (endpt == 0) {
+ err = ubifs_leb_unmap(c, lnum);
+ if (err)
+ return err;
+ } else {
+ int len = ALIGN(endpt, c->min_io_size);
+
+ if (start) {
+ err = ubi_read(c->ubi, lnum, sleb->buf, 0,
+ start);
+ if (err)
+ return err;
+ }
+ /* Pad to min_io_size */
+ if (len > endpt) {
+ int pad_len = len - ALIGN(endpt, 8);
+
+ if (pad_len > 0) {
+ void *buf = sleb->buf + len - pad_len;
+
+ ubifs_pad(c, buf, pad_len);
+ }
+ }
+ err = ubi_leb_change(c->ubi, lnum, sleb->buf, len,
+ UBI_UNKNOWN);
+ if (err)
+ return err;
+ }
+ }
+ return 0;
+}
+
+/**
+ * drop_incomplete_group - drop nodes from an incomplete group.
+ * @sleb: scanned LEB information
+ * @offs: offset of dropped nodes is returned here
+ *
+ * This function returns %1 if nodes are dropped and %0 otherwise.
+ */
+static int drop_incomplete_group(struct ubifs_scan_leb *sleb, int *offs)
+{
+ int dropped = 0;
+
+ while (!list_empty(&sleb->nodes)) {
+ struct ubifs_scan_node *snod;
+ struct ubifs_ch *ch;
+
+ snod = list_entry(sleb->nodes.prev, struct ubifs_scan_node,
+ list);
+ ch = snod->node;
+ if (ch->group_type != UBIFS_IN_NODE_GROUP)
+ return dropped;
+ dbg_mnt("dropping node at %d:%d", sleb->lnum, snod->offs);
+ *offs = snod->offs;
+ list_del(&snod->list);
+ kfree(snod);
+ sleb->nodes_cnt -= 1;
+ dropped = 1;
+ }
+ return dropped;
+}
+
+/**
+ * ubifs_recover_leb - scan and recover a LEB.
+ * @c: UBIFS file-system description object
+ * @lnum: LEB number
+ * @offs: offset
+ * @sbuf: LEB-sized buffer to use
+ * @grouped: nodes may be grouped for recovery
+ *
+ * This function does a scan of a LEB, but caters for errors that might have
+ * been caused by the unclean unmount from which we are attempting to recover.
+ *
+ * Returns scanned LEB information on success, an error pointer on failure.
+ */
+struct ubifs_scan_leb *ubifs_recover_leb(struct ubifs_info *c, int lnum,
+ int offs, void *sbuf, int grouped)
+{
+ int err, len = c->leb_size - offs, need_clean = 0, quiet = 1;
+ int empty_chkd = 0, start = offs;
+ struct ubifs_scan_leb *sleb;
+ void *buf = sbuf + offs;
+
+ dbg_mnt("%d:%d", lnum, offs);
+
+ sleb = ubifs_start_scan(c, lnum, offs, sbuf);
+ if (IS_ERR(sleb))
+ return sleb;
+
+ if (sleb->ecc)
+ need_clean = 1;
+
+ while (len >= 8) {
+ int ret;
+
+ dbg_scan("look at LEB %d:%d (%d bytes left)",
+ lnum, offs, len);
+
+ cond_resched();
+
+ /*
+ * Scan quietly until there is an error from which we cannot
+ * recover
+ */
+ ret = ubifs_scan_a_node(c, buf, len, lnum, offs, quiet);
+
+ if (ret == SCANNED_A_NODE) {
+ /* A valid node, and not a padding node */
+ struct ubifs_ch *ch = buf;
+ int node_len;
+
+ err = ubifs_add_snod(c, sleb, buf, offs);
+ if (err)
+ goto error;
+ node_len = ALIGN(le32_to_cpu(ch->len), 8);
+ offs += node_len;
+ buf += node_len;
+ len -= node_len;
+ continue;
+ }
+
+ if (ret > 0) {
+ /* Padding bytes or a valid padding node */
+ offs += ret;
+ buf += ret;
+ len -= ret;
+ continue;
+ }
+
+ if (ret == SCANNED_EMPTY_SPACE) {
+ if (!is_empty(buf, len)) {
+ if (!is_last_write(c, buf, offs))
+ break;
+ clean_buf(c, &buf, lnum, &offs, &len);
+ need_clean = 1;
+ }
+ empty_chkd = 1;
+ break;
+ }
+
+ if (ret == SCANNED_GARBAGE || ret == SCANNED_A_BAD_PAD_NODE)
+ if (is_last_write(c, buf, offs)) {
+ clean_buf(c, &buf, lnum, &offs, &len);
+ need_clean = 1;
+ empty_chkd = 1;
+ break;
+ }
+
+ if (ret == SCANNED_A_CORRUPT_NODE)
+ if (no_more_nodes(c, buf, len, lnum, offs)) {
+ clean_buf(c, &buf, lnum, &offs, &len);
+ need_clean = 1;
+ empty_chkd = 1;
+ break;
+ }
+
+ if (quiet) {
+ /* Redo the last scan but noisily */
+ quiet = 0;
+ continue;
+ }
+
+ switch (ret) {
+ case SCANNED_GARBAGE:
+ dbg_err("garbage");
+ goto corrupted;
+ case SCANNED_A_CORRUPT_NODE:
+ case SCANNED_A_BAD_PAD_NODE:
+ dbg_err("bad node");
+ goto corrupted;
+ default:
+ dbg_err("unknown");
+ goto corrupted;
+ }
+ }
+
+ if (!empty_chkd && !is_empty(buf, len)) {
+ if (is_last_write(c, buf, offs)) {
+ clean_buf(c, &buf, lnum, &offs, &len);
+ need_clean = 1;
+ } else {
+ ubifs_err("corrupt empty space at LEB %d:%d",
+ lnum, offs);
+ goto corrupted;
+ }
+ }
+
+ /* Drop nodes from incomplete group */
+ if (grouped && drop_incomplete_group(sleb, &offs)) {
+ buf = sbuf + offs;
+ len = c->leb_size - offs;
+ clean_buf(c, &buf, lnum, &offs, &len);
+ need_clean = 1;
+ }
+
+ if (offs % c->min_io_size) {
+ clean_buf(c, &buf, lnum, &offs, &len);
+ need_clean = 1;
+ }
+
+ ubifs_end_scan(c, sleb, lnum, offs);
+
+ if (need_clean) {
+ err = fix_unclean_leb(c, sleb, start);
+ if (err)
+ goto error;
+ }
+
+ return sleb;
+
+corrupted:
+ ubifs_scanned_corruption(c, lnum, offs, buf);
+ err = -EUCLEAN;
+error:
+ ubifs_err("LEB %d scanning failed", lnum);
+ ubifs_scan_destroy(sleb);
+ return ERR_PTR(err);
+}
+
+/**
+ * get_cs_sqnum - get commit start sequence number.
+ * @c: UBIFS file-system description object
+ * @lnum: LEB number of commit start node
+ * @offs: offset of commit start node
+ * @cs_sqnum: commit start sequence number is returned here
+ *
+ * This function returns %0 on success and a negative error code on failure.
+ */
+static int get_cs_sqnum(struct ubifs_info *c, int lnum, int offs,
+ unsigned long long *cs_sqnum)
+{
+ struct ubifs_cs_node *cs_node = NULL;
+ int err, ret;
+
+ dbg_mnt("at %d:%d", lnum, offs);
+ cs_node = kmalloc(UBIFS_CS_NODE_SZ, GFP_KERNEL);
+ if (!cs_node)
+ return -ENOMEM;
+ if (c->leb_size - offs < UBIFS_CS_NODE_SZ)
+ goto out_err;
+ err = ubi_read(c->ubi, lnum, (void *)cs_node, offs, UBIFS_CS_NODE_SZ);
+ if (err && err != -EBADMSG)
+ goto out_free;
+ ret = ubifs_scan_a_node(c, cs_node, UBIFS_CS_NODE_SZ, lnum, offs, 0);
+ if (ret != SCANNED_A_NODE) {
+ dbg_err("Not a valid node");
+ goto out_err;
+ }
+ if (cs_node->ch.node_type != UBIFS_CS_NODE) {
+ dbg_err("Not a CS node, type is %d", cs_node->ch.node_type);
+ goto out_err;
+ }
+ if (le64_to_cpu(cs_node->cmt_no) != c->cmt_no) {
+ dbg_err("CS node cmt_no %llu != current cmt_no %llu",
+ le64_to_cpu(cs_node->cmt_no), c->cmt_no);
+ goto out_err;
+ }
+ *cs_sqnum = le64_to_cpu(cs_node->ch.sqnum);
+ dbg_mnt("commit start sqnum %llu", *cs_sqnum);
+ kfree(cs_node);
+ return 0;
+
+out_err:
+ err = -EINVAL;
+out_free:
+ ubifs_err("failed to get CS sqnum");
+ kfree(cs_node);
+ return err;
+}
+
+/**
+ * ubifs_recover_log_leb - scan and recover a log LEB.
+ * @c: UBIFS file-system description object
+ * @lnum: LEB number
+ * @offs: offset
+ * @sbuf: LEB-sized buffer to use
+ *
+ * This function does a scan of a LEB, but caters for errors that might have
+ * been caused by the unclean unmount from which we are attempting to recover.
+ *
+ * Returns the scanned LEB on success and an error pointer on failure.
+ */
+struct ubifs_scan_leb *ubifs_recover_log_leb(struct ubifs_info *c, int lnum,
+ int offs, void *sbuf)
+{
+ struct ubifs_scan_leb *sleb;
+ int next_lnum;
+
+ dbg_mnt("LEB %d", lnum);
+ next_lnum = lnum + 1;
+ if (next_lnum >= UBIFS_LOG_LNUM + c->log_lebs)
+ next_lnum = UBIFS_LOG_LNUM;
+ if (next_lnum != c->ltail_lnum) {
+ /*
+ * We can only recover at the end of the log, so check that the
+ * next log LEB is empty or out of date.
+ */
+ sleb = ubifs_scan(c, next_lnum, 0, sbuf);
+ if (IS_ERR(sleb))
+ return sleb;
+ if (sleb->nodes_cnt) {
+ struct ubifs_scan_node *snod;
+ unsigned long long cs_sqnum = c->cs_sqnum;
+
+ snod = list_entry(sleb->nodes.next,
+ struct ubifs_scan_node, list);
+ if (cs_sqnum == 0) {
+ int err;
+
+ err = get_cs_sqnum(c, lnum, offs, &cs_sqnum);
+ if (err) {
+ ubifs_scan_destroy(sleb);
+ return ERR_PTR(err);
+ }
+ }
+ if (snod->sqnum > cs_sqnum) {
+ ubifs_err("unrecoverable log corruption "
+ "in LEB %d", lnum);
+ ubifs_scan_destroy(sleb);
+ return ERR_PTR(-EUCLEAN);
+ }
+ }
+ ubifs_scan_destroy(sleb);
+ }
+ return ubifs_recover_leb(c, lnum, offs, sbuf, 0);
+}
+
+/**
+ * recover_head - recover a head.
+ * @c: UBIFS file-system description object
+ * @lnum: LEB number of head to recover
+ * @offs: offset of head to recover
+ * @sbuf: LEB-sized buffer to use
+ *
+ * This function ensures that there is no data on the flash at a head location.
+ *
+ * This function returns %0 on success and a negative error code on failure.
+ */
+static int recover_head(const struct ubifs_info *c, int lnum, int offs,
+ void *sbuf)
+{
+ int len, err, need_clean = 0;
+
+ if (c->min_io_size > 1)
+ len = c->min_io_size;
+ else
+ len = 512;
+ if (offs + len > c->leb_size)
+ len = c->leb_size - offs;
+
+ if (!len)
+ return 0;
+
+ /* Read at the head location and check it is empty flash */
+ err = ubi_read(c->ubi, lnum, sbuf, offs, len);
+ if (err)
+ need_clean = 1;
+ else {
+ uint8_t *p = sbuf;
+
+ while (len--)
+ if (*p++ != 0xff) {
+ need_clean = 1;
+ break;
+ }
+ }
+
+ if (need_clean) {
+ dbg_mnt("cleaning head at %d:%d", lnum, offs);
+ if (offs == 0)
+ return ubifs_leb_unmap(c, lnum);
+ err = ubi_read(c->ubi, lnum, sbuf, 0, offs);
+ if (err)
+ return err;
+ return ubi_leb_change(c->ubi, lnum, sbuf, offs, UBI_UNKNOWN);
+ }
+
+ return 0;
+}
+
+/**
+ * ubifs_recover_inl_heads - recover index and LPT heads.
+ * @c: UBIFS file-system description object
+ * @sbuf: LEB-sized buffer to use
+ *
+ * This function ensures that there is no data on the flash at the index and
+ * LPT head locations.
+ *
+ * This deals with the recovery of a half-completed journal commit. UBIFS is
+ * careful never to overwrite the last version of the index or the LPT. Because
+ * the index and LPT are wandering trees, data from a half-completed commit will
+ * not be referenced anywhere in UBIFS. The data will be either in LEBs that are
+ * assumed to be empty and will be unmapped anyway before use, or in the index
+ * and LPT heads.
+ *
+ * This function returns %0 on success and a negative error code on failure.
+ */
+int ubifs_recover_inl_heads(const struct ubifs_info *c, void *sbuf)
+{
+ int err;
+
+ ubifs_assert(!(c->vfs_sb->s_flags & MS_RDONLY) || c->remounting_rw);
+
+ dbg_mnt("checking index head at %d:%d", c->ihead_lnum, c->ihead_offs);
+ err = recover_head(c, c->ihead_lnum, c->ihead_offs, sbuf);
+ if (err)
+ return err;
+
+ dbg_mnt("checking LPT head at %d:%d", c->nhead_lnum, c->nhead_offs);
+ err = recover_head(c, c->nhead_lnum, c->nhead_offs, sbuf);
+ if (err)
+ return err;
+
+ return 0;
+}
+
+/**
+ * clean_an_unclean_leb - read and write a LEB to remove corruption.
+ * @c: UBIFS file-system description object
+ * @ucleb: unclean LEB information
+ * @sbuf: LEB-sized buffer to use
+ *
+ * This function reads a LEB up to a point pre-determined by the mount recovery,
+ * checks the nodes, and writes the result back to the flash, thereby cleaning
+ * off any following corruption, or non-fatal ECC errors.
+ *
+ * This function returns %0 on success and a negative error code on failure.
+ */
+static int clean_an_unclean_leb(const struct ubifs_info *c,
+ struct ubifs_unclean_leb *ucleb, void *sbuf)
+{
+ int err, lnum = ucleb->lnum, offs = 0, len = ucleb->endpt, quiet = 1;
+ void *buf = sbuf;
+
+ dbg_mnt("LEB %d len %d", lnum, len);
+
+ if (len == 0) {
+ /* Nothing to read, just unmap it */
+ err = ubifs_leb_unmap(c, lnum);
+ if (err)
+ return err;
+ return 0;
+ }
+
+ err = ubi_read(c->ubi, lnum, buf, offs, len);
+ if (err && err != -EBADMSG)
+ return err;
+
+ while (len >= 8) {
+ int ret;
+
+ cond_resched();
+
+ /* Scan quietly until there is an error */
+ ret = ubifs_scan_a_node(c, buf, len, lnum, offs, quiet);
+
+ if (ret == SCANNED_A_NODE) {
+ /* A valid node, and not a padding node */
+ struct ubifs_ch *ch = buf;
+ int node_len;
+
+ node_len = ALIGN(le32_to_cpu(ch->len), 8);
+ offs += node_len;
+ buf += node_len;
+ len -= node_len;
+ continue;
+ }
+
+ if (ret > 0) {
+ /* Padding bytes or a valid padding node */
+ offs += ret;
+ buf += ret;
+ len -= ret;
+ continue;
+ }
+
+ if (ret == SCANNED_EMPTY_SPACE) {
+ ubifs_err("unexpected empty space at %d:%d",
+ lnum, offs);
+ return -EUCLEAN;
+ }
+
+ if (quiet) {
+ /* Redo the last scan but noisily */
+ quiet = 0;
+ continue;
+ }
+
+ ubifs_scanned_corruption(c, lnum, offs, buf);
+ return -EUCLEAN;
+ }
+
+ /* Pad to min_io_size */
+ len = ALIGN(ucleb->endpt, c->min_io_size);
+ if (len > ucleb->endpt) {
+ int pad_len = len - ALIGN(ucleb->endpt, 8);
+
+ if (pad_len > 0) {
+ buf = sbuf + len - pad_len;
+ ubifs_pad(c, buf, pad_len);
+ }
+ }
+
+ /* Write back the LEB atomically */
+ err = ubi_leb_change(c->ubi, lnum, sbuf, len, UBI_UNKNOWN);
+ if (err)
+ return err;
+
+ dbg_mnt("cleaned LEB %d", lnum);
+
+ return 0;
+}
+
+/**
+ * ubifs_clean_lebs - clean LEBs recovered during read-only mount.
+ * @c: UBIFS file-system description object
+ * @sbuf: LEB-sized buffer to use
+ *
+ * This function cleans the LEBs that were identified during recovery as needing
+ * to be written, but were not written because UBIFS was mounted read-only. It
+ * is called when remounting to read-write mode.
+ *
+ * This function returns %0 on success and a negative error code on failure.
+ */
+int ubifs_clean_lebs(const struct ubifs_info *c, void *sbuf)
+{
+ dbg_mnt("recovery");
+ while (!list_empty(&c->unclean_leb_list)) {
+ struct ubifs_unclean_leb *ucleb;
+ int err;
+
+ ucleb = list_entry(c->unclean_leb_list.next,
+ struct ubifs_unclean_leb, list);
+ err = clean_an_unclean_leb(c, ucleb, sbuf);
+ if (err)
+ return err;
+ list_del(&ucleb->list);
+ kfree(ucleb);
+ }
+ return 0;
+}
+
+/**
+ * ubifs_recover_gc_lnum - recover the GC LEB number.
+ * @c: UBIFS file-system description object
+ *
+ * Out-of-place garbage collection always requires one empty LEB with which to
+ * start garbage collection. The LEB number is recorded in c->gc_lnum and is
+ * written to the master node on unmounting. In the case of an unclean unmount
+ * the value of gc_lnum recorded in the master node is out of date and cannot
+ * be used. Instead, recovery must allocate an empty LEB for this purpose.
+ *
+ * This function returns %0 on success and a negative error code on failure.
+ */
+int ubifs_recover_gc_lnum(struct ubifs_info *c)
+{
+ int lnum, err;
+
+ c->gc_lnum = -1;
+ /* Call 'ubifs_find_free_leb_for_idx()' so GC is not run */
+ lnum = ubifs_find_free_leb_for_idx(c);
+ if (lnum < 0)
+ return lnum;
+ /* And reset the index flag */
+ err = ubifs_change_one_lp(c, lnum, -1, -1, 0, LPROPS_INDEX, 0);
+ if (err)
+ return err;
+ c->gc_lnum = lnum;
+ dbg_mnt("allocated LEB %d for GC", lnum);
+ return 0;
+}
+
+/**
+ * struct size_entry - inode size information for recovery.
+ * @rb: link in the RB-tree of sizes
+ * @inum: inode number
+ * @i_size: size on inode
+ * @d_size: maximum size based on data nodes
+ * @exists: indicates whether the inode exists
+ * @inode: inode if pinned in memory awaiting rw mode to fix it
+ */
+struct size_entry {
+ struct rb_node rb;
+ ino_t inum;
+ loff_t i_size;
+ loff_t d_size;
+ int exists;
+ struct inode *inode;
+};
+
+/**
+ * add_ino - add an entry to the size tree.
+ * @c: UBIFS file-system description object
+ * @inum: inode number
+ * @i_size: size on inode
+ * @d_size: maximum size based on data nodes
+ * @exists: indicates whether the inode exists
+ */
+static int add_ino(struct ubifs_info *c, ino_t inum, loff_t i_size,
+ loff_t d_size, int exists)
+{
+ struct rb_node **p = &c->size_tree.rb_node, *parent = NULL;
+ struct size_entry *e;
+
+ while (*p) {
+ parent = *p;
+ e = rb_entry(parent, struct size_entry, rb);
+ if (inum < e->inum)
+ p = &(*p)->rb_left;
+ else
+ p = &(*p)->rb_right;
+ }
+
+ e = kzalloc(sizeof(struct size_entry), GFP_KERNEL);
+ if (!e)
+ return -ENOMEM;
+
+ e->inum = inum;
+ e->i_size = i_size;
+ e->d_size = d_size;
+ e->exists = exists;
+
+ rb_link_node(&e->rb, parent, p);
+ rb_insert_color(&e->rb, &c->size_tree);
+
+ return 0;
+}
+
+/**
+ * find_ino - find an entry on the size tree.
+ * @c: UBIFS file-system description object
+ * @inum: inode number
+ */
+static struct size_entry *find_ino(struct ubifs_info *c, ino_t inum)
+{
+ struct rb_node *p = c->size_tree.rb_node;
+ struct size_entry *e;
+
+ while (p) {
+ e = rb_entry(p, struct size_entry, rb);
+ if (inum < e->inum)
+ p = p->rb_left;
+ else if (inum > e->inum)
+ p = p->rb_right;
+ else
+ return e;
+ }
+ return NULL;
+}
+
+/**
+ * remove_ino - remove an entry from the size tree.
+ * @c: UBIFS file-system description object
+ * @inum: inode number
+ */
+static void remove_ino(struct ubifs_info *c, ino_t inum)
+{
+ struct size_entry *e = find_ino(c, inum);
+
+ if (!e)
+ return;
+ rb_erase(&e->rb, &c->size_tree);
+ kfree(e);
+}
+
+/**
+ * ubifs_destroy_size_tree - free resources related to the size tree.
+ * @c: UBIFS file-system description object
+ */
+void ubifs_destroy_size_tree(struct ubifs_info *c)
+{
+ struct rb_node *this = c->size_tree.rb_node;
+ struct size_entry *e;
+
+ while (this) {
+ if (this->rb_left) {
+ this = this->rb_left;
+ continue;
+ } else if (this->rb_right) {
+ this = this->rb_right;
+ continue;
+ }
+ e = rb_entry(this, struct size_entry, rb);
+ if (e->inode)
+ iput(e->inode);
+ this = rb_parent(this);
+ if (this) {
+ if (this->rb_left == &e->rb)
+ this->rb_left = NULL;
+ else
+ this->rb_right = NULL;
+ }
+ kfree(e);
+ }
+ c->size_tree = RB_ROOT;
+}
+
+/**
+ * ubifs_recover_size_accum - accumulate inode sizes for recovery.
+ * @c: UBIFS file-system description object
+ * @key: node key
+ * @deletion: node is for a deletion
+ * @new_size: inode size
+ *
+ * This function has two purposes:
+ * 1) to ensure there are no data nodes that fall outside the inode size
+ * 2) to ensure there are no data nodes for inodes that do not exist
+ * To accomplish those purposes, an rb-tree is constructed containing an entry
+ * for each inode number in the journal that has not been deleted, and recording
+ * the size from the inode node, the maximum size of any data node (also altered
+ * by truncations) and a flag indicating an inode number for which no inode node
+ * was present in the journal.
+ *
+ * Note that there is still the possibility that there are data nodes that have
+ * been committed that are beyond the inode size, however the only way to find
+ * them would be to scan the entire index. Alternatively, some provision could
+ * be made to record the size of inodes at the start of commit, which would seem
+ * very cumbersome for a scenario that is quite unlikely and the only negative
+ * consequence of which is wasted space.
+ *
+ * This function returns %0 on success and a negative error code on failure.
+ */
+int ubifs_recover_size_accum(struct ubifs_info *c, union ubifs_key *key,
+ int deletion, loff_t new_size)
+{
+ ino_t inum = key_ino(c, key);
+ struct size_entry *e;
+ int err;
+
+ switch (key_type(c, key)) {
+ case UBIFS_INO_KEY:
+ if (deletion)
+ remove_ino(c, inum);
+ else {
+ e = find_ino(c, inum);
+ if (e) {
+ e->i_size = new_size;
+ e->exists = 1;
+ } else {
+ err = add_ino(c, inum, new_size, 0, 1);
+ if (err)
+ return err;
+ }
+ }
+ break;
+ case UBIFS_DATA_KEY:
+ e = find_ino(c, inum);
+ if (e) {
+ if (new_size > e->d_size)
+ e->d_size = new_size;
+ } else {
+ err = add_ino(c, inum, 0, new_size, 0);
+ if (err)
+ return err;
+ }
+ break;
+ case UBIFS_TRUN_KEY:
+ e = find_ino(c, inum);
+ if (e)
+ e->d_size = new_size;
+ break;
+ }
+ return 0;
+}
+
+/**
+ * fix_size_in_place - fix inode size in place on flash.
+ * @c: UBIFS file-system description object
+ * @e: inode size information for recovery
+ */
+static int fix_size_in_place(struct ubifs_info *c, struct size_entry *e)
+{
+ struct ubifs_ino_node *ino = c->sbuf;
+ unsigned char *p;
+ union ubifs_key key;
+ int err, lnum, offs, len;
+ loff_t i_size;
+ uint32_t crc;
+
+ /* Locate the inode node LEB number and offset */
+ ino_key_init(c, &key, e->inum);
+ err = ubifs_tnc_locate(c, &key, ino, &lnum, &offs);
+ if (err)
+ goto out;
+ /*
+ * If the size recorded on the inode node is not less than the size that
+ * was calculated from nodes in the journal then don't change the inode.
+ */
+ i_size = le64_to_cpu(ino->size);
+ if (i_size >= e->d_size)
+ return 0;
+ /* Read the LEB */
+ err = ubi_read(c->ubi, lnum, c->sbuf, 0, c->leb_size);
+ if (err)
+ goto out;
+ /* Change the size field and recalculate the CRC */
+ ino = c->sbuf + offs;
+ ino->size = cpu_to_le64(e->d_size);
+ len = le32_to_cpu(ino->ch.len);
+ crc = crc32(UBIFS_CRC32_INIT, (void *)ino + 8, len - 8);
+ ino->ch.crc = cpu_to_le32(crc);
+ /* Work out where data in the LEB ends and free space begins */
+ p = c->sbuf;
+ len = c->leb_size - 1;
+ while (p[len] == 0xff)
+ len -= 1;
+ len = ALIGN(len + 1, c->min_io_size);
+ /* Atomically write the fixed LEB back again */
+ err = ubi_leb_change(c->ubi, lnum, c->sbuf, len, UBI_UNKNOWN);
+ if (err)
+ goto out;
+ dbg_mnt("inode %lu at %d:%d size %lld -> %lld", e->inum, lnum, offs,
+ i_size, e->d_size);
+ return 0;
+
+out:
+ ubifs_warn("inode %lu failed to fix size %lld -> %lld error %d",
+ e->inum, e->i_size, e->d_size, err);
+ return err;
+}
+
+/**
+ * ubifs_recover_size - recover inode size.
+ * @c: UBIFS file-system description object
+ *
+ * This function attempts to fix inode size discrepancies identified by the
+ * 'ubifs_recover_size_accum()' function.
+ *
+ * This function returns %0 on success and a negative error code on failure.
+ */
+int ubifs_recover_size(struct ubifs_info *c)
+{
+ struct rb_node *this = rb_first(&c->size_tree);
+
+ while (this) {
+ struct size_entry *e;
+ int err;
+
+ e = rb_entry(this, struct size_entry, rb);
+ if (!e->exists) {
+ union ubifs_key key;
+
+ ino_key_init(c, &key, e->inum);
+ err = ubifs_tnc_lookup(c, &key, c->sbuf);
+ if (err && err != -ENOENT)
+ return err;
+ if (err == -ENOENT) {
+ /* Remove data nodes that have no inode */
+ dbg_mnt("removing ino %lu", e->inum);
+ err = ubifs_tnc_remove_ino(c, e->inum);
+ if (err)
+ return err;
+ /*
+ * If we later unmount cleanly without
+ * committing, the TNC changes will be lost,
+ * hence we set a flag to ensure a commit is
+ * done.
+ */
+ c->recovery_needs_commit = 1;
+ } else {
+ struct ubifs_ino_node *ino = c->sbuf;
+
+ e->exists = 1;
+ e->i_size = le64_to_cpu(ino->size);
+ }
+ }
+ if (e->exists && e->i_size < e->d_size) {
+ if (e->inode == NULL &&
+ (c->vfs_sb->s_flags & MS_RDONLY)) {
+ /* Fix the inode size and pin it in memory */
+ struct inode *inode;
+
+ inode = ubifs_iget(c->vfs_sb, e->inum);
+ if (IS_ERR(inode))
+ return PTR_ERR(inode);
+ if (inode->i_size < e->d_size) {
+ dbg_mnt("ino %lu size %lld -> %lld",
+ e->inum, inode->i_size,
+ e->d_size);
+ inode->i_size = e->d_size;
+ e->inode = inode;
+ this = rb_next(this);
+ continue;
+ }
+ iput(inode);
+ } else {
+ /* Fix the size in place */
+ err = fix_size_in_place(c, e);
+ if (err) {
+ if (e->inode)
+ /*
+ * We have changed the inode
+ * size in memory but failed to
+ * fix it on flash. Mark it
+ * dirty without budgeting, and
+ * hope we don't run out of
+ * space.
+ */
+ mark_inode_dirty_sync(e->inode);
+ /*
+ * We consider that failing to recover
+ * the size is not fatal, because it
+ * only affects files that were being
+ * written without synchronization and
+ * the only down side is that some space
+ * may be wasted.
+ */
+ err = 0;
+ }
+ if (e->inode)
+ iput(e->inode);
+ }
+ }
+ this = rb_next(this);
+ rb_erase(&e->rb, &c->size_tree);
+ kfree(e);
+ }
+ return 0;
+}
--
1.5.4.1

2008-03-27 13:13:32

by Artem Bityutskiy

Subject: [RFC PATCH 15/26] UBIFS: add LEB properties

UBIFS keeps track of every logical eraseblock (LEB): how much data it
contains and how much of that data is dirty or clean. This space accounting
information is needed all over the place: when finding an empty eraseblock
to put new data in, when reporting the amount of free space, and so on.
We call this subsystem "lprops", which stands for LEB properties.
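
To illustrate the idea, here is a minimal user-space sketch of one such
category: a fixed-size max-heap of LEBs keyed by dirty space, where each
entry remembers its own heap position. All names here (leb_props, leb_heap,
HEAP_SZ, heap_add, heap_dirtiest) are hypothetical and chosen for this
example only. The sketch also assumes a full heap simply rejects new
entries, which is simpler than what the patch does.

```c
#include <assert.h>
#include <stddef.h>

#define HEAP_SZ 8	/* hypothetical capacity, for illustration only */

/* Simplified per-LEB accounting record (not the patch's ubifs_lprops) */
struct leb_props {
	int lnum;	/* LEB number */
	int dirty;	/* amount of dirty space in bytes */
	int hpos;	/* current position in the heap array */
};

struct leb_heap {
	int cnt;				/* number of entries in use */
	struct leb_props *arr[HEAP_SZ];
};

/* Move the entry at arr[hpos] up while it is dirtier than its parent */
static void heap_move_up(struct leb_heap *heap, int hpos)
{
	struct leb_props *lp;

	assert(hpos >= 0 && hpos < HEAP_SZ);
	lp = heap->arr[hpos];
	while (hpos > 0) {
		int ppos = (hpos - 1) / 2;

		if (heap->arr[ppos]->dirty >= lp->dirty)
			break;
		/* Parent is less dirty, so pull it down one level */
		heap->arr[hpos] = heap->arr[ppos];
		heap->arr[hpos]->hpos = hpos;
		hpos = ppos;
	}
	heap->arr[hpos] = lp;
	lp->hpos = hpos;
}

/*
 * Returns 1 if @lp was added, 0 if the heap is full. In the latter case the
 * caller would have to track the LEB some other way, e.g. on a plain list.
 */
static int heap_add(struct leb_heap *heap, struct leb_props *lp)
{
	if (heap->cnt >= HEAP_SZ)
		return 0;
	heap->arr[heap->cnt] = lp;
	heap_move_up(heap, heap->cnt);
	heap->cnt += 1;
	return 1;
}

/* The dirtiest LEB is always at the top; returns NULL if the heap is empty */
static struct leb_props *heap_dirtiest(const struct leb_heap *heap)
{
	return heap->cnt ? heap->arr[0] : NULL;
}
```

With such a structure, picking a garbage collection candidate is a constant
time look at the top of the heap, and storing the heap position in each entry
makes it possible to find an entry in the array without searching for it.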

Signed-off-by: Artem Bityutskiy <[email protected]>
Signed-off-by: Adrian Hunter <[email protected]>
---
fs/ubifs/lprops.c | 1341 +++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 1341 insertions(+), 0 deletions(-)

diff --git a/fs/ubifs/lprops.c b/fs/ubifs/lprops.c
new file mode 100644
index 0000000..56f43f7
--- /dev/null
+++ b/fs/ubifs/lprops.c
@@ -0,0 +1,1341 @@
+/*
+ * This file is part of UBIFS.
+ *
+ * Copyright (C) 2006-2008 Nokia Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 51
+ * Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Authors: Adrian Hunter
+ * Artem Bityutskiy (Битюцкий Артём)
+ */
+
+/*
+ * This file implements the functions that access LEB properties and their
+ * categories. LEBs are categorized based on the needs of UBIFS, and the
+ * categories are stored as either heaps or lists to provide a fast way of
+ * finding a LEB in a particular category. For example, UBIFS may need to find
+ * an empty LEB for the journal, or a very dirty LEB for garbage collection.
+ */
+
+#include "ubifs.h"
+
+#if defined(CONFIG_UBIFS_FS_DEBUG_CHK_LPROPS) || \
+ defined(CONFIG_UBIFS_FS_DEBUG_CHK_OTHER)
+static void dbg_check_heap(struct ubifs_info *c, struct ubifs_lpt_heap *heap,
+ int cat, int add_pos);
+#else
+#define dbg_check_heap(c, heap, cat, add_pos) ({})
+#endif
+
+/**
+ * get_heap_comp_val - get the LEB properties value for heap comparisons.
+ * @lprops: LEB properties
+ * @cat: LEB category
+ */
+static int get_heap_comp_val(struct ubifs_lprops *lprops, int cat)
+{
+ switch (cat) {
+ case LPROPS_FREE:
+ return lprops->free;
+ case LPROPS_DIRTY_IDX:
+ return lprops->free + lprops->dirty;
+ default:
+ return lprops->dirty;
+ }
+}
+
+/**
+ * move_up_lpt_heap - move a new heap entry up as far as possible.
+ * @c: UBIFS file-system description object
+ * @heap: LEB category heap
+ * @lprops: LEB properties to move
+ * @cat: LEB category
+ *
+ * New entries to a heap are added at the bottom and then moved up until the
+ * parent's value is greater. In the case of LPT's category heaps, the value
+ * is either the amount of free space or the amount of dirty space, depending
+ * on the category.
+ */
+static void move_up_lpt_heap(struct ubifs_info *c, struct ubifs_lpt_heap *heap,
+ struct ubifs_lprops *lprops, int cat)
+{
+ int val1, val2, hpos;
+
+ hpos = lprops->hpos;
+ if (!hpos)
+ return; /* Already top of the heap */
+ val1 = get_heap_comp_val(lprops, cat);
+ /* Compare to parent and, if greater, move up the heap */
+ do {
+ int ppos = (hpos - 1) / 2;
+
+ val2 = get_heap_comp_val(heap->arr[ppos], cat);
+ if (val2 >= val1)
+ return;
+ /* Greater than parent so move up */
+ heap->arr[ppos]->hpos = hpos;
+ heap->arr[hpos] = heap->arr[ppos];
+ heap->arr[ppos] = lprops;
+ lprops->hpos = ppos;
+ hpos = ppos;
+ } while (hpos);
+}
+
+/**
+ * adjust_lpt_heap - move a changed heap entry up or down the heap.
+ * @c: UBIFS file-system description object
+ * @heap: LEB category heap
+ * @lprops: LEB properties to move
+ * @hpos: heap position of @lprops
+ * @cat: LEB category
+ *
+ * Changed entries in a heap are moved up or down until the parent's value is
+ * greater. In the case of LPT's category heaps, the value is either the amount
+ * of free space or the amount of dirty space, depending on the category.
+ */
+static void adjust_lpt_heap(struct ubifs_info *c, struct ubifs_lpt_heap *heap,
+ struct ubifs_lprops *lprops, int hpos, int cat)
+{
+ int val1, val2, val3, cpos;
+
+ val1 = get_heap_comp_val(lprops, cat);
+ /* Compare to parent and, if greater than parent, move up the heap */
+ if (hpos) {
+ int ppos = (hpos - 1) / 2;
+
+ val2 = get_heap_comp_val(heap->arr[ppos], cat);
+ if (val1 > val2) {
+ /* Greater than parent so move up */
+ while (1) {
+ heap->arr[ppos]->hpos = hpos;
+ heap->arr[hpos] = heap->arr[ppos];
+ heap->arr[ppos] = lprops;
+ lprops->hpos = ppos;
+ hpos = ppos;
+ if (!hpos)
+ return;
+ ppos = (hpos - 1) / 2;
+ val2 = get_heap_comp_val(heap->arr[ppos], cat);
+ if (val1 <= val2)
+ return;
+ /* Still greater than parent so keep going */
+ }
+ }
+ }
+ /* Not greater than parent, so compare to children */
+ while (1) {
+ /* Compare to left child */
+ cpos = hpos * 2 + 1;
+ if (cpos >= heap->cnt)
+ return;
+ val2 = get_heap_comp_val(heap->arr[cpos], cat);
+ if (val1 < val2) {
+ /* Less than left child, so promote biggest child */
+ if (cpos + 1 < heap->cnt) {
+ val3 = get_heap_comp_val(heap->arr[cpos + 1],
+ cat);
+ if (val3 > val2)
+ cpos += 1; /* Right child is bigger */
+ }
+ heap->arr[cpos]->hpos = hpos;
+ heap->arr[hpos] = heap->arr[cpos];
+ heap->arr[cpos] = lprops;
+ lprops->hpos = cpos;
+ hpos = cpos;
+ continue;
+ }
+ /* Compare to right child */
+ cpos += 1;
+ if (cpos >= heap->cnt)
+ return;
+ val3 = get_heap_comp_val(heap->arr[cpos], cat);
+ if (val1 < val3) {
+ /* Less than right child, so promote right child */
+ heap->arr[cpos]->hpos = hpos;
+ heap->arr[hpos] = heap->arr[cpos];
+ heap->arr[cpos] = lprops;
+ lprops->hpos = cpos;
+ hpos = cpos;
+ continue;
+ }
+ return;
+ }
+}
+
+/**
+ * add_to_lpt_heap - add LEB properties to a LEB category heap.
+ * @c: UBIFS file-system description object
+ * @lprops: LEB properties to add
+ * @cat: LEB category
+ *
+ * This function returns %1 if @lprops is added to the heap for LEB category
+ * @cat, otherwise %0 is returned because the heap is full.
+ */
+static int add_to_lpt_heap(struct ubifs_info *c, struct ubifs_lprops *lprops,
+ int cat)
+{
+ struct ubifs_lpt_heap *heap = &c->lpt_heap[cat - 1];
+
+ if (heap->cnt >= heap->max_cnt) {
+ const int b = LPT_HEAP_SZ / 2 - 1;
+ int cpos, val1, val2;
+
+ /* Compare to some other LEB on the bottom of heap */
+ /* Pick a position kind of randomly */
+ cpos = (((size_t)lprops >> 4) & b) + b;
+ ubifs_assert(cpos >= b);
+ ubifs_assert(cpos < LPT_HEAP_SZ);
+ ubifs_assert(cpos < heap->cnt);
+
+ val1 = get_heap_comp_val(lprops, cat);
+ val2 = get_heap_comp_val(heap->arr[cpos], cat);
+ if (val1 > val2) {
+ struct ubifs_lprops *lp;
+
+ lp = heap->arr[cpos];
+ lp->flags &= ~LPROPS_CAT_MASK;
+ lp->flags |= LPROPS_UNCAT;
+ list_add(&lp->list, &c->uncat_list);
+ lprops->hpos = cpos;
+ heap->arr[cpos] = lprops;
+ move_up_lpt_heap(c, heap, lprops, cat);
+ dbg_check_heap(c, heap, cat, lprops->hpos);
+ return 1; /* Added to heap */
+ }
+ dbg_check_heap(c, heap, cat, -1);
+ return 0; /* Not added to heap */
+ } else {
+ lprops->hpos = heap->cnt++;
+ heap->arr[lprops->hpos] = lprops;
+ move_up_lpt_heap(c, heap, lprops, cat);
+ dbg_check_heap(c, heap, cat, lprops->hpos);
+ return 1; /* Added to heap */
+ }
+}
+
+/**
+ * remove_from_lpt_heap - remove LEB properties from a LEB category heap.
+ * @c: UBIFS file-system description object
+ * @lprops: LEB properties to remove
+ * @cat: LEB category
+ */
+static void remove_from_lpt_heap(struct ubifs_info *c,
+ struct ubifs_lprops *lprops, int cat)
+{
+ struct ubifs_lpt_heap *heap;
+ int hpos = lprops->hpos;
+
+ heap = &c->lpt_heap[cat - 1];
+ ubifs_assert(hpos >= 0 && hpos < heap->cnt);
+ ubifs_assert(heap->arr[hpos] == lprops);
+ heap->cnt -= 1;
+ if (hpos < heap->cnt) {
+ heap->arr[hpos] = heap->arr[heap->cnt];
+ heap->arr[hpos]->hpos = hpos;
+ adjust_lpt_heap(c, heap, heap->arr[hpos], hpos, cat);
+ }
+ dbg_check_heap(c, heap, cat, -1);
+}
+
+/**
+ * lpt_heap_replace - replace lprops in a category heap.
+ * @c: UBIFS file-system description object
+ * @old_lprops: LEB properties to replace
+ * @new_lprops: LEB properties with which to replace
+ * @cat: LEB category
+ *
+ * During commit it is sometimes necessary to copy a pnode (see dirty_cow_pnode)
+ * and the lprops that the pnode contains. When that happens, references in
+ * the category heaps to those lprops must be updated to point to the new
+ * lprops. This function does that.
+ */
+static void lpt_heap_replace(struct ubifs_info *c,
+ struct ubifs_lprops *old_lprops,
+ struct ubifs_lprops *new_lprops, int cat)
+{
+ struct ubifs_lpt_heap *heap;
+ int hpos = new_lprops->hpos;
+
+ heap = &c->lpt_heap[cat - 1];
+ heap->arr[hpos] = new_lprops;
+}
+
+/**
+ * ubifs_add_to_cat - add LEB properties to a category list or heap.
+ * @c: UBIFS file-system description object
+ * @lprops: LEB properties to add
+ * @cat: LEB category to which to add
+ *
+ * LEB properties are categorized to enable fast find operations.
+ */
+void ubifs_add_to_cat(struct ubifs_info *c, struct ubifs_lprops *lprops,
+ int cat)
+{
+ switch (cat) {
+ case LPROPS_DIRTY:
+ case LPROPS_DIRTY_IDX:
+ case LPROPS_FREE:
+ if (add_to_lpt_heap(c, lprops, cat))
+ break;
+ /* No more room on heap so make it uncategorized */
+ cat = LPROPS_UNCAT;
+ /* Fall through */
+ case LPROPS_UNCAT:
+ list_add(&lprops->list, &c->uncat_list);
+ break;
+ case LPROPS_EMPTY:
+ list_add(&lprops->list, &c->empty_list);
+ break;
+ case LPROPS_FREEABLE:
+ list_add(&lprops->list, &c->freeable_list);
+ c->freeable_cnt += 1;
+ break;
+ case LPROPS_FRDI_IDX:
+ list_add(&lprops->list, &c->frdi_idx_list);
+ break;
+ default:
+ ubifs_assert(0);
+ }
+ lprops->flags &= ~LPROPS_CAT_MASK;
+ lprops->flags |= cat;
+}
+
+/**
+ * ubifs_remove_from_cat - remove LEB properties from a category list or heap.
+ * @c: UBIFS file-system description object
+ * @lprops: LEB properties to remove
+ * @cat: LEB category from which to remove
+ *
+ * LEB properties are categorized to enable fast find operations.
+ */
+static void ubifs_remove_from_cat(struct ubifs_info *c,
+ struct ubifs_lprops *lprops, int cat)
+{
+ switch (cat) {
+ case LPROPS_DIRTY:
+ case LPROPS_DIRTY_IDX:
+ case LPROPS_FREE:
+ remove_from_lpt_heap(c, lprops, cat);
+ break;
+ case LPROPS_FREEABLE:
+ c->freeable_cnt -= 1;
+ ubifs_assert(c->freeable_cnt >= 0);
+ /* Fall through */
+ case LPROPS_UNCAT:
+ case LPROPS_EMPTY:
+ case LPROPS_FRDI_IDX:
+ ubifs_assert(!list_empty(&lprops->list));
+ list_del(&lprops->list);
+ break;
+ default:
+ ubifs_assert(0);
+ }
+}
+
+/**
+ * ubifs_replace_cat - replace lprops in a category list or heap.
+ * @c: UBIFS file-system description object
+ * @old_lprops: LEB properties to replace
+ * @new_lprops: LEB properties with which to replace
+ *
+ * During commit it is sometimes necessary to copy a pnode (see dirty_cow_pnode)
+ * and the lprops that the pnode contains. When that happens, references in
+ * category lists and heaps must be replaced. This function does that.
+ */
+void ubifs_replace_cat(struct ubifs_info *c, struct ubifs_lprops *old_lprops,
+ struct ubifs_lprops *new_lprops)
+{
+ int cat;
+
+ cat = new_lprops->flags & LPROPS_CAT_MASK;
+ switch (cat) {
+ case LPROPS_DIRTY:
+ case LPROPS_DIRTY_IDX:
+ case LPROPS_FREE:
+ lpt_heap_replace(c, old_lprops, new_lprops, cat);
+ break;
+ case LPROPS_UNCAT:
+ case LPROPS_EMPTY:
+ case LPROPS_FREEABLE:
+ case LPROPS_FRDI_IDX:
+ list_replace(&old_lprops->list, &new_lprops->list);
+ break;
+ default:
+ ubifs_assert(0);
+ }
+}
+
+/**
+ * ubifs_ensure_cat - ensure LEB properties are categorized.
+ * @c: UBIFS file-system description object
+ * @lprops: LEB properties
+ *
+ * A LEB may have fallen off of the bottom of a heap, and ended up as
+ * uncategorized even though it has enough space for us now. If that is the case
+ * this function will put the LEB back onto a heap.
+ */
+void ubifs_ensure_cat(struct ubifs_info *c, struct ubifs_lprops *lprops)
+{
+ int cat = lprops->flags & LPROPS_CAT_MASK;
+
+ if (cat != LPROPS_UNCAT)
+ return;
+ cat = ubifs_categorize_lprops(c, lprops);
+ if (cat == LPROPS_UNCAT)
+ return;
+ ubifs_remove_from_cat(c, lprops, LPROPS_UNCAT);
+ ubifs_add_to_cat(c, lprops, cat);
+}
+
+/**
+ * ubifs_categorize_lprops - categorize LEB properties.
+ * @c: UBIFS file-system description object
+ * @lprops: LEB properties to categorize
+ *
+ * LEB properties are categorized to enable fast find operations. This function
+ * returns the LEB category to which the LEB properties belong. Note however
+ * that if the LEB category is stored as a heap and the heap is full, the
+ * LEB properties may have their category changed to %LPROPS_UNCAT.
+ */
+int ubifs_categorize_lprops(const struct ubifs_info *c,
+ const struct ubifs_lprops *lprops)
+{
+ if (lprops->flags & LPROPS_TAKEN)
+ return LPROPS_UNCAT;
+
+ if (lprops->free == c->leb_size) {
+ ubifs_assert(!(lprops->flags & LPROPS_INDEX));
+ return LPROPS_EMPTY;
+ }
+
+ if (lprops->free + lprops->dirty == c->leb_size) {
+ if (lprops->flags & LPROPS_INDEX)
+ return LPROPS_FRDI_IDX;
+ else
+ return LPROPS_FREEABLE;
+ }
+
+ if (lprops->flags & LPROPS_INDEX) {
+ if (lprops->dirty + lprops->free >= c->min_idx_node_sz)
+ return LPROPS_DIRTY_IDX;
+ } else {
+ if (lprops->dirty >= c->dead_wm &&
+ lprops->dirty > lprops->free)
+ return LPROPS_DIRTY;
+ if (lprops->free > 0)
+ return LPROPS_FREE;
+ }
+
+ return LPROPS_UNCAT;
+}
+
+/**
+ * change_category - change LEB properties category.
+ * @c: UBIFS file-system description object
+ * @lprops: LEB properties to recategorize
+ *
+ * LEB properties are categorized to enable fast find operations. When the LEB
+ * properties change they must be recategorized.
+ */
+static void change_category(struct ubifs_info *c, struct ubifs_lprops *lprops)
+{
+ int old_cat = lprops->flags & LPROPS_CAT_MASK;
+ int new_cat = ubifs_categorize_lprops(c, lprops);
+
+ if (old_cat == new_cat) {
+ struct ubifs_lpt_heap *heap;
+
+ /* lprops on a heap now must be moved up or down */
+ if (new_cat < 1 || new_cat > LPROPS_HEAP_CNT)
+ return; /* Not on a heap */
+ heap = &c->lpt_heap[new_cat - 1];
+ adjust_lpt_heap(c, heap, lprops, lprops->hpos, new_cat);
+ } else {
+ ubifs_remove_from_cat(c, lprops, old_cat);
+ ubifs_add_to_cat(c, lprops, new_cat);
+ }
+}
+
+/**
+ * ubifs_get_lprops - get reference to LEB properties.
+ * @c: the UBIFS file-system description object
+ *
+ * This function locks lprops. Lprops have to be unlocked by
+ * 'ubifs_release_lprops()'.
+ */
+void ubifs_get_lprops(struct ubifs_info *c)
+{
+ mutex_lock(&c->lp_mutex);
+}
+
+/**
+ * calc_dark - calculate LEB dark space size.
+ * @c: the UBIFS file-system description object
+ * @spc: amount of free and dirty space in the LEB
+ *
+ * This function calculates the amount of dark space in a LEB which has @spc
+ * bytes of free and dirty space. Returns the calculation result.
+ *
+ * Dark space is the space which is not always usable - it depends on which
+ * nodes are written in which order. E.g., if an LEB has only 512 free bytes,
+ * it is dark space, because it cannot fit a large data node. So UBIFS cannot
+ * count on this LEB and treat these 512 bytes as usable because it is not true
+ * if, for example, only big chunks of uncompressible data will be written to
+ * the FS.
+ */
+static int calc_dark(struct ubifs_info *c, int spc)
+{
+ ubifs_assert(!(spc & 7));
+
+ if (spc < c->dark_wm)
+ return spc;
+
+ /*
+ * If we have slightly more space than the dark space watermark, we can
+ * anyway safely assume that we'll be able to write a node of the
+ * smallest size there.
+ */
+ if (spc - c->dark_wm < MIN_WRITE_SZ)
+ return spc - MIN_WRITE_SZ;
+
+ return c->dark_wm;
+}
+
+/**
+ * is_lprops_dirty - determine if LEB properties are dirty.
+ * @c: the UBIFS file-system description object
+ * @lprops: LEB properties to test
+ */
+static int is_lprops_dirty(struct ubifs_info *c, struct ubifs_lprops *lprops)
+{
+ struct ubifs_pnode *pnode;
+ void *addr;
+ int pos;
+
+ pos = (lprops->lnum - c->main_first) & (UBIFS_LPT_FANOUT - 1);
+ /* Step back to lprops[0] before resolving the containing pnode */
+ addr = container_of(lprops - pos, struct ubifs_pnode, lprops[0]);
+ pnode = (struct ubifs_pnode *)addr;
+
+ return !test_bit(COW_ZNODE, &pnode->flags) &&
+ test_bit(DIRTY_CNODE, &pnode->flags);
+}
+
+/**
+ * ubifs_change_lp - change LEB properties.
+ * @c: the UBIFS file-system description object
+ * @lp: LEB properties to change
+ * @free: new free space amount
+ * @dirty: new dirty space amount
+ * @flags: new flags
+ * @idx_gc_cnt: change to the count of idx_gc list
+ *
+ * This function changes LEB properties. This function does not change a LEB
+ * property (@free, @dirty or @flags) if the value passed is %-1.
+ *
+ * This function returns a pointer to the updated LEB properties on success
+ * and an error pointer (to be tested with IS_ERR()) on failure. N.B. the LEB
+ * properties may have had to be copied (due to COW) and consequently the
+ * pointer returned may not be the same as the pointer passed.
+ */
+const struct ubifs_lprops *ubifs_change_lp(struct ubifs_info *c,
+ const struct ubifs_lprops *lp,
+ int free, int dirty, int flags,
+ int idx_gc_cnt)
+{
+ /*
+ * This is the only function that is allowed to change lprops, so we
+ * discard the const qualifier.
+ */
+ struct ubifs_lprops *lprops = (struct ubifs_lprops *)lp;
+
+ dbg_lp("LEB %d, free %d, dirty %d, flags %d",
+ lprops->lnum, free, dirty, flags);
+
+ ubifs_assert(mutex_is_locked(&c->lp_mutex));
+ ubifs_assert(c->lst.empty_lebs >= 0 &&
+ c->lst.empty_lebs <= c->main_lebs);
+ ubifs_assert(c->freeable_cnt >= 0);
+ ubifs_assert(c->freeable_cnt <= c->main_lebs);
+ ubifs_assert(c->lst.taken_empty_lebs >= 0);
+ ubifs_assert(c->lst.taken_empty_lebs <= c->lst.empty_lebs);
+ ubifs_assert(!(c->lst.total_free & 7) && !(c->lst.total_dirty & 7));
+ ubifs_assert(!(c->lst.total_dead & 7) && !(c->lst.total_dark & 7));
+ ubifs_assert(!(c->lst.total_used & 7));
+
+ if (!is_lprops_dirty(c, lprops)) {
+ lprops = ubifs_lpt_lookup_dirty(c, lprops->lnum);
+ if (IS_ERR(lprops))
+ return lprops;
+ }
+
+ ubifs_assert(!(lprops->free & 7) && !(lprops->dirty & 7));
+
+ spin_lock(&c->space_lock);
+
+ if ((lprops->flags & LPROPS_TAKEN) && lprops->free == c->leb_size)
+ c->lst.taken_empty_lebs -= 1;
+
+ if (!(lprops->flags & LPROPS_INDEX)) {
+ int old_spc;
+
+ old_spc = lprops->free + lprops->dirty;
+ if (old_spc < c->dead_wm)
+ c->lst.total_dead -= old_spc;
+ else
+ c->lst.total_dark -= calc_dark(c, old_spc);
+
+ c->lst.total_used -= c->leb_size - old_spc;
+ }
+
+ if (free != -1) {
+ free = ALIGN(free, 8);
+ c->lst.total_free += free - lprops->free;
+
+ /* Increase or decrease empty LEBs counter if needed */
+ if (free == c->leb_size) {
+ if (lprops->free != c->leb_size)
+ c->lst.empty_lebs += 1;
+ } else if (lprops->free == c->leb_size)
+ c->lst.empty_lebs -= 1;
+ lprops->free = free;
+ }
+
+ if (dirty != -1) {
+ dirty = ALIGN(dirty, 8);
+ c->lst.total_dirty += dirty - lprops->dirty;
+ lprops->dirty = dirty;
+ }
+
+ if (flags != -1) {
+ /* Take care about indexing LEBs counter if needed */
+ if ((lprops->flags & LPROPS_INDEX)) {
+ if (!(flags & LPROPS_INDEX))
+ c->lst.idx_lebs -= 1;
+ } else if (flags & LPROPS_INDEX)
+ c->lst.idx_lebs += 1;
+ lprops->flags = flags;
+ }
+
+ if (!(lprops->flags & LPROPS_INDEX)) {
+ int new_spc;
+
+ new_spc = lprops->free + lprops->dirty;
+ if (new_spc < c->dead_wm)
+ c->lst.total_dead += new_spc;
+ else
+ c->lst.total_dark += calc_dark(c, new_spc);
+
+ c->lst.total_used += c->leb_size - new_spc;
+ }
+
+ if ((lprops->flags & LPROPS_TAKEN) && lprops->free == c->leb_size)
+ c->lst.taken_empty_lebs += 1;
+
+ change_category(c, lprops);
+
+ c->idx_gc_cnt += idx_gc_cnt;
+
+ spin_unlock(&c->space_lock);
+
+ return lprops;
+}
+
+/**
+ * ubifs_release_lprops - release lprops lock.
+ * @c: the UBIFS file-system description object
+ *
+ * This function has to be called after each 'ubifs_get_lprops()' call to
+ * unlock lprops.
+ */
+void ubifs_release_lprops(struct ubifs_info *c)
+{
+ ubifs_assert(mutex_is_locked(&c->lp_mutex));
+ ubifs_assert(c->lst.empty_lebs >= 0 &&
+ c->lst.empty_lebs <= c->main_lebs);
+
+ mutex_unlock(&c->lp_mutex);
+}
+
+/**
+ * ubifs_get_lp_stats - get lprops statistics.
+ * @c: UBIFS file-system description object
+ * @st: return statistics
+ */
+void ubifs_get_lp_stats(struct ubifs_info *c, struct ubifs_lp_stats *st)
+{
+ spin_lock(&c->space_lock);
+ memcpy(st, &c->lst, sizeof(struct ubifs_lp_stats));
+ spin_unlock(&c->space_lock);
+}
+
+/**
+ * ubifs_change_one_lp - change LEB properties.
+ * @c: the UBIFS file-system description object
+ * @lnum: LEB to change properties for
+ * @free: amount of free space
+ * @dirty: amount of dirty space
+ * @flags_set: flags to set
+ * @flags_clean: flags to clear
+ * @idx_gc_cnt: change to the count of idx_gc list
+ *
+ * This function changes properties of LEB @lnum. It is a helper wrapper over
+ * 'ubifs_change_lp()' which hides lprops get/release. The arguments are the
+ * same as in case of 'ubifs_change_lp()'. Returns zero in case of success and
+ * a negative error code in case of failure.
+ */
+int ubifs_change_one_lp(struct ubifs_info *c, int lnum, int free, int dirty,
+ int flags_set, int flags_clean, int idx_gc_cnt)
+{
+ int err = 0, flags;
+ const struct ubifs_lprops *lp;
+
+ ubifs_get_lprops(c);
+
+ lp = ubifs_lpt_lookup_dirty(c, lnum);
+ if (IS_ERR(lp)) {
+ err = PTR_ERR(lp);
+ goto out;
+ }
+
+ flags = (lp->flags | flags_set) & ~flags_clean;
+ lp = ubifs_change_lp(c, lp, free, dirty, flags, idx_gc_cnt);
+ if (IS_ERR(lp))
+ err = PTR_ERR(lp);
+
+out:
+ ubifs_release_lprops(c);
+ return err;
+}
+
+/**
+ * ubifs_update_one_lp - update LEB properties.
+ * @c: the UBIFS file-system description object
+ * @lnum: LEB to change properties for
+ * @free: amount of free space
+ * @dirty: amount of dirty space to add
+ * @flags_set: flags to set
+ * @flags_clean: flags to clear
+ *
+ * This function is the same as 'ubifs_change_one_lp()' but @dirty is added to
+ * the current dirty space rather than replacing it.
+ */
+int ubifs_update_one_lp(struct ubifs_info *c, int lnum, int free, int dirty,
+ int flags_set, int flags_clean)
+{
+ int err = 0, flags;
+ const struct ubifs_lprops *lp;
+
+ ubifs_get_lprops(c);
+
+ lp = ubifs_lpt_lookup_dirty(c, lnum);
+ if (IS_ERR(lp)) {
+ err = PTR_ERR(lp);
+ goto out;
+ }
+
+ flags = (lp->flags | flags_set) & ~flags_clean;
+ lp = ubifs_change_lp(c, lp, free, lp->dirty + dirty, flags, 0);
+ if (IS_ERR(lp))
+ err = PTR_ERR(lp);
+
+out:
+ ubifs_release_lprops(c);
+ return err;
+}
+
+/**
+ * ubifs_read_one_lp - read LEB properties.
+ * @c: the UBIFS file-system description object
+ * @lnum: LEB to read properties for
+ * @lp: where to store read properties
+ *
+ * This helper function reads properties of a LEB @lnum and stores them in @lp.
+ * Returns zero in case of success and a negative error code in case of
+ * failure.
+ */
+int ubifs_read_one_lp(struct ubifs_info *c, int lnum, struct ubifs_lprops *lp)
+{
+ int err = 0;
+ const struct ubifs_lprops *lpp;
+
+ ubifs_get_lprops(c);
+
+ lpp = ubifs_lpt_lookup(c, lnum);
+ if (IS_ERR(lpp)) {
+ err = PTR_ERR(lpp);
+ goto out;
+ }
+
+ memcpy(lp, lpp, sizeof(struct ubifs_lprops));
+
+out:
+ ubifs_release_lprops(c);
+ return err;
+}
+
+/**
+ * ubifs_fast_find_free - try to find a LEB with free space quickly.
+ * @c: the UBIFS file-system description object
+ *
+ * This function returns LEB properties for a LEB with free space or %NULL if
+ * the function is unable to find a LEB quickly.
+ */
+const struct ubifs_lprops *ubifs_fast_find_free(struct ubifs_info *c)
+{
+ struct ubifs_lprops *lprops;
+ struct ubifs_lpt_heap *heap;
+
+ ubifs_assert(mutex_is_locked(&c->lp_mutex));
+
+ heap = &c->lpt_heap[LPROPS_FREE - 1];
+ if (heap->cnt == 0)
+ return NULL;
+
+ lprops = heap->arr[0];
+ ubifs_assert(!(lprops->flags & LPROPS_TAKEN));
+ ubifs_assert(!(lprops->flags & LPROPS_INDEX));
+ return lprops;
+}
+
+/**
+ * ubifs_fast_find_empty - try to find an empty LEB quickly.
+ * @c: the UBIFS file-system description object
+ *
+ * This function returns LEB properties for an empty LEB or %NULL if the
+ * function is unable to find an empty LEB quickly.
+ */
+const struct ubifs_lprops *ubifs_fast_find_empty(struct ubifs_info *c)
+{
+ struct ubifs_lprops *lprops;
+
+ ubifs_assert(mutex_is_locked(&c->lp_mutex));
+
+ if (list_empty(&c->empty_list))
+ return NULL;
+
+ lprops = list_entry(c->empty_list.next, struct ubifs_lprops, list);
+ ubifs_assert(!(lprops->flags & LPROPS_TAKEN));
+ ubifs_assert(!(lprops->flags & LPROPS_INDEX));
+ ubifs_assert(lprops->free == c->leb_size);
+ return lprops;
+}
+
+/**
+ * ubifs_fast_find_freeable - try to find a freeable LEB quickly.
+ * @c: the UBIFS file-system description object
+ *
+ * This function returns LEB properties for a freeable LEB or %NULL if the
+ * function is unable to find a freeable LEB quickly.
+ */
+const struct ubifs_lprops *ubifs_fast_find_freeable(struct ubifs_info *c)
+{
+ struct ubifs_lprops *lprops;
+
+ ubifs_assert(mutex_is_locked(&c->lp_mutex));
+
+ if (list_empty(&c->freeable_list))
+ return NULL;
+
+ lprops = list_entry(c->freeable_list.next, struct ubifs_lprops, list);
+ ubifs_assert(!(lprops->flags & LPROPS_TAKEN));
+ ubifs_assert(!(lprops->flags & LPROPS_INDEX));
+ ubifs_assert(lprops->free + lprops->dirty == c->leb_size);
+ ubifs_assert(c->freeable_cnt > 0);
+ return lprops;
+}
+
+/**
+ * ubifs_fast_find_frdi_idx - try to find a freeable index LEB quickly.
+ * @c: the UBIFS file-system description object
+ *
+ * This function returns LEB properties for a freeable index LEB or %NULL if the
+ * function is unable to find a freeable index LEB quickly.
+ */
+const struct ubifs_lprops *ubifs_fast_find_frdi_idx(struct ubifs_info *c)
+{
+ struct ubifs_lprops *lprops;
+
+ ubifs_assert(mutex_is_locked(&c->lp_mutex));
+
+ if (list_empty(&c->frdi_idx_list))
+ return NULL;
+
+ lprops = list_entry(c->frdi_idx_list.next, struct ubifs_lprops, list);
+ ubifs_assert(!(lprops->flags & LPROPS_TAKEN));
+ ubifs_assert((lprops->flags & LPROPS_INDEX));
+ ubifs_assert(lprops->free + lprops->dirty == c->leb_size);
+ return lprops;
+}
+
+#if defined(CONFIG_UBIFS_FS_DEBUG_CHK_LPROPS) || \
+ defined(CONFIG_UBIFS_FS_DEBUG_CHK_OTHER)
+
+/**
+ * dbg_check_cats - check category heaps and lists.
+ * @c: UBIFS file-system description object
+ *
+ * This function returns %0 on success and a negative error code on failure.
+ */
+int dbg_check_cats(struct ubifs_info *c)
+{
+ struct ubifs_lprops *lprops;
+ struct list_head *pos;
+ int i, cat;
+
+ list_for_each_entry(lprops, &c->empty_list, list) {
+ if (lprops->free != c->leb_size) {
+ ubifs_err("non-empty LEB %d on empty list "
+ "(free %d dirty %d flags %d)", lprops->lnum,
+ lprops->free, lprops->dirty, lprops->flags);
+ return -EINVAL;
+ }
+ if (lprops->flags & LPROPS_TAKEN) {
+ ubifs_err("taken LEB %d on empty list "
+ "(free %d dirty %d flags %d)", lprops->lnum,
+ lprops->free, lprops->dirty, lprops->flags);
+ return -EINVAL;
+ }
+ }
+
+ i = 0;
+ list_for_each_entry(lprops, &c->freeable_list, list) {
+ if (lprops->free + lprops->dirty != c->leb_size) {
+ ubifs_err("non-freeable LEB %d on freeable list "
+ "(free %d dirty %d flags %d)", lprops->lnum,
+ lprops->free, lprops->dirty, lprops->flags);
+ return -EINVAL;
+ }
+ if (lprops->flags & LPROPS_TAKEN) {
+ ubifs_err("taken LEB %d on freeable list "
+ "(free %d dirty %d flags %d)", lprops->lnum,
+ lprops->free, lprops->dirty, lprops->flags);
+ return -EINVAL;
+ }
+ i += 1;
+ }
+ if (i != c->freeable_cnt) {
+ ubifs_err("freeable list count %d expected %d", i,
+ c->freeable_cnt);
+ return -EINVAL;
+ }
+
+ i = 0;
+ list_for_each(pos, &c->idx_gc)
+ i += 1;
+ if (i != c->idx_gc_cnt) {
+ ubifs_err("idx_gc list count %d expected %d", i,
+ c->idx_gc_cnt);
+ return -EINVAL;
+ }
+
+ list_for_each_entry(lprops, &c->frdi_idx_list, list) {
+ if (lprops->free + lprops->dirty != c->leb_size) {
+ ubifs_err("non-freeable LEB %d on frdi_idx list "
+ "(free %d dirty %d flags %d)", lprops->lnum,
+ lprops->free, lprops->dirty, lprops->flags);
+ return -EINVAL;
+ }
+ if (lprops->flags & LPROPS_TAKEN) {
+ ubifs_err("taken LEB %d on frdi_idx list "
+ "(free %d dirty %d flags %d)", lprops->lnum,
+ lprops->free, lprops->dirty, lprops->flags);
+ return -EINVAL;
+ }
+ if (!(lprops->flags & LPROPS_INDEX)) {
+ ubifs_err("non-index LEB %d on frdi_idx list "
+ "(free %d dirty %d flags %d)", lprops->lnum,
+ lprops->free, lprops->dirty, lprops->flags);
+ return -EINVAL;
+ }
+ }
+
+ for (cat = 1; cat <= LPROPS_HEAP_CNT; cat++) {
+ struct ubifs_lpt_heap *heap = &c->lpt_heap[cat - 1];
+
+ for (i = 0; i < heap->cnt; i++) {
+ lprops = heap->arr[i];
+ if (lprops == NULL) {
+ ubifs_err("null ptr in LPT heap cat %d", cat);
+ return -EINVAL;
+ }
+ if (lprops->hpos != i) {
+ ubifs_err("bad ptr in LPT heap cat %d", cat);
+ return -EINVAL;
+ }
+ if (lprops->flags & LPROPS_TAKEN) {
+ ubifs_err("taken LEB in LPT heap cat %d", cat);
+ return -EINVAL;
+ }
+ }
+ }
+
+ return 0;
+}
+
+static void dbg_check_heap(struct ubifs_info *c, struct ubifs_lpt_heap *heap,
+ int cat, int add_pos)
+{
+ int i = 0, j, err = 0;
+
+ for (i = 0; i < heap->cnt; i++) {
+ struct ubifs_lprops *lprops = heap->arr[i];
+ struct ubifs_lprops *lp;
+
+ if (i != add_pos)
+ if ((lprops->flags & LPROPS_CAT_MASK) != cat) {
+ err = 1;
+ goto out;
+ }
+ if (lprops->hpos != i) {
+ err = 2;
+ goto out;
+ }
+ lp = ubifs_lpt_lookup(c, lprops->lnum);
+ if (IS_ERR(lp)) {
+ err = 3;
+ goto out;
+ }
+ if (lprops != lp) {
+ dbg_msg("lprops %zx lp %zx lprops->lnum %d lp->lnum %d",
+ (size_t)lprops, (size_t)lp, lprops->lnum,
+ lp->lnum);
+ err = 4;
+ goto out;
+ }
+ for (j = 0; j < i; j++) {
+ lp = heap->arr[j];
+ if (lp == lprops) {
+ err = 5;
+ goto out;
+ }
+ if (lp->lnum == lprops->lnum) {
+ err = 6;
+ goto out;
+ }
+ }
+ }
+out:
+ if (err) {
+ dbg_msg("failed cat %d hpos %d err %d", cat, i, err);
+ dbg_dump_stack();
+ dbg_dump_heap(c, heap, cat);
+ }
+}
+
+#endif
+
+#ifdef CONFIG_UBIFS_FS_DEBUG_CHK_LPROPS
+
+/**
+ * struct scan_check_data - data provided to scan callback function.
+ * @lst: LEB properties statistics
+ * @err: error code
+ */
+struct scan_check_data {
+ struct ubifs_lp_stats lst;
+ int err;
+};
+
+/**
+ * scan_check_cb - scan callback.
+ * @c: the UBIFS file-system description object
+ * @lp: LEB properties to scan
+ * @in_tree: whether the LEB properties are in main memory
+ * @data: information passed to and from the caller of the scan
+ *
+ * This function returns a code that indicates whether the scan should continue
+ * (%LPT_SCAN_CONTINUE), whether the LEB properties should be added to the tree
+ * in main memory (%LPT_SCAN_ADD), or whether the scan should stop
+ * (%LPT_SCAN_STOP).
+ */
+static int scan_check_cb(struct ubifs_info *c,
+ const struct ubifs_lprops *lp, int in_tree,
+ struct scan_check_data *data)
+{
+ struct ubifs_scan_leb *sleb;
+ struct ubifs_scan_node *snod;
+ struct ubifs_lp_stats *lst = &data->lst;
+ int cat, lnum = lp->lnum, is_idx = 0, used = 0, free, dirty;
+
+ cat = lp->flags & LPROPS_CAT_MASK;
+ if (cat != LPROPS_UNCAT) {
+ cat = ubifs_categorize_lprops(c, lp);
+ if (cat != (lp->flags & LPROPS_CAT_MASK)) {
+ ubifs_err("bad LEB category %d expected %d",
+ (lp->flags & LPROPS_CAT_MASK), cat);
+ goto out;
+ }
+ }
+
+ /* Check lp is on its category list (if it has one) */
+ if (in_tree) {
+ struct list_head *list = NULL;
+
+ switch (cat) {
+ case LPROPS_EMPTY:
+ list = &c->empty_list;
+ break;
+ case LPROPS_FREEABLE:
+ list = &c->freeable_list;
+ break;
+ case LPROPS_FRDI_IDX:
+ list = &c->frdi_idx_list;
+ break;
+ case LPROPS_UNCAT:
+ list = &c->uncat_list;
+ break;
+ }
+ if (list) {
+ struct ubifs_lprops *lprops;
+ int found = 0;
+
+ list_for_each_entry(lprops, list, list) {
+ if (lprops == lp) {
+ found = 1;
+ break;
+ }
+ }
+ if (!found) {
+ ubifs_err("bad LPT list (category %d)", cat);
+ goto out;
+ }
+ }
+ }
+
+ /* Check lp is on its category heap (if it has one) */
+ if (in_tree && cat > 0 && cat <= LPROPS_HEAP_CNT) {
+ struct ubifs_lpt_heap *heap = &c->lpt_heap[cat - 1];
+
+ if (lp->hpos < 0 || lp->hpos >= heap->cnt ||
+ lp != heap->arr[lp->hpos]) {
+ ubifs_err("bad LPT heap (category %d)", cat);
+ goto out;
+ }
+ }
+
+ sleb = ubifs_scan(c, lnum, 0, c->dbg_buf);
+ if (IS_ERR(sleb)) {
+ /*
+ * After an unclean unmount, empty and freeable LEBs
+ * may contain garbage.
+ */
+ if (lp->free == c->leb_size) {
+ ubifs_err("scan errors were in empty LEB "
+ "- continuing checking");
+ lst->empty_lebs += 1;
+ lst->total_free += c->leb_size;
+ lst->total_dark += calc_dark(c, c->leb_size);
+ return LPT_SCAN_CONTINUE;
+ }
+
+ if (lp->free + lp->dirty == c->leb_size &&
+ !(lp->flags & LPROPS_INDEX)) {
+ ubifs_err("scan errors were in freeable LEB "
+ "- continuing checking");
+ lst->total_free += lp->free;
+ lst->total_dirty += lp->dirty;
+ lst->total_dark += calc_dark(c, c->leb_size);
+ return LPT_SCAN_CONTINUE;
+ }
+ data->err = PTR_ERR(sleb);
+ return LPT_SCAN_STOP;
+ }
+
+ is_idx = -1;
+ list_for_each_entry(snod, &sleb->nodes, list) {
+ int found, level = 0;
+
+ cond_resched();
+
+ if (is_idx == -1)
+ is_idx = (snod->type == UBIFS_IDX_NODE) ? 1 : 0;
+
+ if (is_idx && snod->type != UBIFS_IDX_NODE) {
+ ubifs_err("non-indexing node in index LEB %d:%d",
+ lnum, snod->offs);
+ goto out_destroy;
+ }
+
+ if (snod->type == UBIFS_IDX_NODE) {
+ struct ubifs_idx_node *idx = snod->node;
+
+ key_read(c, ubifs_idx_key(c, idx), &snod->key);
+ level = le16_to_cpu(idx->level);
+ }
+
+ found = ubifs_tnc_has_node(c, &snod->key, level, lnum,
+ snod->offs, is_idx);
+ if (found) {
+ if (found < 0)
+ goto out_destroy;
+ used += ALIGN(snod->len, 8);
+ }
+ }
+
+ free = c->leb_size - sleb->endpt;
+ dirty = sleb->endpt - used;
+
+ if (free > c->leb_size || free < 0 || dirty > c->leb_size ||
+ dirty < 0) {
+ ubifs_err("bad calculated accounting for LEB %d: "
+ "free %d, dirty %d", lnum, free, dirty);
+ goto out_destroy;
+ }
+
+ if (lp->free + lp->dirty == c->leb_size &&
+ free + dirty == c->leb_size)
+ if ((is_idx && !(lp->flags & LPROPS_INDEX)) ||
+ (!is_idx && free == c->leb_size)) {
+ /*
+ * Empty or freeable LEBs could contain index
+ * nodes from an uncompleted commit due to an
+ * unclean unmount. Or they could be empty for
+ * the same reason.
+ */
+ free = lp->free;
+ dirty = lp->dirty;
+ is_idx = 0;
+ }
+
+ if (lp->free != free || lp->dirty != dirty)
+ goto out_print;
+
+ if (is_idx && !(lp->flags & LPROPS_INDEX)) {
+ if (free == c->leb_size)
+ /* Free but not unmapped LEB, it's fine */
+ is_idx = 0;
+ else {
+ ubifs_err("indexing node without indexing "
+ "flag");
+ goto out_print;
+ }
+ }
+
+ if (!is_idx && (lp->flags & LPROPS_INDEX)) {
+ ubifs_err("data node with indexing flag");
+ goto out_print;
+ }
+
+ if (free == c->leb_size)
+ lst->empty_lebs += 1;
+
+ if (is_idx)
+ lst->idx_lebs += 1;
+
+ if (!(lp->flags & LPROPS_INDEX))
+ lst->total_used += c->leb_size - free - dirty;
+ lst->total_free += free;
+ lst->total_dirty += dirty;
+
+ if (!(lp->flags & LPROPS_INDEX)) {
+ int spc = free + dirty;
+
+ if (spc < c->dead_wm)
+ lst->total_dead += spc;
+ else
+ lst->total_dark += calc_dark(c, spc);
+ }
+
+ ubifs_scan_destroy(sleb);
+
+ return LPT_SCAN_CONTINUE;
+
+out_print:
+ ubifs_err("bad accounting of LEB %d: free %d, dirty %d flags %#x, "
+ "should be free %d, dirty %d",
+ lnum, lp->free, lp->dirty, lp->flags, free, dirty);
+ dbg_dump_leb(c, lnum);
+out_destroy:
+ ubifs_scan_destroy(sleb);
+out:
+ data->err = -EINVAL;
+ return LPT_SCAN_STOP;
+}
+
+/**
+ * dbg_check_lprops - check all LEB properties.
+ * @c: UBIFS file-system description object
+ *
+ * This function checks all LEB properties and makes sure they are all correct.
+ * It returns zero if everything is fine, %-EINVAL if there is an inconsistency
+ * and other negative error codes in case of other errors. This function is
+ * called while the file system is locked (because of commit start), so no
+ * additional locking is required. Note that locking the LPT mutex would cause
+ * a circular lock dependency with the TNC mutex.
+ */
+int dbg_check_lprops(struct ubifs_info *c)
+{
+ int i, err;
+ struct scan_check_data data;
+ struct ubifs_lp_stats *lst = &data.lst;
+
+ /*
+ * As we are going to scan the media, the write buffers have to be
+ * synchronized.
+ */
+ for (i = 0; i < c->jhead_cnt; i++) {
+ err = ubifs_wbuf_sync(&c->jheads[i].wbuf);
+ if (err)
+ return err;
+ }
+
+ memset(lst, 0, sizeof(struct ubifs_lp_stats));
+
+ data.err = 0;
+ err = ubifs_lpt_scan_nolock(c, c->main_first, c->leb_cnt - 1,
+ (ubifs_lpt_scan_callback)scan_check_cb,
+ &data);
+ if (err && err != -ENOSPC)
+ goto out;
+ if (data.err) {
+ err = data.err;
+ goto out;
+ }
+
+ if (lst->empty_lebs != c->lst.empty_lebs ||
+ lst->idx_lebs != c->lst.idx_lebs ||
+ lst->total_free != c->lst.total_free ||
+ lst->total_dirty != c->lst.total_dirty ||
+ lst->total_used != c->lst.total_used) {
+ ubifs_err("bad overall accounting");
+ ubifs_err("calculated: empty_lebs %d, idx_lebs %d, "
+ "total_free %lld, total_dirty %lld, total_used %lld",
+ lst->empty_lebs, lst->idx_lebs, lst->total_free,
+ lst->total_dirty, lst->total_used);
+ ubifs_err("read from lprops: empty_lebs %d, idx_lebs %d, "
+ "total_free %lld, total_dirty %lld, total_used %lld",
+ c->lst.empty_lebs, c->lst.idx_lebs, c->lst.total_free,
+ c->lst.total_dirty, c->lst.total_used);
+ err = -EINVAL;
+ goto out;
+ }
+
+ if (lst->total_dead != c->lst.total_dead ||
+ lst->total_dark != c->lst.total_dark) {
+ ubifs_err("bad dead/dark space accounting");
+ ubifs_err("calculated: total_dead %lld, total_dark %lld",
+ lst->total_dead, lst->total_dark);
+ ubifs_err("read from lprops: total_dead %lld, total_dark %lld",
+ c->lst.total_dead, c->lst.total_dark);
+ err = -EINVAL;
+ goto out;
+ }
+
+ err = dbg_check_cats(c);
+out:
+ return err;
+}
+
+#endif /* CONFIG_UBIFS_FS_DEBUG_CHK_LPROPS */
--
1.5.4.1
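
To make the dark-space arithmetic in calc_dark() from the patch above more concrete, here is a minimal stand-alone sketch. It is not part of the patch. The function name calc_dark_sketch and its parameters dark_wm and min_write_sz are hypothetical stand-ins for c->dark_wm and MIN_WRITE_SZ, whose real values are defined outside the code shown here.

```c
/*
 * Stand-alone sketch of the dark space calculation (illustration only).
 * @spc: free plus dirty space in the LEB, in bytes
 * @dark_wm: assumed dark space watermark (stand-in for c->dark_wm)
 * @min_write_sz: assumed smallest writable node size (stand-in for
 *                MIN_WRITE_SZ)
 */
static int calc_dark_sketch(int spc, int dark_wm, int min_write_sz)
{
	/* Below the watermark nothing can be relied upon: all of it is dark */
	if (spc < dark_wm)
		return spc;

	/*
	 * Slightly above the watermark: assume one node of the smallest
	 * size still fits, so everything except that node is dark.
	 */
	if (spc - dark_wm < min_write_sz)
		return spc - min_write_sz;

	/* Otherwise only the watermark amount is treated as dark */
	return dark_wm;
}
```

With an assumed watermark of 4096 bytes and an assumed minimum write size of 56 bytes, a LEB with 512 bytes of free plus dirty space counts as entirely dark, one with 4112 bytes has 4056 dark bytes, and one with 8192 bytes has 4096 dark bytes.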

2008-03-27 13:12:12

by Artem Bityutskiy

Subject: [RFC PATCH 02/26] UBIFS: add I/O sub-system

This sub-system is responsible for performing all the I/O-related
low-level things like calculating and checking checksums, doing
basic node validation, adding correct padding to the nodes and
so on. It also implements UBIFS write-buffers and their proper
synchronization.

Signed-off-by: Artem Bityutskiy <[email protected]>
Signed-off-by: Adrian Hunter <[email protected]>
---
fs/ubifs/io.c | 921 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 921 insertions(+), 0 deletions(-)
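
For reviewers, the padding rule implemented by ubifs_pad() in the diff below can be sketched roughly as follows. This is a simplified illustration, not the patch code: the names pad_sketch, SKETCH_PAD_NODE_SZ and SKETCH_PADDING_BYTE are hypothetical, the values 28 and 0xCE are assumptions standing in for UBIFS_PAD_NODE_SZ and UBIFS_PADDING_BYTE, and the padding node header (magic, type, CRC) is left zeroed instead of being filled in.

```c
#include <string.h>

#define SKETCH_PAD_NODE_SZ  28   /* assumed size of a padding node */
#define SKETCH_PADDING_BYTE 0xCE /* assumed filler byte value */

/*
 * Fill @pad bytes at @buf. Returns the length that would be recorded in
 * the padding node (bytes following the node), or -1 if the gap is too
 * small for a padding node and plain padding bytes were written instead.
 */
static int pad_sketch(unsigned char *buf, int pad)
{
	if (pad >= SKETCH_PAD_NODE_SZ) {
		/* Placeholder for the padding node header and its CRC */
		memset(buf, 0, SKETCH_PAD_NODE_SZ);
		pad -= SKETCH_PAD_NODE_SZ;
		/* The rest of the gap after the node is zero-filled */
		memset(buf + SKETCH_PAD_NODE_SZ, 0, pad);
		return pad;
	}

	/* Too little space, a padding node would not fit */
	if (pad > 0)
		memset(buf, SKETCH_PADDING_BYTE, pad);
	return -1;
}
```

Under these assumptions a 40-byte gap gets one padding node recording 12 trailing bytes, while a 16-byte gap is filled entirely with the padding byte.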

diff --git a/fs/ubifs/io.c b/fs/ubifs/io.c
new file mode 100644
index 0000000..182f25c
--- /dev/null
+++ b/fs/ubifs/io.c
@@ -0,0 +1,921 @@
+/*
+ * This file is part of UBIFS.
+ *
+ * Copyright (C) 2006-2008 Nokia Corporation.
+ * Copyright (C) 2006, 2007 University of Szeged, Hungary
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 51
+ * Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Authors: Artem Bityutskiy (Битюцкий Артём)
+ * Adrian Hunter
+ * Zoltan Sogor
+ */
+
+/*
+ * This file implements UBIFS I/O subsystem which provides various I/O-related
+ * helper functions (reading/writing/checking/validating nodes) and implements
+ * write-buffering support. Write buffers help to save space which otherwise
+ * would have been wasted for padding to the nearest minimal I/O unit boundary.
+ * Instead, data first goes to the write-buffer and is flushed when the
+ * buffer is full or when it is not used for some time (by timer). This is
+ * similar to the mechanism used by JFFS2.
+ *
+ * Write-buffers are defined by 'struct ubifs_wbuf' objects and protected by
+ * mutexes defined inside these objects. Since sometimes upper-level code
+ * has to lock the write-buffer (e.g. journal space reservation code), many
+ * functions related to write-buffers have "nolock" suffix which means that the
+ * caller has to lock the write-buffer before calling this function.
+ *
+ * UBIFS stores nodes at 8-byte aligned addresses. If the node length is not
+ * aligned, UBIFS starts the next node from the aligned address, and the padded
+ * bytes may contain any rubbish. In other words, UBIFS does not put padding
+ * bytes in those small gaps. Common headers of nodes store real node lengths,
+ * not aligned lengths. Indexing nodes also store real lengths in branches.
+ *
+ * UBIFS uses padding when it pads to the next min. I/O unit. In this case it
+ * uses padding nodes or padding bytes, if the padding node does not fit.
+ *
+ * All UBIFS nodes are protected by CRC checksums and UBIFS checks all nodes
+ * every time they are read from the flash media.
+ */
+
+#include <linux/crc32.h>
+#include "ubifs.h"
+
+/**
+ * ubifs_check_node - check node.
+ * @c: UBIFS file-system description object
+ * @buf: node to check
+ * @lnum: logical eraseblock number
+ * @offs: offset within the logical eraseblock
+ * @quiet: print no messages
+ *
+ * This function checks node magic number and CRC checksum. This function also
+ * validates node length to prevent UBIFS from becoming crazy when an attacker
+ * feeds it a file-system image with incorrect nodes. For example, too large
+ * node length in the common header could cause UBIFS to read memory outside of
+ * allocated buffer when checking the CRC checksum.
+ *
+ * This function returns zero on success, %-EUCLEAN on bad CRC or magic, and
+ * %-EINVAL on bad node type or length.
+ */
+int ubifs_check_node(const struct ubifs_info *c, const void *buf, int lnum,
+ int offs, int quiet)
+{
+ int err = -EINVAL, type, node_len;
+ uint32_t crc, node_crc, magic;
+ const struct ubifs_ch *ch = buf;
+
+ ubifs_assert(lnum >= 0 && lnum < c->leb_cnt && offs >= 0);
+ ubifs_assert(!(offs & 7) && offs < c->leb_size);
+
+ magic = le32_to_cpu(ch->magic);
+ if (magic != UBIFS_NODE_MAGIC) {
+ if (!quiet)
+ ubifs_err("bad magic %#08x, expected %#08x",
+ magic, UBIFS_NODE_MAGIC);
+ err = -EUCLEAN;
+ goto out;
+ }
+
+ type = ch->node_type;
+ if (type < 0 || type >= UBIFS_NODE_TYPES_CNT) {
+ if (!quiet)
+ ubifs_err("bad node type %d", type);
+ goto out;
+ }
+
+ node_len = le32_to_cpu(ch->len);
+ if (node_len + offs > c->leb_size)
+ goto out_len;
+
+ if (c->ranges[type].max_len == 0) {
+ if (node_len != c->ranges[type].len)
+ goto out_len;
+ } else if (node_len < c->ranges[type].min_len ||
+ node_len > c->ranges[type].max_len)
+ goto out_len;
+
+ crc = crc32(UBIFS_CRC32_INIT, buf + 8, node_len - 8);
+ node_crc = le32_to_cpu(ch->crc);
+ if (crc != node_crc) {
+ if (!quiet)
+ ubifs_err("bad CRC: calculated %#08x, read %#08x",
+ crc, node_crc);
+ err = -EUCLEAN;
+ goto out;
+ }
+
+ return 0;
+
+out_len:
+ if (!quiet)
+ ubifs_err("bad node length %d", node_len);
+out:
+ if (!quiet) {
+ ubifs_err("bad node at LEB %d:%d", lnum, offs);
+ dbg_dump_node(c, buf);
+ dbg_dump_stack();
+ }
+ return err;
+}
+
+/**
+ * ubifs_pad - pad flash space.
+ * @c: UBIFS file-system description object
+ * @buf: buffer to put padding to
+ * @pad: how many bytes to pad
+ *
+ * The flash media obliges us to write only in chunks of %c->min_io_size and
+ * when we have to write less data we add padding node to the write-buffer and
+ * pad it to the next minimal I/O unit's boundary. Padding nodes help when the
+ * media is being scanned. If the amount of wasted space is not enough to fit a
+ * padding node which takes %UBIFS_PAD_NODE_SZ bytes, we write padding bytes
+ * pattern (%UBIFS_PADDING_BYTE).
+ *
+ * Padding nodes are also used to fill gaps when the "commit-in-gaps" method is
+ * used.
+ */
+void ubifs_pad(const struct ubifs_info *c, void *buf, int pad)
+{
+ uint32_t crc;
+
+ ubifs_assert(pad >= 0 && !(pad & 7));
+
+ if (pad >= UBIFS_PAD_NODE_SZ) {
+ struct ubifs_ch *ch = buf;
+ struct ubifs_pad_node *pad_node = buf;
+
+ ch->magic = cpu_to_le32(UBIFS_NODE_MAGIC);
+ ch->node_type = UBIFS_PAD_NODE;
+ ch->group_type = UBIFS_NO_NODE_GROUP;
+ ch->padding[0] = ch->padding[1] = 0;
+ ch->sqnum = cpu_to_le64(0);
+ ch->len = cpu_to_le32(UBIFS_PAD_NODE_SZ);
+ pad -= UBIFS_PAD_NODE_SZ;
+ pad_node->pad_len = cpu_to_le32(pad);
+ crc = crc32(UBIFS_CRC32_INIT, buf + 8, UBIFS_PAD_NODE_SZ - 8);
+ ch->crc = cpu_to_le32(crc);
+ memset(buf + UBIFS_PAD_NODE_SZ, 0, pad);
+ } else if (pad > 0)
+ /* Too little space, padding node won't fit */
+ memset(buf, UBIFS_PADDING_BYTE, pad);
+}
+
+/**
+ * next_sqnum - get next sequence number.
+ * @c: UBIFS file-system description object
+ */
+static unsigned long long next_sqnum(struct ubifs_info *c)
+{
+ unsigned long long sqnum;
+
+ spin_lock(&c->cnt_lock);
+ sqnum = ++c->max_sqnum;
+ spin_unlock(&c->cnt_lock);
+
+ if (unlikely(sqnum >= SQNUM_WARN_WATERMARK)) {
+ if (sqnum >= SQNUM_WATERMARK) {
+ ubifs_err("sequence number overflow %llu, end of life",
+ sqnum);
+ ubifs_ro_mode(c);
+ }
+ ubifs_warn("running out of sequence numbers, end of life soon");
+ }
+
+ return sqnum;
+}
+
+/**
+ * ubifs_prepare_node - prepare node to be written to flash.
+ * @c: UBIFS file-system description object
+ * @node: the node to pad
+ * @len: node length
+ * @pad: if the buffer has to be padded
+ *
+ * This function prepares node at @node to be written to the media - it
+ * calculates node CRC, fills the common header, and adds proper padding up to
+ * the next minimum I/O unit if @pad is not zero.
+ */
+void ubifs_prepare_node(struct ubifs_info *c, void *node, int len, int pad)
+{
+ uint32_t crc;
+ struct ubifs_ch *ch = node;
+ unsigned long long sqnum = next_sqnum(c);
+
+ ubifs_assert(len >= UBIFS_CH_SZ);
+
+ ch->magic = cpu_to_le32(UBIFS_NODE_MAGIC);
+ ch->len = cpu_to_le32(len);
+ ch->group_type = UBIFS_NO_NODE_GROUP;
+ ch->sqnum = cpu_to_le64(sqnum);
+ ch->padding[0] = ch->padding[1] = 0;
+ crc = crc32(UBIFS_CRC32_INIT, node + 8, len - 8);
+ ch->crc = cpu_to_le32(crc);
+
+ if (pad) {
+ len = ALIGN(len, 8);
+ pad = ALIGN(len, c->min_io_size) - len;
+ ubifs_pad(c, node + len, pad);
+ }
+}
+
+/**
+ * ubifs_prep_grp_node - prepare node of a group to be written to flash.
+ * @c: UBIFS file-system description object
+ * @node: the node to pad
+ * @len: node length
+ * @last: indicates the last node of the group
+ *
+ * This function prepares node at @node to be written to the media - it
+ * calculates node CRC and fills the common header.
+ */
+void ubifs_prep_grp_node(struct ubifs_info *c, void *node, int len, int last)
+{
+ uint32_t crc;
+ struct ubifs_ch *ch = node;
+ unsigned long long sqnum = next_sqnum(c);
+
+ ubifs_assert(len >= UBIFS_CH_SZ);
+
+ ch->magic = cpu_to_le32(UBIFS_NODE_MAGIC);
+ ch->len = cpu_to_le32(len);
+ if (last)
+ ch->group_type = UBIFS_LAST_OF_NODE_GROUP;
+ else
+ ch->group_type = UBIFS_IN_NODE_GROUP;
+ ch->sqnum = cpu_to_le64(sqnum);
+ ch->padding[0] = ch->padding[1] = 0;
+ crc = crc32(UBIFS_CRC32_INIT, node + 8, len - 8);
+ ch->crc = cpu_to_le32(crc);
+}
+
+/**
+ * wbuf_timer_callback_nolock - write-buffer timer callback function.
+ * @data: timer data (write-buffer descriptor)
+ *
+ * This function is called when the write-buffer timer expires.
+ */
+static void wbuf_timer_callback_nolock(unsigned long data)
+{
+ struct ubifs_wbuf *wbuf = (struct ubifs_wbuf *)data;
+
+ wbuf->need_sync = 1;
+ wbuf->c->need_wbuf_sync = 1;
+ ubifs_wake_up_bgt(wbuf->c);
+}
+
+/**
+ * new_wbuf_timer_nolock - start new write-buffer timer.
+ * @wbuf: write-buffer descriptor
+ */
+static void new_wbuf_timer_nolock(struct ubifs_wbuf *wbuf)
+{
+ ubifs_assert(!timer_pending(&wbuf->timer));
+
+ if (!wbuf->timeout)
+ return;
+
+ wbuf->timer.expires = jiffies + wbuf->timeout;
+ add_timer(&wbuf->timer);
+}
+
+/**
+ * cancel_wbuf_timer_nolock - cancel write-buffer timer.
+ * @wbuf: write-buffer descriptor
+ */
+static void cancel_wbuf_timer_nolock(struct ubifs_wbuf *wbuf)
+{
+ /*
+ * If the syncer is waiting for the lock (from the background thread's
+ * context) and another task is changing write-buffer then the syncing
+ * should be canceled.
+ */
+ wbuf->need_sync = 0;
+ del_timer(&wbuf->timer);
+}
+
+/**
+ * ubifs_wbuf_sync_nolock - synchronize write-buffer.
+ * @wbuf: write-buffer to synchronize
+ *
+ * This function synchronizes write-buffer @wbuf and returns zero in case of
+ * success or a negative error code in case of failure.
+ */
+int ubifs_wbuf_sync_nolock(struct ubifs_wbuf *wbuf)
+{
+ struct ubifs_info *c = wbuf->c;
+ int err, dirt;
+
+ cancel_wbuf_timer_nolock(wbuf);
+ if (!wbuf->used || wbuf->lnum == -1)
+ /* Write-buffer is empty or not seeked */
+ return 0;
+
+ dbg_io("LEB %d:%d, %d bytes",
+ wbuf->lnum, wbuf->offs, wbuf->used);
+ ubifs_assert(!(c->vfs_sb->s_flags & MS_RDONLY));
+ ubifs_assert(!(wbuf->avail & 7));
+ ubifs_assert(wbuf->offs + c->min_io_size <= c->leb_size);
+
+ if (c->ro_media)
+ return -EROFS;
+
+ ubifs_pad(c, wbuf->buf + wbuf->used, wbuf->avail);
+ err = ubi_leb_write(c->ubi, wbuf->lnum, wbuf->buf, wbuf->offs,
+ c->min_io_size, wbuf->dtype);
+ if (err) {
+ ubifs_err("cannot write %d bytes to LEB %d:%d",
+ c->min_io_size, wbuf->lnum, wbuf->offs);
+ dbg_dump_stack();
+ return err;
+ }
+
+ dirt = wbuf->avail;
+
+ spin_lock(&wbuf->lock);
+ wbuf->offs += c->min_io_size;
+ wbuf->avail = c->min_io_size;
+ wbuf->used = 0;
+ wbuf->next_ino = 0;
+ spin_unlock(&wbuf->lock);
+
+ if (wbuf->sync_callback)
+ err = wbuf->sync_callback(c, wbuf->lnum,
+ c->leb_size - wbuf->offs, dirt);
+ return err;
+}
+
+/**
+ * ubifs_wbuf_seek_nolock - seek write-buffer.
+ * @wbuf: write-buffer
+ * @lnum: logical eraseblock number to seek to
+ * @offs: logical eraseblock offset to seek to
+ * @dtype: data type
+ *
+ * This function targets the write buffer to logical eraseblock @lnum:@offs.
+ * The write-buffer is synchronized if it is not empty. Returns zero in case of
+ * success and a negative error code in case of failure.
+ */
+int ubifs_wbuf_seek_nolock(struct ubifs_wbuf *wbuf, int lnum, int offs,
+ int dtype)
+{
+ const struct ubifs_info *c = wbuf->c;
+
+ dbg_io("LEB %d:%d", lnum, offs);
+ ubifs_assert(lnum >= 0 && lnum < c->leb_cnt);
+ ubifs_assert(offs >= 0 && offs <= c->leb_size);
+ ubifs_assert(offs % c->min_io_size == 0 && !(offs & 7));
+ ubifs_assert(lnum != wbuf->lnum);
+
+ if (wbuf->used > 0) {
+ int err = ubifs_wbuf_sync_nolock(wbuf);
+
+ if (err)
+ return err;
+ }
+
+ spin_lock(&wbuf->lock);
+ wbuf->lnum = lnum;
+ wbuf->offs = offs;
+ wbuf->avail = c->min_io_size;
+ wbuf->used = 0;
+ spin_unlock(&wbuf->lock);
+ wbuf->dtype = dtype;
+
+ return 0;
+}
+
+/**
+ * ubifs_bg_wbufs_sync - synchronize write-buffers.
+ * @c: UBIFS file-system description object
+ *
+ * This function is called by background thread to synchronize write-buffers.
+ * Returns zero in case of success and a negative error code in case of
+ * failure.
+ */
+int ubifs_bg_wbufs_sync(struct ubifs_info *c)
+{
+ int err, i;
+
+ if (!c->need_wbuf_sync)
+ return 0;
+ c->need_wbuf_sync = 0;
+
+ if (c->ro_media) {
+ err = -EROFS;
+ goto out_timers;
+ }
+
+ dbg_io("synchronize");
+ for (i = 0; i < c->jhead_cnt; i++) {
+ struct ubifs_wbuf *wbuf = &c->jheads[i].wbuf;
+
+ cond_resched();
+
+ /*
+ * If the mutex is locked then wbuf is being changed, so
+ * synchronization is not necessary.
+ */
+ if (mutex_is_locked(&wbuf->io_mutex))
+ continue;
+
+ mutex_lock_nested(&wbuf->io_mutex, wbuf->jhead);
+ if (!wbuf->need_sync) {
+ mutex_unlock(&wbuf->io_mutex);
+ continue;
+ }
+
+ err = ubifs_wbuf_sync_nolock(wbuf);
+ mutex_unlock(&wbuf->io_mutex);
+ if (err) {
+ ubifs_err("cannot sync write-buffer, error %d", err);
+ ubifs_ro_mode(c);
+ goto out_timers;
+ }
+ }
+
+ return 0;
+
+out_timers:
+ /* Cancel all timers to prevent repeated errors */
+ for (i = 0; i < c->jhead_cnt; i++) {
+ struct ubifs_wbuf *wbuf = &c->jheads[i].wbuf;
+
+ mutex_lock_nested(&wbuf->io_mutex, wbuf->jhead);
+ cancel_wbuf_timer_nolock(wbuf);
+ mutex_unlock(&wbuf->io_mutex);
+ }
+ return err;
+}
+
+/**
+ * ubifs_wbuf_write_nolock - write data to flash via write-buffer.
+ * @wbuf: write-buffer
+ * @buf: node to write
+ * @len: node length
+ *
+ * This function writes data to flash via write-buffer @wbuf. This means that
+ * the last piece of the node won't reach the flash media immediately if it
+ * does not take whole minimal I/O unit. Instead, the node will sit in RAM
+ * until the write-buffer is synchronized (e.g., by timer).
+ *
+ * This function returns zero in case of success and a negative error code in
+ * case of failure. If the node cannot be written because there is no more
+ * space in this logical eraseblock, %-ENOSPC is returned.
+ */
+int ubifs_wbuf_write_nolock(struct ubifs_wbuf *wbuf, void *buf, int len)
+{
+ struct ubifs_info *c = wbuf->c;
+ int err, written, n, aligned_len = ALIGN(len, 8), offs;
+
+ dbg_io("%d bytes (%s) to wbuf at LEB %d:%d", len,
+ dbg_ntype(((struct ubifs_ch *)buf)->node_type), wbuf->lnum,
+ wbuf->offs + wbuf->used);
+ ubifs_assert(len > 0 && wbuf->lnum >= 0 && wbuf->lnum < c->leb_cnt);
+ ubifs_assert(wbuf->offs >= 0 && wbuf->offs % c->min_io_size == 0);
+ ubifs_assert(!(wbuf->offs & 7) && wbuf->offs <= c->leb_size);
+ ubifs_assert(wbuf->avail > 0 && wbuf->avail <= c->min_io_size);
+ ubifs_assert(mutex_is_locked(&wbuf->io_mutex));
+
+ if (c->leb_size - wbuf->offs - wbuf->used < aligned_len) {
+ err = -ENOSPC;
+ goto out;
+ }
+
+ cancel_wbuf_timer_nolock(wbuf);
+
+ if (c->ro_media)
+ return -EROFS;
+
+ if (aligned_len <= wbuf->avail) {
+ /*
+ * The node is not very large and fits entirely within
+ * write-buffer.
+ */
+ memcpy(wbuf->buf + wbuf->used, buf, len);
+
+ if (aligned_len == wbuf->avail) {
+ dbg_io("flush wbuf to LEB %d:%d", wbuf->lnum,
+ wbuf->offs);
+ err = ubi_leb_write(c->ubi, wbuf->lnum, wbuf->buf,
+ wbuf->offs, c->min_io_size,
+ wbuf->dtype);
+ if (err)
+ goto out;
+
+ spin_lock(&wbuf->lock);
+ wbuf->offs += c->min_io_size;
+ wbuf->avail = c->min_io_size;
+ wbuf->used = 0;
+ wbuf->next_ino = 0;
+ spin_unlock(&wbuf->lock);
+ } else {
+ spin_lock(&wbuf->lock);
+ wbuf->avail -= aligned_len;
+ wbuf->used += aligned_len;
+ spin_unlock(&wbuf->lock);
+ }
+
+ goto exit;
+ }
+
+ /*
+ * The node is large enough and does not fit entirely within current
+ * minimal I/O unit. We have to fill and flush write-buffer and switch
+ * to the next min. I/O unit.
+ */
+ dbg_io("flush wbuf to LEB %d:%d", wbuf->lnum, wbuf->offs);
+ memcpy(wbuf->buf + wbuf->used, buf, wbuf->avail);
+ err = ubi_leb_write(c->ubi, wbuf->lnum, wbuf->buf, wbuf->offs,
+ c->min_io_size, wbuf->dtype);
+ if (err)
+ goto out;
+
+ offs = wbuf->offs + c->min_io_size;
+ len -= wbuf->avail;
+ aligned_len -= wbuf->avail;
+ written = wbuf->avail;
+
+ /*
+ * The remaining data may take more whole min. I/O units, so write the
+ * part that is a multiple of the min. I/O unit size directly to the flash
+ * media. We align node length to 8-byte boundary because we flush the wbuf
+ * anyway if the remaining space is less than 8 bytes.
+ */
+ n = aligned_len >> c->min_io_shift;
+ if (n) {
+ n <<= c->min_io_shift;
+ dbg_io("write %d bytes to LEB %d:%d", n, wbuf->lnum, offs);
+ err = ubi_leb_write(c->ubi, wbuf->lnum, buf + written, offs, n,
+ wbuf->dtype);
+ if (err)
+ goto out;
+ offs += n;
+ aligned_len -= n;
+ len -= n;
+ written += n;
+ }
+
+ spin_lock(&wbuf->lock);
+ if (aligned_len)
+ /*
+ * And now we have what's left and what does not take whole
+ * min. I/O unit, so write it to the write-buffer and we are
+ * done.
+ */
+ memcpy(wbuf->buf, buf + written, len);
+
+ wbuf->offs = offs;
+ wbuf->used = aligned_len;
+ wbuf->avail = c->min_io_size - aligned_len;
+ wbuf->next_ino = 0;
+ spin_unlock(&wbuf->lock);
+
+exit:
+ if (wbuf->sync_callback) {
+ int free = c->leb_size - wbuf->offs - wbuf->used;
+
+ err = wbuf->sync_callback(c, wbuf->lnum, free, 0);
+ if (err)
+ goto out;
+ }
+
+ if (wbuf->used)
+ new_wbuf_timer_nolock(wbuf);
+
+ return 0;
+
+out:
+ ubifs_err("cannot write %d bytes to LEB %d:%d, error %d",
+ len, wbuf->lnum, wbuf->offs, err);
+ dbg_dump_node(c, buf);
+ dbg_dump_stack();
+ dbg_dump_leb(c, wbuf->lnum);
+ return err;
+}
+
+/**
+ * ubifs_write_node - write node to the media.
+ * @c: UBIFS file-system description object
+ * @buf: the node to write
+ * @len: node length
+ * @lnum: logical eraseblock number
+ * @offs: offset within the logical eraseblock
+ * @dtype: node life-time hint (%UBI_LONGTERM, %UBI_SHORTTERM, %UBI_UNKNOWN)
+ *
+ * This function automatically fills node magic number, assigns sequence
+ * number, and calculates node CRC checksum. The length of the @buf buffer has
+ * to be aligned to the minimal I/O unit size. This function automatically
+ * appends padding node and padding bytes if needed. Returns zero in case of
+ * success and a negative error code in case of failure.
+ */
+int ubifs_write_node(struct ubifs_info *c, void *buf, int len, int lnum,
+ int offs, int dtype)
+{
+ int err, buf_len = ALIGN(len, c->min_io_size);
+
+ dbg_io("LEB %d:%d, %s, length %d (aligned %d)",
+ lnum, offs, dbg_ntype(((struct ubifs_ch *)buf)->node_type), len,
+ buf_len);
+ ubifs_assert(lnum >= 0 && lnum < c->leb_cnt && offs >= 0);
+ ubifs_assert(offs % c->min_io_size == 0 && offs < c->leb_size);
+
+ if (c->ro_media)
+ return -EROFS;
+
+ ubifs_prepare_node(c, buf, len, 1);
+ err = ubi_leb_write(c->ubi, lnum, buf, offs, buf_len, dtype);
+ if (err) {
+ ubifs_err("cannot write %d bytes to LEB %d:%d, error %d",
+ buf_len, lnum, offs, err);
+ dbg_dump_node(c, buf);
+ dbg_dump_stack();
+ }
+
+ return err;
+}
+
+/**
+ * ubifs_read_node_wbuf - read node from the media or write-buffer.
+ * @wbuf: wbuf to check for un-written data
+ * @buf: buffer to read to
+ * @type: node type
+ * @len: node length
+ * @lnum: logical eraseblock number
+ * @offs: offset within the logical eraseblock
+ *
+ * This function reads a node of known type and length, checks it and stores
+ * in @buf. If the node partially or fully sits in the write-buffer, this
+ * function takes data from the buffer, otherwise it reads the flash media.
+ * Returns zero in case of success, %-EUCLEAN if CRC mismatched and a negative
+ * error code in case of failure.
+ */
+int ubifs_read_node_wbuf(struct ubifs_wbuf *wbuf, void *buf, int type, int len,
+ int lnum, int offs)
+{
+ const struct ubifs_info *c = wbuf->c;
+ int err, rlen, overlap;
+ struct ubifs_ch *ch = buf;
+
+ dbg_io("LEB %d:%d, %s, length %d", lnum, offs, dbg_ntype(type), len);
+ ubifs_assert(wbuf && lnum >= 0 && lnum < c->leb_cnt && offs >= 0);
+ ubifs_assert(!(offs & 7) && offs < c->leb_size);
+ ubifs_assert(type >= 0 && type < UBIFS_NODE_TYPES_CNT);
+
+ spin_lock(&wbuf->lock);
+ overlap = (lnum == wbuf->lnum && offs + len > wbuf->offs);
+ if (!overlap) {
+ /* We may safely unlock the write-buffer and read the data */
+ spin_unlock(&wbuf->lock);
+ return ubifs_read_node(c, buf, type, len, lnum, offs);
+ }
+
+ /* Don't read under wbuf */
+ rlen = wbuf->offs - offs;
+ if (rlen < 0)
+ rlen = 0;
+
+ /* Copy the rest from the write-buffer */
+ memcpy(buf + rlen, wbuf->buf + offs + rlen - wbuf->offs, len - rlen);
+ spin_unlock(&wbuf->lock);
+
+ if (rlen > 0) {
+ /* Read everything that goes before write-buffer */
+ err = ubi_read(c->ubi, lnum, buf, offs, rlen);
+ if (err && err != -EBADMSG) {
+ ubifs_err("failed to read node %d from LEB %d:%d, "
+ "error %d", type, lnum, offs, err);
+ dbg_dump_stack();
+ return err;
+ }
+ }
+
+ err = ubifs_check_node(c, buf, lnum, offs, 0);
+ if (err) {
+ ubifs_err("expected node type %d", type);
+ return err;
+ }
+
+ if (type != ch->node_type) {
+ ubifs_err("bad node type (%d but expected %d)",
+ ch->node_type, type);
+ goto out;
+ }
+
+ rlen = le32_to_cpu(ch->len);
+ if (rlen != len) {
+ ubifs_err("bad node length %d, expected %d", rlen, len);
+ goto out;
+ }
+
+ return 0;
+
+out:
+ ubifs_err("bad node at LEB %d:%d", lnum, offs);
+ dbg_dump_node(c, buf);
+ dbg_dump_stack();
+ return -EINVAL;
+}
+
+/**
+ * ubifs_read_node - read node.
+ * @c: UBIFS file-system description object
+ * @buf: buffer to read to
+ * @type: node type
+ * @len: node length (not aligned)
+ * @lnum: logical eraseblock number
+ * @offs: offset within the logical eraseblock
+ *
+ * This function reads a node of known type and length, checks it and
+ * stores in @buf. Returns zero in case of success, %-EUCLEAN if CRC mismatched
+ * and a negative error code in case of failure.
+ */
+int ubifs_read_node(const struct ubifs_info *c, void *buf, int type, int len,
+ int lnum, int offs)
+{
+ int err, l;
+ struct ubifs_ch *ch = buf;
+
+ dbg_io("LEB %d:%d, %s, length %d", lnum, offs, dbg_ntype(type), len);
+ ubifs_assert(lnum >= 0 && lnum < c->leb_cnt && offs >= 0);
+ ubifs_assert(len >= UBIFS_CH_SZ && offs + len <= c->leb_size);
+ ubifs_assert(!(offs & 7) && offs < c->leb_size);
+ ubifs_assert(type >= 0 && type < UBIFS_NODE_TYPES_CNT);
+
+ err = ubi_read(c->ubi, lnum, buf, offs, len);
+ if (err && err != -EBADMSG) {
+ ubifs_err("cannot read node %d from LEB %d:%d, error %d",
+ type, lnum, offs, err);
+ return err;
+ }
+
+ err = ubifs_check_node(c, buf, lnum, offs, 0);
+ if (err) {
+ ubifs_err("expected node type %d", type);
+ return err;
+ }
+
+ if (type != ch->node_type) {
+ ubifs_err("bad node type (%d but expected %d)",
+ ch->node_type, type);
+ goto out;
+ }
+
+ l = le32_to_cpu(ch->len);
+ if (l != len) {
+ ubifs_err("bad node length %d, expected %d", l, len);
+ goto out;
+ }
+
+ return 0;
+
+out:
+ ubifs_err("bad node at LEB %d:%d", lnum, offs);
+ dbg_dump_node(c, buf);
+ dbg_dump_stack();
+ return -EINVAL;
+}
+
+/**
+ * ubifs_wbuf_init - initialize write-buffer.
+ * @c: UBIFS file-system description object
+ * @wbuf: write-buffer to initialize
+ *
+ * This function initializes write buffer. Returns zero in case of success
+ * and %-ENOMEM in case of failure.
+ */
+int ubifs_wbuf_init(struct ubifs_info *c, struct ubifs_wbuf *wbuf)
+{
+ size_t size;
+
+ wbuf->buf = kmalloc(c->min_io_size, GFP_KERNEL);
+ if (!wbuf->buf)
+ return -ENOMEM;
+
+ size = (c->min_io_size / UBIFS_CH_SZ + 1) * sizeof(ino_t);
+ wbuf->inodes = kmalloc(size, GFP_KERNEL);
+ if (!wbuf->inodes) {
+ kfree(wbuf->buf);
+ wbuf->buf = NULL;
+ return -ENOMEM;
+ }
+
+ wbuf->used = 0;
+ wbuf->lnum = wbuf->offs = -1;
+ wbuf->avail = c->min_io_size;
+ wbuf->dtype = UBI_UNKNOWN;
+ wbuf->sync_callback = NULL;
+ mutex_init(&wbuf->io_mutex);
+ spin_lock_init(&wbuf->lock);
+
+ wbuf->c = c;
+ init_timer(&wbuf->timer);
+ wbuf->timer.function = wbuf_timer_callback_nolock;
+ wbuf->timer.data = (unsigned long)wbuf;
+ wbuf->timeout = DEFAULT_WBUF_TIMEOUT;
+ wbuf->next_ino = 0;
+
+ return 0;
+}
+
+/**
+ * ubifs_wbuf_add_ino_nolock - add an inode number into the wbuf inode array.
+ * @wbuf: the write-buffer to add the inode number to
+ * @inum: the inode number
+ *
+ * This function adds an inode number to the inode array of the write-buffer.
+ */
+void ubifs_wbuf_add_ino_nolock(struct ubifs_wbuf *wbuf, ino_t inum)
+{
+ if (!wbuf->buf)
+ /* NOR flash or something similar */
+ return;
+
+ spin_lock(&wbuf->lock);
+ if (wbuf->used)
+ wbuf->inodes[wbuf->next_ino++] = inum;
+ spin_unlock(&wbuf->lock);
+}
+
+/**
+ * wbuf_has_ino - check if the wbuf contains data from the inode.
+ * @wbuf: the write-buffer
+ * @inum: the inode number
+ *
+ * This function returns %1 if the write-buffer contains some data from the
+ * given inode, otherwise it returns %0.
+ */
+static int wbuf_has_ino(struct ubifs_wbuf *wbuf, ino_t inum)
+{
+ int i, ret = 0;
+
+ spin_lock(&wbuf->lock);
+ for (i = 0; i < wbuf->next_ino; i++)
+ if (inum == wbuf->inodes[i]) {
+ ret = 1;
+ break;
+ }
+ spin_unlock(&wbuf->lock);
+
+ return ret;
+}
+
+/**
+ * ubifs_sync_wbufs_by_inodes - synchronize write-buffers which have data
+ * belonging to specified inodes.
+ * @c: UBIFS file-system description object
+ * @inodes: array of inodes
+ * @count: number of elements in @inodes
+ *
+ * This function synchronizes write-buffers which contain nodes belonging to
+ * any inode specified in @inodes array. Returns zero in case of success and a
+ * negative error code in case of failure.
+ */
+int ubifs_sync_wbufs_by_inodes(struct ubifs_info *c,
+ struct inode * const *inodes, int count)
+{
+ int i, j, err = 0;
+
+ ubifs_assert(count);
+
+ for (i = 0; i < c->jhead_cnt; i++) {
+ struct ubifs_wbuf *wbuf = &c->jheads[i].wbuf;
+
+ if (i == GCHD)
+ /*
+ * GC head is special, do not look at it. Even if the
+ * head contains something related to this inode, it is
+ * a _copy_ of corresponding on-flash node which sits
+ * somewhere else.
+ */
+ continue;
+
+ for (j = 0; j < count && !err; j++)
+ if (wbuf_has_ino(wbuf, inodes[j]->i_ino)) {
+ mutex_lock_nested(&wbuf->io_mutex, wbuf->jhead);
+ if (wbuf_has_ino(wbuf, inodes[j]->i_ino))
+ err = ubifs_wbuf_sync_nolock(wbuf);
+ mutex_unlock(&wbuf->io_mutex);
+ break;
+ }
+
+ if (err) {
+ ubifs_ro_mode(c);
+ break;
+ }
+ }
+
+ return err;
+}
--
1.5.4.1
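
Editorial note (not part of the patch): ubifs_wbuf_write_nolock() above
splits a node into up to three pieces: the part that completes the
current min. I/O unit in the write-buffer, the whole min. I/O units that
go straight to flash, and the tail that stays in the write-buffer. The
sketch below reproduces only that arithmetic in user-space C so it can
be checked by hand. The names sketch_split, struct wbuf_split and
SKETCH_ALIGN are hypothetical and do not exist in UBIFS, and the sketch
uses a plain division where the patch shifts by c->min_io_shift.

```c
/*
 * Hypothetical user-space sketch of the split performed by
 * ubifs_wbuf_write_nolock(). No locking, no I/O, arithmetic only.
 */

/* Round x up to a multiple of a; a must be a power of two */
#define SKETCH_ALIGN(x, a) (((x) + (a) - 1) & ~((a) - 1))

struct wbuf_split {
	int to_current_unit; /* buffer space consumed in the current unit */
	int direct;          /* bytes written directly as whole units */
	int to_next_unit;    /* aligned bytes that start a new unit in wbuf */
	int flushes_current; /* non-zero if the current unit gets flushed */
};

static struct wbuf_split sketch_split(int len, int avail, int min_io_size)
{
	struct wbuf_split s = { 0, 0, 0, 0 };
	int aligned_len = SKETCH_ALIGN(len, 8);

	if (aligned_len <= avail) {
		/* The node fits entirely within the write-buffer */
		s.to_current_unit = aligned_len;
		s.flushes_current = (aligned_len == avail);
		return s;
	}

	/* Fill and flush the current unit first */
	s.to_current_unit = avail;
	s.flushes_current = 1;
	aligned_len -= avail;

	/* Whole min. I/O units bypass the write-buffer */
	s.direct = (aligned_len / min_io_size) * min_io_size;

	/* Whatever is left waits in the write-buffer */
	s.to_next_unit = aligned_len - s.direct;
	return s;
}
```

For example, assuming a 2048-byte min. I/O unit with 512 bytes still
available in the write-buffer, a 5000-byte node is split into 512 bytes
that complete the current unit, 4096 bytes written directly, and 392
bytes left buffered until the next flush or timer expiry.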

2008-03-27 13:14:20

by Artem Bityutskiy

Subject: [RFC PATCH 13/26] UBIFS: add TNC commit implementation

We commit the TNC from time to time, which means we update the on-flash
indexing tree. The TNC commit basically implements journal commit.
The UBIFS implementation allows writing while the commit is in progress.

Signed-off-by: Artem Bityutskiy <[email protected]>
Signed-off-by: Adrian Hunter <[email protected]>
---
fs/ubifs/tnc_commit.c | 1088 +++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 1088 insertions(+), 0 deletions(-)
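
Editorial note (not part of the patch): the in-the-gaps loop in this
patch keeps filling gaps until the estimate from get_leb_cnt() no longer
exceeds the number of available index LEBs. The sketch below restates
that estimate as a stand-alone function so the rounding can be checked
by hand. The name sketch_leb_cnt and the flat parameter list are
hypothetical; the patch reads these values from struct ubifs_info and
uses DIV_ROUND_UP.

```c
/*
 * Hypothetical user-space restatement of get_leb_cnt(): how many empty
 * LEBs are needed to commit cnt znodes, assuming every index node has
 * the maximum size (so the result may overestimate).
 */
static int sketch_leb_cnt(int cnt, int leb_size, int ihead_offs,
			  int max_idx_node_sz)
{
	int per_leb;

	/* Nodes that still fit after the index head in its current LEB */
	cnt -= (leb_size - ihead_offs) / max_idx_node_sz;
	if (cnt < 0)
		cnt = 0;

	/* Max-size index nodes that fit in one empty LEB */
	per_leb = leb_size / max_idx_node_sz;

	/* Round up: a partly used LEB still counts as a whole LEB */
	return (cnt + per_leb - 1) / per_leb;
}
```

With illustrative numbers only (128 KiB LEBs, the index head halfway
through its LEB, 192-byte maximum index nodes), 341 nodes fit in the
rest of the head LEB and 682 fit in an empty one, so 1000 dirty znodes
need 1 more LEB and 2000 need 3.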

diff --git a/fs/ubifs/tnc_commit.c b/fs/ubifs/tnc_commit.c
new file mode 100644
index 0000000..bc0ce2c
--- /dev/null
+++ b/fs/ubifs/tnc_commit.c
@@ -0,0 +1,1088 @@
+/*
+ * This file is part of UBIFS.
+ *
+ * Copyright (C) 2006-2008 Nokia Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 51
+ * Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Authors: Adrian Hunter
+ * Artem Bityutskiy (Битюцкий Артём)
+ */
+
+/* This file implements TNC functions for committing */
+
+#include "ubifs.h"
+
+/**
+ * make_idx_node - make an index node for fill-the-gaps method of TNC commit.
+ * @c: UBIFS file-system description object
+ * @idx: buffer in which to place new index node
+ * @znode: znode from which to make new index node
+ * @lnum: LEB number where new index node will be written
+ * @offs: offset where new index node will be written
+ * @len: length of new index node
+ */
+static int make_idx_node(struct ubifs_info *c, struct ubifs_idx_node *idx,
+ struct ubifs_znode *znode, int lnum, int offs, int len)
+{
+ struct ubifs_znode *zp;
+ int i, err;
+
+ /* Make index node */
+ idx->ch.node_type = UBIFS_IDX_NODE;
+ idx->child_cnt = cpu_to_le16(znode->child_cnt);
+ idx->level = cpu_to_le16(znode->level);
+ for (i = 0; i < znode->child_cnt; i++) {
+ struct ubifs_branch *br = ubifs_idx_branch(c, idx, i);
+ struct ubifs_zbranch *zbr = &znode->zbranch[i];
+
+ key_write_idx(c, &zbr->key, &br->key);
+ br->lnum = cpu_to_le32(zbr->lnum);
+ br->offs = cpu_to_le32(zbr->offs);
+ br->len = cpu_to_le32(zbr->len);
+ if (!zbr->lnum || !zbr->len) {
+ ubifs_err("bad ref in znode");
+ dbg_dump_znode(c, znode);
+ if (zbr->znode)
+ dbg_dump_znode(c, zbr->znode);
+ }
+ }
+ ubifs_prepare_node(c, idx, len, 0);
+
+#ifdef CONFIG_UBIFS_FS_DEBUG
+ znode->lnum = lnum;
+ znode->offs = offs;
+ znode->len = len;
+#endif
+
+ err = insert_old_idx_znode(c, znode);
+
+ /* Update the parent */
+ zp = znode->parent;
+ if (zp) {
+ struct ubifs_zbranch *zbr;
+
+ zbr = &zp->zbranch[znode->iip];
+ zbr->lnum = lnum;
+ zbr->offs = offs;
+ zbr->len = len;
+ } else {
+ c->zroot.lnum = lnum;
+ c->zroot.offs = offs;
+ c->zroot.len = len;
+ }
+ c->calc_idx_sz += ALIGN(len, 8);
+
+ atomic_long_dec(&c->dirty_zn_cnt);
+
+ ubifs_assert(ubifs_zn_dirty(znode));
+ ubifs_assert(test_bit(COW_ZNODE, &znode->flags));
+
+ clear_bit(DIRTY_ZNODE, &znode->flags);
+ clear_bit(COW_ZNODE, &znode->flags);
+
+ return err;
+}
+
+/**
+ * fill_gap - make index nodes in gaps in dirty index LEBs.
+ * @c: UBIFS file-system description object
+ * @lnum: LEB number that gap appears in
+ * @gap_start: offset of start of gap
+ * @gap_end: offset of end of gap
+ * @dirt: adds dirty space to this
+ *
+ * This function returns the number of index nodes written into the gap.
+ */
+static int fill_gap(struct ubifs_info *c, int lnum, int gap_start, int gap_end,
+ int *dirt)
+{
+ int len, gap_remains, gap_pos, written, pad_len;
+
+ ubifs_assert((gap_start & 7) == 0);
+ ubifs_assert((gap_end & 7) == 0);
+ ubifs_assert(gap_end >= gap_start);
+
+ gap_remains = gap_end - gap_start;
+ if (!gap_remains)
+ return 0;
+ gap_pos = gap_start;
+ written = 0;
+ while (c->enext) {
+ len = ubifs_idx_node_sz(c, c->enext->child_cnt);
+ if (len < gap_remains) {
+ struct ubifs_znode *znode = c->enext;
+ const int alen = ALIGN(len, 8);
+ int err;
+
+ ubifs_assert(alen <= gap_remains);
+ err = make_idx_node(c, c->ileb_buf + gap_pos, znode,
+ lnum, gap_pos, len);
+ if (err)
+ return err;
+ gap_remains -= alen;
+ gap_pos += alen;
+ c->enext = znode->cnext;
+ if (c->enext == c->cnext)
+ c->enext = NULL;
+ written += 1;
+ } else
+ break;
+ }
+ if (gap_end == c->leb_size) {
+ c->ileb_len = ALIGN(gap_pos, c->min_io_size);
+ /* Pad to end of min_io_size */
+ pad_len = c->ileb_len - gap_pos;
+ } else
+ /* Pad to end of gap */
+ pad_len = gap_remains;
+ dbg_gc("LEB %d:%d to %d len %d nodes written %d wasted bytes %d",
+ lnum, gap_start, gap_end, gap_end - gap_start, written, pad_len);
+ ubifs_pad(c, c->ileb_buf + gap_pos, pad_len);
+ *dirt += pad_len;
+ return written;
+}
+
+/**
+ * find_old_idx - find an index node obsoleted since the last commit start.
+ * @c: UBIFS file-system description object
+ * @lnum: LEB number of obsoleted index node
+ * @offs: offset of obsoleted index node
+ *
+ * Returns %1 if found and %0 otherwise.
+ */
+static int find_old_idx(struct ubifs_info *c, int lnum, int offs)
+{
+ struct ubifs_old_idx *o;
+ struct rb_node *p;
+
+ p = c->old_idx.rb_node;
+ while (p) {
+ o = rb_entry(p, struct ubifs_old_idx, rb);
+ if (lnum < o->lnum)
+ p = p->rb_left;
+ else if (lnum > o->lnum)
+ p = p->rb_right;
+ else if (offs < o->offs)
+ p = p->rb_left;
+ else if (offs > o->offs)
+ p = p->rb_right;
+ else
+ return 1;
+ }
+ return 0;
+}
+
+/**
+ * is_idx_node_in_use - determine if an index node can be overwritten.
+ * @c: UBIFS file-system description object
+ * @key: key of index node
+ * @level: index node level
+ * @lnum: LEB number of index node
+ * @offs: offset of index node
+ *
+ * If @key / @lnum / @offs identify an index node that was not part of the old
+ * index, then this function returns %0 (obsolete). Else if the index node was
+ * part of the old index but is now dirty %1 is returned, else if it is clean %2
+ * is returned. A negative error code is returned on failure.
+ */
+static int is_idx_node_in_use(struct ubifs_info *c, union ubifs_key *key,
+ int level, int lnum, int offs)
+{
+ int ret;
+
+ ret = is_idx_node_in_tnc(c, key, level, lnum, offs);
+ if (ret < 0)
+ return ret; /* Error code */
+ if (ret == 0)
+ if (find_old_idx(c, lnum, offs))
+ return 1;
+ return ret;
+}
+
+/**
+ * layout_leb_in_gaps - layout index nodes using in-the-gaps method.
+ * @c: UBIFS file-system description object
+ * @p: return LEB number here
+ *
+ * This function lays out new index nodes for dirty znodes using in-the-gaps
+ * method of TNC commit.
+ * This function merely puts the next znode into the next gap, making no attempt
+ * to try to maximise the number of znodes that fit.
+ * This function returns the number of index nodes written into the gaps, or a
+ * negative error code on failure.
+ */
+static int layout_leb_in_gaps(struct ubifs_info *c, int *p)
+{
+ struct ubifs_scan_leb *sleb;
+ struct ubifs_scan_node *snod;
+ int lnum, dirt = 0, gap_start, gap_end, err, written, tot_written;
+
+ tot_written = 0;
+ /* Get an index LEB with lots of obsolete index nodes */
+ lnum = ubifs_find_dirty_idx_leb(c);
+ if (lnum < 0)
+ /*
+ * There also may be dirt in the index head that could be
+ * filled, however we do not check there at present.
+ */
+ return lnum; /* Error code */
+ *p = lnum;
+ dbg_gc("LEB %d", lnum);
+ /*
+ * Scan the index LEB. We use the generic scan for this even though
+ * it is more comprehensive and less efficient than is needed for this
+ * purpose.
+ */
+ sleb = ubifs_scan(c, lnum, 0, c->ileb_buf);
+ c->ileb_len = 0;
+ if (IS_ERR(sleb))
+ return PTR_ERR(sleb);
+ gap_start = 0;
+ list_for_each_entry(snod, &sleb->nodes, list) {
+ struct ubifs_idx_node *idx;
+ int in_use, level;
+
+ ubifs_assert(snod->type == UBIFS_IDX_NODE);
+ idx = snod->node;
+ key_read(c, ubifs_idx_key(c, idx), &snod->key);
+ level = le16_to_cpu(idx->level);
+ /* Determine if the index node is in use (not obsolete) */
+ in_use = is_idx_node_in_use(c, &snod->key, level, lnum,
+ snod->offs);
+ if (in_use < 0) {
+ ubifs_scan_destroy(sleb);
+ return in_use; /* Error code */
+ }
+ if (in_use) {
+ if (in_use == 1)
+ dirt += ALIGN(snod->len, 8);
+ /*
+ * The obsolete index nodes form gaps that can be
+ * overwritten. This gap has ended because we have
+ * found an index node that is still in use
+ * i.e. not obsolete
+ */
+ gap_end = snod->offs;
+ /* Try to fill gap */
+ written = fill_gap(c, lnum, gap_start, gap_end, &dirt);
+ if (written < 0) {
+ ubifs_scan_destroy(sleb);
+ return written; /* Error code */
+ }
+ tot_written += written;
+ gap_start = ALIGN(snod->offs + snod->len, 8);
+ }
+ }
+ ubifs_scan_destroy(sleb);
+ c->ileb_len = c->leb_size;
+ gap_end = c->leb_size;
+ /* Try to fill gap */
+ written = fill_gap(c, lnum, gap_start, gap_end, &dirt);
+ if (written < 0)
+ return written; /* Error code */
+ tot_written += written;
+ if (tot_written == 0) {
+ struct ubifs_lprops lp;
+
+ dbg_gc("LEB %d wrote %d index nodes", lnum, tot_written);
+ err = ubifs_read_one_lp(c, lnum, &lp);
+ if (err)
+ return err;
+ if (lp.free == c->leb_size) {
+ /*
+ * We must have snatched this LEB from the idx_gc list
+ * so we need to correct the free and dirty space.
+ */
+ err = ubifs_change_one_lp(c, lnum,
+ c->leb_size - c->ileb_len,
+ dirt, 0, 0, 0);
+ if (err)
+ return err;
+ }
+ return 0;
+ }
+ err = ubifs_change_one_lp(c, lnum, c->leb_size - c->ileb_len, dirt,
+ 0, 0, 0);
+ if (err)
+ return err;
+ err = ubi_leb_change(c->ubi, lnum, c->ileb_buf, c->ileb_len,
+ UBI_SHORTTERM);
+ if (err) {
+ ubifs_err("ubi_leb_change failed, error %d", err);
+ return err;
+ }
+ dbg_gc("LEB %d wrote %d index nodes", lnum, tot_written);
+ return tot_written;
+}
+
+/**
+ * get_leb_cnt - calculate the number of empty LEBs needed to commit.
+ * @c: UBIFS file-system description object
+ * @cnt: number of znodes to commit
+ *
+ * This function returns the number of empty LEBs needed to commit @cnt znodes
+ * to the current index head. The number is not exact and may be more than
+ * needed.
+ */
+static int get_leb_cnt(struct ubifs_info *c, int cnt)
+{
+ int d;
+
+ /* Assume maximum index node size (i.e. overestimate space needed) */
+ cnt -= (c->leb_size - c->ihead_offs) / c->max_idx_node_sz;
+ if (cnt < 0)
+ cnt = 0;
+ d = c->leb_size / c->max_idx_node_sz;
+ return DIV_ROUND_UP(cnt, d);
+}
+
+/**
+ * layout_in_gaps - in-the-gaps method of committing TNC.
+ * @c: UBIFS file-system description object
+ * @cnt: number of dirty znodes to commit.
+ *
+ * This function lays out new index nodes for dirty znodes using in-the-gaps
+ * method of TNC commit.
+ *
+ * This function returns %0 on success and a negative error code on failure.
+ */
+static int layout_in_gaps(struct ubifs_info *c, int cnt)
+{
+ int err, leb_needed_cnt, written, *p;
+
+ dbg_gc("%d znodes to write", cnt);
+
+ c->gap_lebs = kmalloc(sizeof(int) * (c->lst.idx_lebs + 1), GFP_NOFS);
+ if (!c->gap_lebs)
+ return -ENOMEM;
+
+ p = c->gap_lebs;
+ do {
+ ubifs_assert(p < c->gap_lebs + c->lst.idx_lebs);
+ written = layout_leb_in_gaps(c, p);
+ if (written < 0) {
+ err = written;
+ if (err == -ENOSPC) {
+ ubifs_err("out of space");
+ spin_lock(&c->space_lock);
+ dbg_dump_budg(c);
+ spin_unlock(&c->space_lock);
+ dbg_dump_lprops(c);
+ /* Try to commit anyway */
+ err = 0;
+ break;
+ }
+ kfree(c->gap_lebs);
+ c->gap_lebs = NULL;
+ return err;
+ }
+ p++;
+ cnt -= written;
+ leb_needed_cnt = get_leb_cnt(c, cnt);
+ dbg_gc("%d znodes remaining, need %d LEBs, have %d", cnt,
+ leb_needed_cnt, c->ileb_cnt);
+ } while (leb_needed_cnt > c->ileb_cnt);
+
+ *p = -1;
+ return 0;
+}
+
+/**
+ * layout_in_empty_space - layout index nodes in empty space.
+ * @c: UBIFS file-system description object
+ *
+ * This function lays out new index nodes for dirty znodes using empty LEBs.
+ *
+ * This function returns %0 on success and a negative error code on failure.
+ */
+static int layout_in_empty_space(struct ubifs_info *c)
+{
+ struct ubifs_znode *znode, *cnext, *zp;
+ int lnum, offs, len, next_len, buf_len, buf_offs, used, avail;
+ int wlen, blen, err;
+
+ cnext = c->enext;
+ if (cnext == NULL)
+ return 0;
+
+ lnum = c->ihead_lnum;
+ buf_offs = c->ihead_offs;
+
+ buf_len = ubifs_idx_node_sz(c, c->fanout);
+ buf_len = ALIGN(buf_len, c->min_io_size);
+ used = 0;
+ avail = buf_len;
+
+ /* Ensure there is enough room for first write */
+ next_len = ubifs_idx_node_sz(c, cnext->child_cnt);
+ if (buf_offs + next_len > c->leb_size)
+ lnum = -1;
+
+ while (1) {
+ znode = cnext;
+
+ len = ubifs_idx_node_sz(c, znode->child_cnt);
+
+ /* Determine the index node position */
+ if (lnum == -1) {
+ if (c->ileb_nxt >= c->ileb_cnt) {
+ ubifs_err("out of space");
+ return -ENOSPC;
+ }
+ lnum = c->ilebs[c->ileb_nxt++];
+ buf_offs = 0;
+ used = 0;
+ avail = buf_len;
+ }
+
+ offs = buf_offs + used;
+
+#ifdef CONFIG_UBIFS_FS_DEBUG
+ znode->lnum = lnum;
+ znode->offs = offs;
+ znode->len = len;
+#endif
+
+ /* Update the parent */
+ zp = znode->parent;
+ if (zp) {
+ struct ubifs_zbranch *zbr;
+ int i;
+
+ i = znode->iip;
+ zbr = &zp->zbranch[i];
+ zbr->lnum = lnum;
+ zbr->offs = offs;
+ zbr->len = len;
+ } else {
+ c->zroot.lnum = lnum;
+ c->zroot.offs = offs;
+ c->zroot.len = len;
+ }
+ c->calc_idx_sz += ALIGN(len, 8);
+
+ /*
+ * Once lprops is updated, we can decrease the dirty znode count
+ * but it is easier to just do it here.
+ */
+ atomic_long_dec(&c->dirty_zn_cnt);
+
+ /*
+ * Calculate the next index node length to see if there is
+ * enough room for it
+ */
+ cnext = znode->cnext;
+ if (cnext == c->cnext)
+ next_len = 0;
+ else
+ next_len = ubifs_idx_node_sz(c, cnext->child_cnt);
+
+ if (c->min_io_size == 1) {
+ buf_offs += ALIGN(len, 8);
+ if (next_len) {
+ if (buf_offs + next_len <= c->leb_size)
+ continue;
+ err = ubifs_update_one_lp(c, lnum, 0,
+ c->leb_size - buf_offs, 0, 0);
+ if (err)
+ return err;
+ lnum = -1;
+ continue;
+ }
+ err = ubifs_update_one_lp(c, lnum,
+ c->leb_size - buf_offs, 0, 0, 0);
+ if (err)
+ return err;
+ break;
+ }
+
+ /* Update buffer positions */
+ wlen = used + len;
+ used += ALIGN(len, 8);
+ avail -= ALIGN(len, 8);
+
+ if (next_len != 0 &&
+ buf_offs + used + next_len <= c->leb_size &&
+ avail > 0)
+ continue;
+
+ if (avail <= 0 && next_len &&
+ buf_offs + used + next_len <= c->leb_size)
+ blen = buf_len;
+ else
+ blen = ALIGN(wlen, c->min_io_size);
+
+ /* The buffer is full or there are no more znodes to do */
+ buf_offs += blen;
+ if (next_len) {
+ if (buf_offs + next_len > c->leb_size) {
+ err = ubifs_update_one_lp(c, lnum,
+ c->leb_size - buf_offs, blen - used,
+ 0, 0);
+ if (err)
+ return err;
+ lnum = -1;
+ }
+ used -= blen;
+ if (used < 0)
+ used = 0;
+ avail = buf_len - used;
+ continue;
+ }
+ err = ubifs_update_one_lp(c, lnum, c->leb_size - buf_offs,
+ blen - used, 0, 0);
+ if (err)
+ return err;
+ break;
+ }
+
+#ifdef CONFIG_UBIFS_FS_DEBUG
+ c->new_ihead_lnum = lnum;
+ c->new_ihead_offs = buf_offs;
+#endif
+
+ return 0;
+}
+
+/**
+ * layout_commit - determine positions of index nodes to commit.
+ * @c: UBIFS file-system description object
+ * @no_space: indicates that insufficient empty LEBs were allocated
+ * @cnt: number of znodes to commit
+ *
+ * Calculate and update the positions of index nodes to commit. If there were
+ * an insufficient number of empty LEBs allocated, then index nodes are placed
+ * into the gaps created by obsolete index nodes in non-empty index LEBs. For
+ * this purpose, an obsolete index node is one that was not in the index as at
+ * the end of the last commit. To write "in-the-gaps" requires that those index
+ * LEBs are updated atomically in-place.
+ */
+static int layout_commit(struct ubifs_info *c, int no_space, int cnt)
+{
+ int err;
+
+ if (no_space) {
+ err = layout_in_gaps(c, cnt);
+ if (err)
+ return err;
+ }
+ err = layout_in_empty_space(c);
+ return err;
+}
+
+/**
+ * find_first_dirty - find first dirty znode.
+ * @znode: znode to begin searching from
+ */
+static struct ubifs_znode *find_first_dirty(struct ubifs_znode *znode)
+{
+ int i, cont;
+
+ if (!znode)
+ return NULL;
+
+ while (1) {
+ if (znode->level == 0) {
+ if (ubifs_zn_dirty(znode))
+ return znode;
+ return NULL;
+ }
+ cont = 0;
+ for (i = 0; i < znode->child_cnt; i++) {
+ struct ubifs_zbranch *zbr = &znode->zbranch[i];
+
+ if (zbr->znode && ubifs_zn_dirty(zbr->znode)) {
+ znode = zbr->znode;
+ cont = 1;
+ break;
+ }
+ }
+ if (!cont) {
+ if (ubifs_zn_dirty(znode))
+ return znode;
+ return NULL;
+ }
+ }
+}
+
+/**
+ * find_next_dirty - find next dirty znode.
+ * @znode: znode to begin searching from
+ */
+static struct ubifs_znode *find_next_dirty(struct ubifs_znode *znode)
+{
+ int n = znode->iip + 1;
+
+ znode = znode->parent;
+ if (!znode)
+ return NULL;
+ for (; n < znode->child_cnt; n++) {
+ struct ubifs_zbranch *zbr = &znode->zbranch[n];
+
+ if (zbr->znode && ubifs_zn_dirty(zbr->znode))
+ return find_first_dirty(zbr->znode);
+ }
+ return znode;
+}
+
+/**
+ * get_znodes_to_commit - create list of dirty znodes to commit.
+ * @c: UBIFS file-system description object
+ *
+ * This function returns the number of znodes to commit.
+ */
+static int get_znodes_to_commit(struct ubifs_info *c)
+{
+ struct ubifs_znode *znode, *cnext;
+ int cnt = 0;
+
+ c->cnext = find_first_dirty(c->zroot.znode);
+ znode = c->enext = c->cnext;
+ if (!znode) {
+ dbg_cmt("no znodes to commit");
+ return 0;
+ }
+ cnt += 1;
+ while (1) {
+ ubifs_assert(!test_bit(COW_ZNODE, &znode->flags));
+ set_bit(COW_ZNODE, &znode->flags);
+ znode->alt = 0;
+ cnext = find_next_dirty(znode);
+ if (!cnext) {
+ znode->cnext = c->cnext;
+ break;
+ }
+ znode->cnext = cnext;
+ znode = cnext;
+ cnt += 1;
+ }
+ dbg_cmt("committing %d znodes", cnt);
+ ubifs_assert(cnt == atomic_long_read(&c->dirty_zn_cnt));
+ return cnt;
+}
+
+/**
+ * alloc_idx_lebs - allocate empty LEBs to be used to commit.
+ * @c: UBIFS file-system description object
+ * @cnt: number of znodes to commit
+ *
+ * This function returns %-ENOSPC if it cannot allocate a sufficient number of
+ * empty LEBs. %0 is returned on success, otherwise a negative error code
+ * is returned.
+ */
+static int alloc_idx_lebs(struct ubifs_info *c, int cnt)
+{
+ int i, leb_cnt, lnum;
+
+ c->ileb_cnt = 0;
+ c->ileb_nxt = 0;
+ leb_cnt = get_leb_cnt(c, cnt);
+ dbg_cmt("need about %d empty LEBS for TNC commit", leb_cnt);
+ if (!leb_cnt)
+ return 0;
+ c->ilebs = kmalloc(leb_cnt * sizeof(int), GFP_NOFS);
+ if (!c->ilebs)
+ return -ENOMEM;
+ for (i = 0; i < leb_cnt; i++) {
+ lnum = ubifs_find_free_leb_for_idx(c);
+ if (lnum < 0)
+ return lnum;
+ c->ilebs[c->ileb_cnt++] = lnum;
+ dbg_cmt("LEB %d", lnum);
+ }
+ return 0;
+}
+
+/**
+ * free_unused_idx_lebs - free unused LEBs that were allocated for the commit.
+ * @c: UBIFS file-system description object
+ *
+ * It is possible that we allocate more empty LEBs for the commit than we need.
+ * This function frees the surplus.
+ *
+ * This function returns %0 on success and a negative error code on failure.
+ */
+static int free_unused_idx_lebs(struct ubifs_info *c)
+{
+ int i, err = 0, lnum, er;
+
+ for (i = c->ileb_nxt; i < c->ileb_cnt; i++) {
+ lnum = c->ilebs[i];
+ dbg_cmt("LEB %d", lnum);
+ er = ubifs_change_one_lp(c, lnum, -1, -1, 0,
+ LPROPS_INDEX | LPROPS_TAKEN, 0);
+ if (!err)
+ err = er;
+ }
+ return err;
+}
+
+/**
+ * free_idx_lebs - free unused LEBs after commit end.
+ * @c: UBIFS file-system description object
+ *
+ * This function returns %0 on success and a negative error code on failure.
+ */
+static int free_idx_lebs(struct ubifs_info *c)
+{
+ int err;
+
+ err = free_unused_idx_lebs(c);
+ kfree(c->ilebs);
+ c->ilebs = NULL;
+ return err;
+}
+
+/**
+ * ubifs_tnc_start_commit - start TNC commit.
+ * @c: UBIFS file-system description object
+ * @zroot: new index root position is returned here
+ *
+ * This function prepares the list of indexing nodes to commit and lays out
+ * their positions on flash. If there is not enough free space it uses the
+ * in-gap commit method. Returns zero in case of success and a negative error
+ * code in case of failure.
+ */
+int ubifs_tnc_start_commit(struct ubifs_info *c, struct ubifs_zbranch *zroot)
+{
+ int err = 0, cnt;
+
+ mutex_lock(&c->tnc_mutex);
+ err = dbg_check_tnc(c, 1);
+ if (err)
+ goto out;
+ cnt = get_znodes_to_commit(c);
+ if (cnt != 0) {
+ int no_space = 0;
+
+ err = alloc_idx_lebs(c, cnt);
+ if (err == -ENOSPC)
+ no_space = 1;
+ else if (err)
+ goto out_free;
+ err = layout_commit(c, no_space, cnt);
+ if (err)
+ goto out_free;
+ ubifs_assert(atomic_long_read(&c->dirty_zn_cnt) == 0);
+ err = free_unused_idx_lebs(c);
+ if (err)
+ goto out;
+ }
+ destroy_old_idx(c);
+ memcpy(zroot, &c->zroot, sizeof(struct ubifs_zbranch));
+
+ err = ubifs_save_dirty_idx_lnums(c);
+ if (err)
+ goto out;
+
+ spin_lock(&c->space_lock);
+ /*
+ * Although we have not finished committing yet, update size of the
+ * committed index ('c->old_idx_sz') and zero out the index growth
+ * budget. It is OK to do this now, because we've reserved all the
+ * space which is needed to commit the index, and it is safe for the
+ * budgeting subsystem to assume the index is already committed,
+ * even though it is not.
+ */
+ c->old_idx_sz = c->calc_idx_sz;
+ c->budg_uncommitted_idx = 0;
+ spin_unlock(&c->space_lock);
+ mutex_unlock(&c->tnc_mutex);
+
+ dbg_cmt("number of index LEBs %d", c->lst.idx_lebs);
+ dbg_cmt("size of index %llu", c->calc_idx_sz);
+ return err;
+
+out_free:
+ free_idx_lebs(c);
+out:
+ mutex_unlock(&c->tnc_mutex);
+ return err;
+}
+
+/**
+ * write_index - write index nodes.
+ * @c: UBIFS file-system description object
+ *
+ * This function writes the index nodes whose positions were laid out in the
+ * layout_in_empty_space function.
+ */
+static int write_index(struct ubifs_info *c)
+{
+ struct ubifs_idx_node *idx;
+ struct ubifs_znode *znode, *cnext;
+ int i, lnum, offs, len, next_len, buf_len, buf_offs, used;
+ int avail, wlen, err, lnum_pos = 0;
+
+ cnext = c->enext;
+ if (!cnext)
+ return 0;
+
+ /*
+ * Always write index nodes to the index head so that index nodes and
+ * other types of nodes are never mixed in the same erase block.
+ */
+ lnum = c->ihead_lnum;
+ buf_offs = c->ihead_offs;
+
+ /* Allocate commit buffer */
+ buf_len = ALIGN(c->max_idx_node_sz, c->min_io_size);
+ used = 0;
+ avail = buf_len;
+
+ /* Ensure there is enough room for first write */
+ next_len = ubifs_idx_node_sz(c, cnext->child_cnt);
+ if (buf_offs + next_len > c->leb_size) {
+ err = ubifs_update_one_lp(c, lnum, -1, -1, 0, LPROPS_TAKEN);
+ if (err)
+ return err;
+ lnum = -1;
+ }
+
+ while (1) {
+ cond_resched();
+
+ znode = cnext;
+ idx = c->cbuf + used;
+
+ /* Make index node */
+ idx->ch.node_type = UBIFS_IDX_NODE;
+ idx->child_cnt = cpu_to_le16(znode->child_cnt);
+ idx->level = cpu_to_le16(znode->level);
+ for (i = 0; i < znode->child_cnt; i++) {
+ struct ubifs_branch *br = ubifs_idx_branch(c, idx, i);
+ struct ubifs_zbranch *zbr = &znode->zbranch[i];
+
+ key_write_idx(c, &zbr->key, &br->key);
+ br->lnum = cpu_to_le32(zbr->lnum);
+ br->offs = cpu_to_le32(zbr->offs);
+ br->len = cpu_to_le32(zbr->len);
+ if (!zbr->lnum || !zbr->len) {
+ ubifs_err("bad ref in znode");
+ dbg_dump_znode(c, znode);
+ if (zbr->znode)
+ dbg_dump_znode(c, zbr->znode);
+ }
+ }
+ len = ubifs_idx_node_sz(c, znode->child_cnt);
+ ubifs_prepare_node(c, idx, len, 0);
+
+ /* Determine the index node position */
+ if (lnum == -1) {
+ lnum = c->ilebs[lnum_pos++];
+ buf_offs = 0;
+ used = 0;
+ avail = buf_len;
+ }
+ offs = buf_offs + used;
+
+#ifdef CONFIG_UBIFS_FS_DEBUG
+ if (lnum != znode->lnum || offs != znode->offs ||
+ len != znode->len) {
+ ubifs_err("inconsistent znode posn");
+ return -EINVAL;
+ }
+#endif
+
+ /* Grab some stuff from znode while we still can */
+ cnext = znode->cnext;
+
+ ubifs_assert(ubifs_zn_dirty(znode));
+ ubifs_assert(test_bit(COW_ZNODE, &znode->flags));
+
+ clear_bit(DIRTY_ZNODE, &znode->flags);
+ smp_mb__before_clear_bit();
+ clear_bit(COW_ZNODE, &znode->flags);
+ smp_mb__after_clear_bit();
+
+ /* Do not access znode from this point on */
+
+ /* Update buffer positions */
+ wlen = used + len;
+ used += ALIGN(len, 8);
+ avail -= ALIGN(len, 8);
+
+ /*
+ * Calculate the next index node length to see if there is
+ * enough room for it
+ */
+ if (cnext == c->cnext)
+ next_len = 0;
+ else
+ next_len = ubifs_idx_node_sz(c, cnext->child_cnt);
+
+ if (c->min_io_size == 1) {
+ /*
+ * Write the prepared index node immediately if there is
+ * no minimum IO size
+ */
+ err = ubifs_leb_write(c, lnum, c->cbuf, buf_offs,
+ wlen, UBI_SHORTTERM);
+ if (err)
+ return err;
+ buf_offs += ALIGN(wlen, 8);
+ if (next_len) {
+ used = 0;
+ avail = buf_len;
+ if (buf_offs + next_len > c->leb_size) {
+ err = ubifs_update_one_lp(c, lnum, -1,
+ -1, 0,
+ LPROPS_TAKEN);
+ if (err)
+ return err;
+ lnum = -1;
+ }
+ continue;
+ }
+ } else {
+ int blen, nxt_offs = buf_offs + used + next_len;
+
+ if (next_len && nxt_offs <= c->leb_size) {
+ if (avail > 0)
+ continue;
+ else
+ blen = buf_len;
+ } else {
+ wlen = ALIGN(wlen, 8);
+ blen = ALIGN(wlen, c->min_io_size);
+ ubifs_pad(c, c->cbuf + wlen, blen - wlen);
+ }
+ /*
+ * The buffer is full or there are no more znodes
+ * to do
+ */
+ err = ubifs_leb_write(c, lnum, c->cbuf, buf_offs,
+ blen, UBI_SHORTTERM);
+ if (err)
+ return err;
+ buf_offs += blen;
+ if (next_len) {
+ if (nxt_offs > c->leb_size) {
+ err = ubifs_update_one_lp(c, lnum, -1,
+ -1, 0,
+ LPROPS_TAKEN);
+ if (err)
+ return err;
+ lnum = -1;
+ }
+ used -= blen;
+ if (used < 0)
+ used = 0;
+ avail = buf_len - used;
+ memmove(c->cbuf, c->cbuf + blen, used);
+ continue;
+ }
+ }
+ break;
+ }
+
+#ifdef CONFIG_UBIFS_FS_DEBUG
+ if (lnum != c->new_ihead_lnum || buf_offs != c->new_ihead_offs) {
+ ubifs_err("inconsistent ihead");
+ return -EINVAL;
+ }
+#endif
+
+ c->ihead_lnum = lnum;
+ c->ihead_offs = buf_offs;
+
+ return 0;
+}
+
+/**
+ * free_obsolete_znodes - free obsolete znodes.
+ * @c: UBIFS file-system description object
+ *
+ * At the end of the commit, obsolete znodes are freed.
+ */
+static void free_obsolete_znodes(struct ubifs_info *c)
+{
+ struct ubifs_znode *znode, *cnext;
+
+ cnext = c->cnext;
+ do {
+ znode = cnext;
+ cnext = znode->cnext;
+ if (test_bit(OBSOLETE_ZNODE, &znode->flags))
+ kfree(znode);
+ else {
+ znode->cnext = NULL;
+ atomic_long_inc(&c->clean_zn_cnt);
+ atomic_long_inc(&ubifs_clean_zn_cnt);
+ }
+ } while (cnext != c->cnext);
+}
+
+/**
+ * return_gap_lebs - return LEBs used by the in-gap commit method.
+ * @c: UBIFS file-system description object
+ *
+ * This function clears the "taken" flag for the LEBs which were used by the
+ * "commit in-the-gaps" method.
+ */
+static int return_gap_lebs(struct ubifs_info *c)
+{
+ int *p, err;
+
+ if (!c->gap_lebs)
+ return 0;
+
+ dbg_cmt("");
+ for (p = c->gap_lebs; *p != -1; p++) {
+ err = ubifs_change_one_lp(c, *p, -1, -1, 0, LPROPS_TAKEN, 0);
+ if (err)
+ return err;
+ }
+
+ kfree(c->gap_lebs);
+ c->gap_lebs = NULL;
+ return 0;
+}
+
+/**
+ * ubifs_tnc_end_commit - update the TNC for commit end.
+ * @c: UBIFS file-system description object
+ *
+ * Write the dirty znodes.
+ */
+int ubifs_tnc_end_commit(struct ubifs_info *c)
+{
+ int err;
+
+ if (!c->cnext)
+ return 0;
+
+ err = return_gap_lebs(c);
+ if (err)
+ return err;
+
+ err = write_index(c);
+ if (err)
+ return err;
+
+ mutex_lock(&c->tnc_mutex);
+
+ dbg_cmt("TNC height is %d", c->zroot.znode->level + 1);
+
+ free_obsolete_znodes(c);
+
+ c->cnext = NULL;
+ kfree(c->ilebs);
+ c->ilebs = NULL;
+
+ mutex_unlock(&c->tnc_mutex);
+
+ return 0;
+}
--
1.5.4.1
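
A note for reviewers on the estimate made by get_leb_cnt() in the patch above: the arithmetic can be tried stand-alone. The sketch below is only an illustration, not part of the patch. The function name sketch_leb_cnt() and its plain integer parameters are hypothetical; they stand in for c->leb_size, c->ihead_offs and c->max_idx_node_sz.

```c
#include <assert.h>

/* Round-up integer division, same result as the kernel's DIV_ROUND_UP */
#define SKETCH_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Hypothetical stand-alone copy of the get_leb_cnt() arithmetic. Every index
 * node is assumed to have the maximum size, so the result may be larger than
 * what the commit really needs.
 */
static int sketch_leb_cnt(int leb_size, int ihead_offs, int max_idx_node_sz,
			  int cnt)
{
	int d;

	assert(max_idx_node_sz > 0 && max_idx_node_sz <= leb_size);
	assert(ihead_offs >= 0 && ihead_offs <= leb_size);

	/* Znodes which still fit into the current index head LEB */
	cnt -= (leb_size - ihead_offs) / max_idx_node_sz;
	if (cnt < 0)
		cnt = 0;
	/* Worst-case number of index nodes per empty LEB */
	d = leb_size / max_idx_node_sz;
	return SKETCH_DIV_ROUND_UP(cnt, d);
}
```

With a made-up geometry of 128KiB LEBs, the index head at offset 120KiB and 192-byte maximum index nodes, sketch_leb_cnt(131072, 122880, 192, 1000) gives 2: 42 nodes fit into the rest of the head LEB, and the remaining 958 need two empty LEBs at 682 nodes each.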

2008-03-27 13:13:51

by Artem Bityutskiy

Subject: [RFC PATCH 14/26] UBIFS: add TNC shrinker

The TNC cache grows with time, because UBIFS caches the indexing nodes
when the indexing B-tree is looked up. If the file-system is large
enough, the TNC may consume a lot of memory, in which case UBIFS prunes
it. Namely, it registers a memory shrinker for this purpose.

Signed-off-by: Artem Bityutskiy <[email protected]>
Signed-off-by: Adrian Hunter <[email protected]>
---
fs/ubifs/shrinker.c | 410 +++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 410 insertions(+), 0 deletions(-)
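
For readers who want the eviction policy at a glance: the shrinker first frees old clean znodes, then young ones, then any clean znode. The sketch below is a simplified illustration of that ordering only, not the patch code. It uses a flat array instead of the TNC tree walk, all sketch_* names are hypothetical, and the two age thresholds are assumed values (the patch takes OLD_ZNODE_AGE and YOUNG_ZNODE_AGE from ubifs.h).

```c
#include <assert.h>
#include <stddef.h>

/* Assumed thresholds in seconds, for illustration only */
#define SKETCH_OLD_AGE   20
#define SKETCH_YOUNG_AGE 5

struct sketch_znode {
	long time;	/* time-stamp of the last look-up */
	int dirty;	/* dirty znodes are never evicted */
	int freed;	/* set once the znode has been evicted */
};

/* Evict up to @nr clean znodes which are at least @age seconds old */
static int sketch_evict(struct sketch_znode *zn, size_t n, long now,
			int nr, long age)
{
	int freed = 0;
	size_t i;

	for (i = 0; i < n && freed < nr; i++) {
		if (zn[i].freed || zn[i].dirty)
			continue;
		if (now - zn[i].time < age)
			continue;
		zn[i].freed = 1;
		freed += 1;
	}
	return freed;
}

/* Old znodes first, then young ones, then anything which is clean */
static int sketch_shrink(struct sketch_znode *zn, size_t n, long now, int nr)
{
	static const long ages[] = { SKETCH_OLD_AGE, SKETCH_YOUNG_AGE, 0 };
	int freed = 0;
	size_t pass;

	assert(nr > 0);
	for (pass = 0; pass < 3 && freed < nr; pass++)
		freed += sketch_evict(zn, n, now, nr - freed, ages[pass]);
	return freed;
}
```

The real shrinker additionally drops whole sub-trees at once and skips znodes which are still on the commit list; neither is modelled here.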

diff --git a/fs/ubifs/shrinker.c b/fs/ubifs/shrinker.c
new file mode 100644
index 0000000..a0ea4b7
--- /dev/null
+++ b/fs/ubifs/shrinker.c
@@ -0,0 +1,410 @@
+/*
+ * This file is part of UBIFS.
+ *
+ * Copyright (C) 2006-2008 Nokia Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 51
+ * Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Authors: Artem Bityutskiy (Битюцкий Артём)
+ * Adrian Hunter
+ */
+
+/*
+ * This file implements UBIFS shrinker which evicts clean znodes from the TNC
+ * tree when Linux VM needs more RAM.
+ *
+ * We do not implement any LRU lists to find oldest znodes to free because it
+ * would add additional overhead to the file system fast paths. So the shrinker
+ * just walks the TNC tree when searching for znodes to free.
+ *
+ * If the root of a TNC sub-tree is clean and old enough, then the children are
+ * also clean and old enough. So the shrinker walks the TNC in level order and
+ * dumps entire sub-trees.
+ *
+ * The age of znodes is just the time-stamp when they were last looked at.
+ * The current shrinker first tries to evict old znodes, then young ones.
+ *
+ * Since the shrinker is global, it has to protect against races with FS
+ * un-mounts, which is done by the 'ubifs_infos_lock' and 'c->umount_mutex'.
+ */
+
+#include "ubifs.h"
+
+/* List of all UBIFS file-system instances */
+LIST_HEAD(ubifs_infos);
+
+/*
+ * We number each shrinker run and record the number on the ubifs_info structure
+ * so that we can easily work out which ubifs_info structures have already been
+ * done by the current run.
+ */
+static unsigned int shrinker_run_no;
+
+/* Protects 'ubifs_infos' list */
+DEFINE_SPINLOCK(ubifs_infos_lock);
+
+/* Global clean znode counter (for all mounted UBIFS instances) */
+atomic_long_t ubifs_clean_zn_cnt;
+
+/**
+ * tnc_levelorder_next - next TNC tree element in levelorder traversal.
+ * @zr: root of the subtree to traverse
+ * @znode: previous znode
+ *
+ * This function implements levelorder TNC traversal. The LNC is ignored.
+ * Returns the next element or %NULL if @znode is already the last one.
+ */
+static struct ubifs_znode *tnc_levelorder_next(struct ubifs_znode *zr,
+ struct ubifs_znode *znode)
+{
+ int level, iip, level_search = 0;
+ struct ubifs_znode *zn;
+
+ ubifs_assert(zr);
+
+ if (unlikely(!znode))
+ return zr;
+
+ if (unlikely(znode == zr)) {
+ if (znode->level == 0)
+ return NULL;
+ return ubifs_tnc_find_child(zr, 0);
+ }
+
+ level = znode->level;
+
+ iip = znode->iip;
+ while (1) {
+ ubifs_assert(znode->level <= zr->level);
+
+ /*
+ * First walk up until there is a znode with next branch to
+ * look at.
+ */
+ while (znode->parent != zr && iip >= znode->parent->child_cnt) {
+ znode = znode->parent;
+ iip = znode->iip;
+ }
+
+ if (unlikely(znode->parent == zr &&
+ iip >= znode->parent->child_cnt)) {
+ /* This level is done, switch to the lower one */
+ level -= 1;
+ if (level_search || level < 0)
+ /*
+ * We were already looking for znode at lower
+ * level ('level_search'). As we are here
+ * again, it just does not exist. Or all levels
+ * were finished ('level < 0').
+ */
+ return NULL;
+
+ level_search = 1;
+ iip = -1;
+ znode = ubifs_tnc_find_child(zr, 0);
+ ubifs_assert(znode);
+ }
+
+ /* Switch to the next index */
+ zn = ubifs_tnc_find_child(znode->parent, iip + 1);
+ if (!zn) {
+ /* No more children to look at, we have to walk up */
+ iip = znode->parent->child_cnt;
+ continue;
+ }
+
+ /* Walk back down to the level we came from ('level') */
+ while (zn->level != level) {
+ znode = zn;
+ zn = ubifs_tnc_find_child(zn, 0);
+ if (!zn) {
+ /*
+ * This path is not too deep so it does not
+ * reach 'level'. Try next path.
+ */
+ iip = znode->iip;
+ break;
+ }
+ }
+
+ if (zn) {
+ ubifs_assert(zn->level >= 0);
+ return zn;
+ }
+ }
+}
+
+/**
+ * shrink_tnc - shrink TNC tree.
+ * @c: UBIFS file-system description object
+ * @nr: number of znodes to free
+ * @age: the age of znodes to free
+ * @contention: if any contention, this is set to %1
+ *
+ * This function traverses TNC tree and frees clean znodes. It does not free
+ * clean znodes which are younger than @age. Returns number of freed znodes.
+ */
+static int shrink_tnc(struct ubifs_info *c, int nr, int age, int *contention)
+{
+ int total_freed = 0;
+ struct ubifs_znode *znode, *zprev;
+ int time = get_seconds();
+
+ ubifs_assert(mutex_is_locked(&c->umount_mutex));
+ ubifs_assert(mutex_is_locked(&c->tnc_mutex));
+
+ if (!c->zroot.znode || atomic_long_read(&c->clean_zn_cnt) == 0)
+ return 0;
+
+ /*
+ * Traverse the TNC tree in levelorder manner, so that it is possible
+ * to destroy large sub-trees. Indeed, if a znode is old, then all its
+ * children are older or of the same age.
+ *
+ * Note, we are holding 'c->tnc_mutex', so we do not have to lock the
+ * 'c->space_lock' when _reading_ 'c->clean_zn_cnt', because it is
+ * changed only when the 'c->tnc_mutex' is held.
+ */
+ zprev = NULL;
+ znode = tnc_levelorder_next(c->zroot.znode, NULL);
+ while (znode && total_freed < nr &&
+ atomic_long_read(&c->clean_zn_cnt) > 0) {
+ int freed;
+
+ /*
+ * If the znode is clean, but it is in the 'c->cnext' list, this
+ * means that this znode has just been written to flash as a
+ * part of commit and was marked clean. Such znodes are removed
+ * from the list at the end of commit. We cannot change the list,
+ * because it is not protected by any mutex (design decision to
+ * make commit really independent and parallel to main I/O). So
+ * we just skip these znodes.
+ *
+ * Note, the 'clean_zn_cnt' counters are not updated until
+ * after the commit, so the UBIFS shrinker does not report
+ * the znodes which are in the 'c->cnext' list as freeable.
+ *
+ * Also note, if the root of a sub-tree is not in 'c->cnext',
+ * then the whole sub-tree is not in 'c->cnext' as well, so it
+ * is safe to dump whole sub-tree.
+ */
+
+ if (znode->cnext) {
+ /*
+ * Very soon these znodes will be removed from the list
+ * and become freeable.
+ */
+ *contention = 1;
+ } else if (!ubifs_zn_dirty(znode) &&
+ abs(time - znode->time) >= age) {
+ if (znode->parent)
+ znode->parent->zbranch[znode->iip].znode = NULL;
+ else
+ c->zroot.znode = NULL;
+
+ freed = ubifs_destroy_tnc_subtree(znode);
+ atomic_long_sub(freed, &ubifs_clean_zn_cnt);
+ atomic_long_sub(freed, &c->clean_zn_cnt);
+ ubifs_assert(atomic_long_read(&c->clean_zn_cnt) >= 0);
+ total_freed += freed;
+ znode = zprev;
+ }
+
+ if (unlikely(!c->zroot.znode))
+ break;
+
+ zprev = znode;
+ znode = tnc_levelorder_next(c->zroot.znode, znode);
+ cond_resched();
+ }
+
+ return total_freed;
+}
+
+/**
+ * shrink_tnc_trees - shrink UBIFS TNC trees.
+ * @nr: number of znodes to free
+ * @age: the age of znodes to free
+ * @contention: if any contention, this is set to %1
+ *
+ * This function walks the list of mounted UBIFS file-systems and frees clean
+ * znodes which are older than @age, until at least @nr znodes are freed.
+ * Returns the number of freed znodes.
+ */
+static int shrink_tnc_trees(int nr, int age, int *contention)
+{
+ struct ubifs_info *c;
+ struct list_head *p;
+ unsigned int run_no;
+ int freed = 0;
+
+ spin_lock(&ubifs_infos_lock);
+ do
+ run_no = ++shrinker_run_no;
+ while (run_no == 0);
+ /* Iterate over all mounted UBIFS file-systems and try to shrink them */
+ p = ubifs_infos.next;
+ while (p != &ubifs_infos) {
+ c = list_entry(p, struct ubifs_info, infos_list);
+ /*
+ * We move the ones we do to the end of the list, so we stop
+ * when we see one we have already done.
+ */
+ if (c->shrinker_run_no == run_no)
+ break;
+ if (!mutex_trylock(&c->umount_mutex)) {
+ /* Some un-mount is in progress, try next FS */
+ *contention = 1;
+ p = p->next;
+ continue;
+ }
+ /*
+ * We're holding 'c->umount_mutex', so the file-system won't go
+ * away.
+ */
+ if (!mutex_trylock(&c->tnc_mutex)) {
+ mutex_unlock(&c->umount_mutex);
+ *contention = 1;
+ p = p->next;
+ continue;
+ }
+ spin_unlock(&ubifs_infos_lock);
+ /*
+ * OK, now we have TNC locked, the file-system cannot go away -
+ * it is safe to reap the cache.
+ */
+ c->shrinker_run_no = run_no;
+ freed += shrink_tnc(c, nr, age, contention);
+ mutex_unlock(&c->tnc_mutex);
+ spin_lock(&ubifs_infos_lock);
+ /* Get the next list element before we move this one */
+ p = p->next;
+ /*
+ * Move this one to the end of the list to provide some
+ * fairness.
+ */
+ list_del(&c->infos_list);
+ list_add_tail(&c->infos_list, &ubifs_infos);
+ mutex_unlock(&c->umount_mutex);
+ if (freed >= nr)
+ break;
+ }
+ spin_unlock(&ubifs_infos_lock);
+ return freed;
+}
+
+/**
+ * kick_a_thread - kick a background thread to start commit.
+ *
+ * This function kicks a background thread to start background commit. Returns
+ * %-1 if a thread was kicked or there is another reason to assume the memory
+ * will soon be freed or become freeable. If there are no dirty znodes, returns
+ * %0.
+ */
+static int kick_a_thread(void)
+{
+ int i;
+ struct ubifs_info *c;
+
+ /*
+ * Iterate over all mounted UBIFS file-systems and find out if there is
+ * already an ongoing commit operation there. If no, then iterate for
+ * the second time and initiate background commit.
+ */
+ spin_lock(&ubifs_infos_lock);
+ for (i = 0; i < 2; i++) {
+ list_for_each_entry(c, &ubifs_infos, infos_list) {
+ long dirty_zn_cnt;
+
+ if (!mutex_trylock(&c->umount_mutex)) {
+ /*
+ * Some un-mount is in progress, it will
+ * certainly free memory, so just return.
+ */
+ spin_unlock(&ubifs_infos_lock);
+ return -1;
+ }
+
+ dirty_zn_cnt = atomic_long_read(&c->dirty_zn_cnt);
+
+ if (!dirty_zn_cnt || c->cmt_state == COMMIT_BROKEN ||
+ c->ro_media) {
+ mutex_unlock(&c->umount_mutex);
+ continue;
+ }
+
+ if (c->cmt_state != COMMIT_RESTING) {
+ spin_unlock(&ubifs_infos_lock);
+ mutex_unlock(&c->umount_mutex);
+ return -1;
+ }
+
+ if (i == 1) {
+ list_del(&c->infos_list);
+ list_add_tail(&c->infos_list, &ubifs_infos);
+ spin_unlock(&ubifs_infos_lock);
+
+ ubifs_request_bg_commit(c);
+ mutex_unlock(&c->umount_mutex);
+ return -1;
+ }
+ mutex_unlock(&c->umount_mutex);
+ }
+ }
+ spin_unlock(&ubifs_infos_lock);
+
+ return 0;
+}
+
+int ubifs_shrinker(int nr, gfp_t gfp_mask)
+{
+ int freed, contention = 0;
+ long clean_zn_cnt = atomic_long_read(&ubifs_clean_zn_cnt);
+
+ if (nr == 0)
+ return clean_zn_cnt;
+
+ if (!clean_zn_cnt) {
+ /*
+ * No clean znodes, nothing to reap. All we can do in this case
+ * is to kick background threads to start commit, which will
+ * probably make clean znodes which, in turn, will be freeable.
+ * And we return -1, which will make the VM call us again
+ * later.
+ */
+ dbg_tnc("no clean znodes, kick a thread");
+ return kick_a_thread();
+ }
+
+ freed = shrink_tnc_trees(nr, OLD_ZNODE_AGE, &contention);
+ if (freed >= nr)
+ goto out;
+
+ dbg_tnc("not enough old znodes, try to free young ones");
+ freed += shrink_tnc_trees(nr - freed, YOUNG_ZNODE_AGE, &contention);
+ if (freed >= nr)
+ goto out;
+
+ dbg_tnc("not enough young znodes, free all");
+ freed += shrink_tnc_trees(nr - freed, 0, &contention);
+
+ if (!freed && contention) {
+ dbg_tnc("freed nothing, but contention");
+ return -1;
+ }
+
+out:
+ dbg_tnc("%d znodes were freed, requested %d", freed, nr);
+ return freed;
+}
--
1.5.4.1

2008-03-27 13:14:57

by Artem Bityutskiy

Subject: [RFC PATCH 16/26] UBIFS: add LEB properties tree

The LEB properties are stored and maintained on the flash media,
because otherwise UBIFS would need to scan the whole media on each mount.
We store this per-LEB accounting information in the lprops tree (LPT),
which is an on-flash B-tree. The tree is updated out-of-place, like
everything else in UBIFS. It has its own garbage-collector, and is a kind
of small independent world whose task is to maintain the array of
per-eraseblock information.

Signed-off-by: Artem Bityutskiy <[email protected]>
Signed-off-by: Adrian Hunter <[email protected]>
---
fs/ubifs/lpt.c | 2239 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 2239 insertions(+), 0 deletions(-)
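
To give a feeling for how the tree scales with the number of eraseblocks, the height calculation done at the start of do_calc_lpt_geom() can be tried stand-alone. This is only a sketch under stated assumptions: the sketch_* names are hypothetical, the fanout of 4 (shift of 2) is an assumed value for UBIFS_LPT_FANOUT and UBIFS_LPT_FANOUT_SHIFT, which are defined elsewhere in the series, and the single @lebs argument stands in for the LEB count the patch derives from c->main_lebs, c->max_leb_cnt and c->leb_cnt.

```c
#include <assert.h>

#define SKETCH_FANOUT       4	/* assumed value of UBIFS_LPT_FANOUT */
#define SKETCH_FANOUT_SHIFT 2	/* log2(SKETCH_FANOUT) */

/* Number of tree levels needed above the pnodes for @lebs eraseblocks */
static int sketch_lpt_height(int lebs)
{
	int max_pnode_cnt, n, hght = 1;

	assert(lebs > 0);
	/* Each pnode describes SKETCH_FANOUT eraseblocks */
	max_pnode_cnt = (lebs + SKETCH_FANOUT - 1) / SKETCH_FANOUT;
	/* Add a level until the tree can address every pnode */
	n = SKETCH_FANOUT;
	while (n < max_pnode_cnt) {
		hght += 1;
		n <<= SKETCH_FANOUT_SHIFT;
	}
	return hght;
}
```

Under these assumptions 16 eraseblocks need a height of 1, 17 need 2, and 1000 need 4, so the height grows logarithmically with the size of the main area.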

diff --git a/fs/ubifs/lpt.c b/fs/ubifs/lpt.c
new file mode 100644
index 0000000..27288d7
--- /dev/null
+++ b/fs/ubifs/lpt.c
@@ -0,0 +1,2239 @@
+/*
+ * This file is part of UBIFS.
+ *
+ * Copyright (C) 2006-2008 Nokia Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 51
+ * Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Authors: Adrian Hunter
+ * Artem Bityutskiy (Битюцкий Артём)
+ */
+
+/*
+ * This file implements the LEB properties tree (LPT) area. The LPT area
+ * contains the LEB properties tree, a table of LPT area eraseblocks (ltab), and
+ * (for the "big" model) a table of saved LEB numbers (lsave). The LPT area sits
+ * between the log and the orphan area.
+ *
+ * The LPT area is like a miniature self-contained file system. It is required
+ * that it never runs out of space, is fast to access and update, and scales
+ * logarithmically. The LEB properties tree is implemented as a wandering tree
+ * much like the TNC, and the LPT area has its own garbage collection.
+ *
+ * The LPT has two slightly different forms called the "small model" and the
+ * "big model". The small model is used when the entire LEB properties table
+ * can be written into a single eraseblock. In that case, garbage collection
+ * consists of just writing the whole table, which therefore makes all other
+ * eraseblocks reusable. In the case of the big model, dirty eraseblocks are
+ * selected for garbage collection, which consists of marking the nodes in
+ * that LEB as dirty, and then only the dirty nodes are written out. Also, in
+ * the case of the big model, a table of LEB numbers is saved so that the entire
+ * LPT does not need to be scanned looking for empty eraseblocks when UBIFS is
+ * first mounted.
+ */
+
+#include <linux/crc16.h>
+#include "ubifs.h"
+
+/**
+ * do_calc_lpt_geom - calculate sizes for the LPT area.
+ * @c: the UBIFS file-system description object
+ *
+ * Calculate the sizes of LPT bit fields, nodes, and tree, based on the
+ * properties of the flash and whether LPT is "big" (c->big_lpt).
+ */
+static void do_calc_lpt_geom(struct ubifs_info *c)
+{
+ int i, n, bits, per_leb_wastage, max_pnode_cnt;
+ long long sz, tot_wastage;
+
+ n = c->main_lebs + c->max_leb_cnt - c->leb_cnt;
+ max_pnode_cnt = DIV_ROUND_UP(n, UBIFS_LPT_FANOUT);
+
+ c->lpt_hght = 1;
+ n = UBIFS_LPT_FANOUT;
+ while (n < max_pnode_cnt) {
+ c->lpt_hght += 1;
+ n <<= UBIFS_LPT_FANOUT_SHIFT;
+ }
+
+ c->pnode_cnt = DIV_ROUND_UP(c->main_lebs, UBIFS_LPT_FANOUT);
+
+ n = DIV_ROUND_UP(c->pnode_cnt, UBIFS_LPT_FANOUT);
+ c->nnode_cnt = n;
+ for (i = 1; i < c->lpt_hght; i++) {
+ n = DIV_ROUND_UP(n, UBIFS_LPT_FANOUT);
+ c->nnode_cnt += n;
+ }
+
+ c->space_bits = fls(c->leb_size) - 3;
+ c->lpt_lnum_bits = fls(c->lpt_lebs);
+ c->lpt_offs_bits = fls(c->leb_size - 1);
+ c->lpt_spc_bits = fls(c->leb_size);
+
+ n = DIV_ROUND_UP(c->max_leb_cnt, UBIFS_LPT_FANOUT);
+ c->pcnt_bits = fls(n - 1);
+
+ c->lnum_bits = fls(c->max_leb_cnt - 1);
+
+ bits = UBIFS_LPT_CRC_BITS + UBIFS_LPT_TYPE_BITS +
+ (c->big_lpt ? c->pcnt_bits : 0) +
+ (c->space_bits * 2 + 1) * UBIFS_LPT_FANOUT;
+ c->pnode_sz = (bits + 7) / 8;
+
+ bits = UBIFS_LPT_CRC_BITS + UBIFS_LPT_TYPE_BITS +
+ (c->big_lpt ? c->pcnt_bits : 0) +
+ (c->lpt_lnum_bits + c->lpt_offs_bits) * UBIFS_LPT_FANOUT;
+ c->nnode_sz = (bits + 7) / 8;
+
+ bits = UBIFS_LPT_CRC_BITS + UBIFS_LPT_TYPE_BITS +
+ c->lpt_lebs * c->lpt_spc_bits * 2;
+ c->ltab_sz = (bits + 7) / 8;
+
+ bits = UBIFS_LPT_CRC_BITS + UBIFS_LPT_TYPE_BITS +
+ c->lnum_bits * c->lsave_cnt;
+ c->lsave_sz = (bits + 7) / 8;
+
+ /* Calculate the minimum LPT size */
+ c->lpt_sz = (long long)c->pnode_cnt * c->pnode_sz;
+ c->lpt_sz += (long long)c->nnode_cnt * c->nnode_sz;
+ c->lpt_sz += c->ltab_sz;
+ c->lpt_sz += c->lsave_sz;
+
+ /* Add wastage */
+ sz = c->lpt_sz;
+ per_leb_wastage = max_t(int, c->pnode_sz, c->nnode_sz);
+ sz += per_leb_wastage;
+ tot_wastage = per_leb_wastage;
+ while (sz > c->leb_size) {
+ sz += per_leb_wastage;
+ sz -= c->leb_size;
+ tot_wastage += per_leb_wastage;
+ }
+ tot_wastage += ALIGN(sz, c->min_io_size) - sz;
+ c->lpt_sz += tot_wastage;
+}
+
+/**
+ * ubifs_calc_lpt_geom - calculate and check sizes for the LPT area.
+ * @c: the UBIFS file-system description object
+ *
+ * This function returns %0 on success and a negative error code on failure.
+ */
+int ubifs_calc_lpt_geom(struct ubifs_info *c)
+{
+ int lebs_needed;
+ long long sz;
+
+ do_calc_lpt_geom(c);
+
+ /* Verify that lpt_lebs is big enough */
+ sz = c->lpt_sz * 2; /* Must have at least 2 times the size */
+ sz += c->leb_size - 1;
+ do_div(sz, c->leb_size);
+ lebs_needed = sz;
+ if (lebs_needed > c->lpt_lebs) {
+ ubifs_err("too few LPT LEBs");
+ return -EINVAL;
+ }
+
+ /* Verify that ltab fits in a single LEB (since ltab is a single node) */
+ if (c->ltab_sz > c->leb_size) {
+ ubifs_err("LPT ltab too big");
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+/**
+ * calc_dflt_lpt_geom - calculate default LPT geometry.
+ * @c: the UBIFS file-system description object
+ * @main_lebs: number of main area LEBs is passed and returned here
+ * @big_lpt: whether the LPT area is "big" is returned here
+ *
+ * The size of the LPT area depends on parameters that themselves are dependent
+ * on the size of the LPT area. This function successively recalculates the LPT
+ * area geometry until the parameters and resultant geometry are consistent.
+ *
+ * This function returns %0 on success and a negative error code on failure.
+ */
+static int calc_dflt_lpt_geom(struct ubifs_info *c, int *main_lebs,
+ int *big_lpt)
+{
+ int i, lebs_needed;
+ long long sz;
+
+ /* Start by assuming the minimum number of LPT LEBs */
+ c->lpt_lebs = UBIFS_MIN_LPT_LEBS;
+ c->main_lebs = *main_lebs - c->lpt_lebs;
+ if (c->main_lebs <= 0)
+ return -EINVAL;
+
+ /* And assume we will use the small LPT model */
+ c->big_lpt = 0;
+
+ /*
+ * Calculate the geometry based on assumptions above and then see if it
+ * makes sense
+ */
+ do_calc_lpt_geom(c);
+
+ /* Small LPT model must have lpt_sz <= leb_size */
+ if (c->lpt_sz > c->leb_size) {
+ /* Nope, so try again using big LPT model */
+ c->big_lpt = 1;
+ do_calc_lpt_geom(c);
+ }
+
+ /* Now check there are enough LPT LEBs */
+ for (i = 0; i < 64 ; i++) {
+ sz = c->lpt_sz * 4; /* Allow 4 times the size */
+ sz += c->leb_size - 1;
+ do_div(sz, c->leb_size);
+ lebs_needed = sz;
+ if (lebs_needed > c->lpt_lebs) {
+ /* Not enough LPT LEBs so try again with more */
+ c->lpt_lebs = lebs_needed;
+ c->main_lebs = *main_lebs - c->lpt_lebs;
+ if (c->main_lebs <= 0)
+ return -EINVAL;
+ do_calc_lpt_geom(c);
+ continue;
+ }
+ if (c->ltab_sz > c->leb_size) {
+ ubifs_err("LPT ltab too big");
+ return -EINVAL;
+ }
+ *main_lebs = c->main_lebs;
+ *big_lpt = c->big_lpt;
+ return 0;
+ }
+ return -EINVAL;
+}
+
+/**
+ * pack_bits - pack bit fields end-to-end.
+ * @addr: address at which to pack (passed and next address returned)
+ * @pos: bit position at which to pack (passed and next position returned)
+ * @val: value to pack
+ * @nrbits: number of bits of value to pack (1-32)
+ */
+static void pack_bits(uint8_t **addr, int *pos, uint32_t val, int nrbits)
+{
+ uint8_t *p = *addr;
+ int b = *pos;
+
+ ubifs_assert(nrbits > 0);
+ ubifs_assert(nrbits <= 32);
+ ubifs_assert(*pos >= 0);
+ ubifs_assert(*pos < 8);
+ ubifs_assert((val >> nrbits) == 0 || nrbits == 32);
+ if (b) {
+ *p |= ((uint8_t)val) << b;
+ nrbits += b;
+ if (nrbits > 8) {
+ *++p = (uint8_t)(val >>= (8 - b));
+ if (nrbits > 16) {
+ *++p = (uint8_t)(val >>= 8);
+ if (nrbits > 24) {
+ *++p = (uint8_t)(val >>= 8);
+ if (nrbits > 32)
+ *++p = (uint8_t)(val >>= 8);
+ }
+ }
+ }
+ } else {
+ *p = (uint8_t)val;
+ if (nrbits > 8) {
+ *++p = (uint8_t)(val >>= 8);
+ if (nrbits > 16) {
+ *++p = (uint8_t)(val >>= 8);
+ if (nrbits > 24)
+ *++p = (uint8_t)(val >>= 8);
+ }
+ }
+ }
+ b = nrbits & 7;
+ if (b == 0)
+ p++;
+ *addr = p;
+ *pos = b;
+}
+
+/**
+ * ubifs_unpack_bits - unpack bit fields.
+ * @addr: address at which to unpack (passed and next address returned)
+ * @pos: bit position at which to unpack (passed and next position returned)
+ * @nrbits: number of bits of value to unpack (1-32)
+ *
+ * This function returns the value unpacked.
+ */
+uint32_t ubifs_unpack_bits(uint8_t **addr, int *pos, int nrbits)
+{
+ const int k = 32 - nrbits;
+ uint8_t *p = *addr;
+ int b = *pos;
+ uint32_t val;
+
+ ubifs_assert(nrbits > 0);
+ ubifs_assert(nrbits <= 32);
+ ubifs_assert(*pos >= 0);
+ ubifs_assert(*pos < 8);
+ if (b) {
+ val = p[1] | ((uint32_t)p[2] << 8) | ((uint32_t)p[3] << 16) |
+ ((uint32_t)p[4] << 24);
+ val <<= (8 - b);
+ val |= *p >> b;
+ nrbits += b;
+ } else
+ val = p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16) |
+ ((uint32_t)p[3] << 24);
+ val <<= k;
+ val >>= k;
+ b = nrbits & 7;
+ p += nrbits / 8;
+ *addr = p;
+ *pos = b;
+ ubifs_assert((val >> nrbits) == 0 || nrbits - b == 32);
+ return val;
+}
+
+/**
+ * ubifs_pack_pnode - pack all the bit fields of a pnode.
+ * @c: UBIFS file-system description object
+ * @buf: buffer into which to pack
+ * @pnode: pnode to pack
+ */
+void ubifs_pack_pnode(struct ubifs_info *c, void *buf,
+ struct ubifs_pnode *pnode)
+{
+ uint8_t *addr = buf + UBIFS_LPT_CRC_BYTES;
+ int i, pos = 0;
+ uint16_t crc;
+
+ pack_bits(&addr, &pos, UBIFS_LPT_PNODE, UBIFS_LPT_TYPE_BITS);
+ if (c->big_lpt)
+ pack_bits(&addr, &pos, pnode->num, c->pcnt_bits);
+ for (i = 0; i < UBIFS_LPT_FANOUT; i++) {
+ pack_bits(&addr, &pos, pnode->lprops[i].free >> 3,
+ c->space_bits);
+ pack_bits(&addr, &pos, pnode->lprops[i].dirty >> 3,
+ c->space_bits);
+ if (pnode->lprops[i].flags & LPROPS_INDEX)
+ pack_bits(&addr, &pos, 1, 1);
+ else
+ pack_bits(&addr, &pos, 0, 1);
+ }
+ crc = crc16(-1, buf + UBIFS_LPT_CRC_BYTES,
+ c->pnode_sz - UBIFS_LPT_CRC_BYTES);
+ addr = buf;
+ pos = 0;
+ pack_bits(&addr, &pos, crc, UBIFS_LPT_CRC_BITS);
+}
+
+/**
+ * ubifs_pack_nnode - pack all the bit fields of a nnode.
+ * @c: UBIFS file-system description object
+ * @buf: buffer into which to pack
+ * @nnode: nnode to pack
+ */
+void ubifs_pack_nnode(struct ubifs_info *c, void *buf,
+ struct ubifs_nnode *nnode)
+{
+ uint8_t *addr = buf + UBIFS_LPT_CRC_BYTES;
+ int i, pos = 0;
+ uint16_t crc;
+
+ pack_bits(&addr, &pos, UBIFS_LPT_NNODE, UBIFS_LPT_TYPE_BITS);
+ if (c->big_lpt)
+ pack_bits(&addr, &pos, nnode->num, c->pcnt_bits);
+ for (i = 0; i < UBIFS_LPT_FANOUT; i++) {
+ int lnum = nnode->nbranch[i].lnum;
+
+ if (lnum == 0)
+ lnum = c->lpt_last + 1;
+ pack_bits(&addr, &pos, lnum - c->lpt_first, c->lpt_lnum_bits);
+ pack_bits(&addr, &pos, nnode->nbranch[i].offs,
+ c->lpt_offs_bits);
+ }
+ crc = crc16(-1, buf + UBIFS_LPT_CRC_BYTES,
+ c->nnode_sz - UBIFS_LPT_CRC_BYTES);
+ addr = buf;
+ pos = 0;
+ pack_bits(&addr, &pos, crc, UBIFS_LPT_CRC_BITS);
+}
+
+/**
+ * ubifs_pack_ltab - pack the LPT's own lprops table.
+ * @c: UBIFS file-system description object
+ * @buf: buffer into which to pack
+ * @ltab: LPT's own lprops table to pack
+ */
+void ubifs_pack_ltab(struct ubifs_info *c, void *buf,
+ struct ubifs_lpt_lprops *ltab)
+{
+ uint8_t *addr = buf + UBIFS_LPT_CRC_BYTES;
+ int i, pos = 0;
+ uint16_t crc;
+
+ pack_bits(&addr, &pos, UBIFS_LPT_LTAB, UBIFS_LPT_TYPE_BITS);
+ for (i = 0; i < c->lpt_lebs; i++) {
+ pack_bits(&addr, &pos, ltab[i].free, c->lpt_spc_bits);
+ pack_bits(&addr, &pos, ltab[i].dirty, c->lpt_spc_bits);
+ }
+ crc = crc16(-1, buf + UBIFS_LPT_CRC_BYTES,
+ c->ltab_sz - UBIFS_LPT_CRC_BYTES);
+ addr = buf;
+ pos = 0;
+ pack_bits(&addr, &pos, crc, UBIFS_LPT_CRC_BITS);
+}
+
+/**
+ * ubifs_pack_lsave - pack the LPT's save table.
+ * @c: UBIFS file-system description object
+ * @buf: buffer into which to pack
+ * @lsave: LPT's save table to pack
+ */
+void ubifs_pack_lsave(struct ubifs_info *c, void *buf, int *lsave)
+{
+ uint8_t *addr = buf + UBIFS_LPT_CRC_BYTES;
+ int i, pos = 0;
+ uint16_t crc;
+
+ pack_bits(&addr, &pos, UBIFS_LPT_LSAVE, UBIFS_LPT_TYPE_BITS);
+ for (i = 0; i < c->lsave_cnt; i++)
+ pack_bits(&addr, &pos, lsave[i], c->lnum_bits);
+ crc = crc16(-1, buf + UBIFS_LPT_CRC_BYTES,
+ c->lsave_sz - UBIFS_LPT_CRC_BYTES);
+ addr = buf;
+ pos = 0;
+ pack_bits(&addr, &pos, crc, UBIFS_LPT_CRC_BITS);
+}
+
+/**
+ * ubifs_add_lpt_dirt - add dirty space to LPT LEB properties.
+ * @c: UBIFS file-system description object
+ * @lnum: LEB number to which to add dirty space
+ * @dirty: amount of dirty space to add
+ */
+void ubifs_add_lpt_dirt(struct ubifs_info *c, int lnum, int dirty)
+{
+ if (!dirty || !lnum)
+ return;
+ dbg_lp("LEB %d add %d to %d",
+ lnum, dirty, c->ltab[lnum - c->lpt_first].dirty);
+ ubifs_assert(lnum >= c->lpt_first && lnum <= c->lpt_last);
+ c->ltab[lnum - c->lpt_first].dirty += dirty;
+}
+
+/**
+ * set_ltab - set LPT LEB properties.
+ * @c: UBIFS file-system description object
+ * @lnum: LEB number
+ * @free: amount of free space
+ * @dirty: amount of dirty space
+ */
+static void set_ltab(struct ubifs_info *c, int lnum, int free, int dirty)
+{
+ dbg_lp("LEB %d free %d dirty %d to %d %d",
+ lnum, c->ltab[lnum - c->lpt_first].free,
+ c->ltab[lnum - c->lpt_first].dirty, free, dirty);
+ ubifs_assert(lnum >= c->lpt_first && lnum <= c->lpt_last);
+ c->ltab[lnum - c->lpt_first].free = free;
+ c->ltab[lnum - c->lpt_first].dirty = dirty;
+}
+
+/**
+ * ubifs_add_nnode_dirt - add dirty space to LPT LEB properties.
+ * @c: UBIFS file-system description object
+ * @nnode: nnode for which to add dirt
+ */
+void ubifs_add_nnode_dirt(struct ubifs_info *c, struct ubifs_nnode *nnode)
+{
+ struct ubifs_nnode *np = nnode->parent;
+
+ if (np)
+ ubifs_add_lpt_dirt(c, np->nbranch[nnode->iip].lnum,
+ c->nnode_sz);
+ else {
+ ubifs_add_lpt_dirt(c, c->lpt_lnum, c->nnode_sz);
+ if (!(c->lpt_drty_flgs & LTAB_DIRTY)) {
+ c->lpt_drty_flgs |= LTAB_DIRTY;
+ ubifs_add_lpt_dirt(c, c->ltab_lnum, c->ltab_sz);
+ }
+ }
+}
+
+/**
+ * add_pnode_dirt - add dirty space to LPT LEB properties.
+ * @c: UBIFS file-system description object
+ * @pnode: pnode for which to add dirt
+ */
+static void add_pnode_dirt(struct ubifs_info *c, struct ubifs_pnode *pnode)
+{
+ ubifs_add_lpt_dirt(c, pnode->parent->nbranch[pnode->iip].lnum,
+ c->pnode_sz);
+}
+
+/**
+ * calc_nnode_num - calculate nnode number.
+ * @row: the row in the tree (root is zero)
+ * @col: the column in the row (leftmost is zero)
+ *
+ * The nnode number is a number that uniquely identifies a nnode and can be used
+ * easily to traverse the tree from the root to that nnode.
+ *
+ * This function calculates and returns the nnode number for the nnode at @row
+ * and @col.
+ */
+static int calc_nnode_num(int row, int col)
+{
+ int num, bits;
+
+ num = 1;
+ while (row--) {
+ bits = (col & (UBIFS_LPT_FANOUT - 1));
+ col >>= UBIFS_LPT_FANOUT_SHIFT;
+ num <<= UBIFS_LPT_FANOUT_SHIFT;
+ num |= bits;
+ }
+ return num;
+}
+
+/**
+ * calc_nnode_num_from_parent - calculate nnode number.
+ * @c: UBIFS file-system description object
+ * @parent: parent nnode
+ * @iip: index in parent
+ *
+ * The nnode number is a number that uniquely identifies a nnode and can be used
+ * easily to traverse the tree from the root to that nnode.
+ *
+ * This function calculates and returns the nnode number based on the parent's
+ * nnode number and the index in parent.
+ */
+static int calc_nnode_num_from_parent(struct ubifs_info *c,
+ struct ubifs_nnode *parent, int iip)
+{
+ int num, shft;
+
+ if (!parent)
+ return 1;
+ shft = (c->lpt_hght - parent->level) * UBIFS_LPT_FANOUT_SHIFT;
+ num = parent->num ^ (1 << shft);
+ num |= (UBIFS_LPT_FANOUT + iip) << shft;
+ return num;
+}
+
+/**
+ * calc_pnode_num_from_parent - calculate pnode number.
+ * @c: UBIFS file-system description object
+ * @parent: parent nnode
+ * @iip: index in parent
+ *
+ * The pnode number is a number that uniquely identifies a pnode and can be used
+ * easily to traverse the tree from the root to that pnode.
+ *
+ * This function calculates and returns the pnode number based on the parent's
+ * nnode number and the index in parent.
+ */
+static int calc_pnode_num_from_parent(struct ubifs_info *c,
+ struct ubifs_nnode *parent, int iip)
+{
+ int i, n = c->lpt_hght - 1, pnum = parent->num, num = 0;
+
+ for (i = 0; i < n; i++) {
+ num <<= UBIFS_LPT_FANOUT_SHIFT;
+ num |= pnum & (UBIFS_LPT_FANOUT - 1);
+ pnum >>= UBIFS_LPT_FANOUT_SHIFT;
+ }
+ num <<= UBIFS_LPT_FANOUT_SHIFT;
+ num |= iip;
+ return num;
+}
+
+/**
+ * ubifs_create_dflt_lpt - create default LPT.
+ * @c: UBIFS file-system description object
+ * @main_lebs: number of main area LEBs is passed and returned here
+ * @lpt_first: LEB number of first LPT LEB
+ * @lpt_lebs: number of LEBs for LPT is passed and returned here
+ * @big_lpt: use big LPT model is passed and returned here
+ *
+ * This function returns %0 on success and a negative error code on failure.
+ */
+int ubifs_create_dflt_lpt(struct ubifs_info *c, int *main_lebs, int lpt_first,
+ int *lpt_lebs, int *big_lpt)
+{
+ int lnum, err = 0, node_sz, iopos, i, j, cnt, len, alen, row;
+ int blnum, boffs, bsz, bcnt;
+ struct ubifs_pnode *pnode = NULL;
+ struct ubifs_nnode *nnode = NULL;
+ void *buf = NULL, *p;
+ struct ubifs_lpt_lprops *ltab = NULL;
+ int *lsave = NULL;
+
+ err = calc_dflt_lpt_geom(c, main_lebs, big_lpt);
+ if (err)
+ return err;
+ *lpt_lebs = c->lpt_lebs;
+
+ /* Needed by 'ubifs_pack_nnode()' and 'set_ltab()' */
+ c->lpt_first = lpt_first;
+ /* Needed by 'set_ltab()' */
+ c->lpt_last = lpt_first + c->lpt_lebs - 1;
+ /* Needed by 'ubifs_pack_lsave()' */
+ c->main_first = c->leb_cnt - *main_lebs;
+
+ pnode = kzalloc(sizeof(struct ubifs_pnode), GFP_KERNEL);
+ nnode = kzalloc(sizeof(struct ubifs_nnode), GFP_KERNEL);
+ buf = vmalloc(c->leb_size);
+ ltab = vmalloc(sizeof(struct ubifs_lpt_lprops) * c->lpt_lebs);
+ lsave = kmalloc(sizeof(int) * c->lsave_cnt, GFP_KERNEL);
+ if (!pnode || !nnode || !buf || !ltab || !lsave) {
+ err = -ENOMEM;
+ goto out;
+ }
+
+ ubifs_assert(c->ltab == NULL);
+ c->ltab = ltab; /* Needed by set_ltab */
+
+ /* Initialize LPT's own lprops */
+ for (i = 0; i < c->lpt_lebs; i++) {
+ ltab[i].free = c->leb_size;
+ ltab[i].dirty = 0;
+ ltab[i].tgc = 0;
+ ltab[i].cmt = 0;
+ }
+
+ lnum = lpt_first;
+ p = buf;
+ /* Number of leaf nodes (pnodes) */
+ cnt = c->pnode_cnt;
+
+ /*
+ * The first pnode contains the LEB properties for the LEBs that contain
+ * the root inode node and the root index node of the index tree.
+ */
+ node_sz = ALIGN(ubifs_idx_node_sz(c, 1), 8);
+ iopos = ALIGN(node_sz, c->min_io_size);
+ pnode->lprops[0].free = c->leb_size - iopos;
+ pnode->lprops[0].dirty = iopos - node_sz;
+ pnode->lprops[0].flags = LPROPS_INDEX;
+
+ node_sz = UBIFS_INO_NODE_SZ;
+ iopos = ALIGN(node_sz, c->min_io_size);
+ pnode->lprops[1].free = c->leb_size - iopos;
+ pnode->lprops[1].dirty = iopos - node_sz;
+
+ for (i = 2; i < UBIFS_LPT_FANOUT; i++)
+ pnode->lprops[i].free = c->leb_size;
+
+ /* Add first pnode */
+ ubifs_pack_pnode(c, p, pnode);
+ p += c->pnode_sz;
+ len = c->pnode_sz;
+ pnode->num += 1;
+
+ /* Reset pnode values for remaining pnodes */
+ pnode->lprops[0].free = c->leb_size;
+ pnode->lprops[0].dirty = 0;
+ pnode->lprops[0].flags = 0;
+
+ pnode->lprops[1].free = c->leb_size;
+ pnode->lprops[1].dirty = 0;
+
+ /*
+ * To calculate the internal node branches, we keep information about
+ * the level below.
+ */
+ blnum = lnum; /* LEB number of level below */
+ boffs = 0; /* Offset of level below */
+ bcnt = cnt; /* Number of nodes in level below */
+ bsz = c->pnode_sz; /* Size of nodes in level below */
+
+ /* Add all remaining pnodes */
+ for (i = 1; i < cnt; i++) {
+ if (len + c->pnode_sz > c->leb_size) {
+ alen = ALIGN(len, c->min_io_size);
+ set_ltab(c, lnum, c->leb_size - alen, alen - len);
+ memset(p, 0xff, alen - len);
+ err = ubi_leb_change(c->ubi, lnum++, buf, alen,
+ UBI_SHORTTERM);
+ if (err)
+ goto out;
+ p = buf;
+ len = 0;
+ }
+ ubifs_pack_pnode(c, p, pnode);
+ p += c->pnode_sz;
+ len += c->pnode_sz;
+ /*
+ * pnodes are simply numbered left to right starting at zero,
+ * which means the pnode number can be used easily to traverse
+ * down the tree to the corresponding pnode.
+ */
+ pnode->num += 1;
+ }
+
+ row = 0;
+ for (i = UBIFS_LPT_FANOUT; cnt > i; i <<= UBIFS_LPT_FANOUT_SHIFT)
+ row += 1;
+ /* Add all nnodes, one level at a time */
+ while (1) {
+ /* Number of internal nodes (nnodes) at next level */
+ cnt = DIV_ROUND_UP(cnt, UBIFS_LPT_FANOUT);
+ for (i = 0; i < cnt; i++) {
+ if (len + c->nnode_sz > c->leb_size) {
+ alen = ALIGN(len, c->min_io_size);
+ set_ltab(c, lnum, c->leb_size - alen,
+ alen - len);
+ memset(p, 0xff, alen - len);
+ err = ubi_leb_change(c->ubi, lnum++, buf, alen,
+ UBI_SHORTTERM);
+ if (err)
+ goto out;
+ p = buf;
+ len = 0;
+ }
+ /* Only 1 nnode at this level, so it is the root */
+ if (cnt == 1) {
+ c->lpt_lnum = lnum;
+ c->lpt_offs = len;
+ }
+ /* Set branches to the level below */
+ for (j = 0; j < UBIFS_LPT_FANOUT; j++) {
+ if (bcnt) {
+ if (boffs + bsz > c->leb_size) {
+ blnum += 1;
+ boffs = 0;
+ }
+ nnode->nbranch[j].lnum = blnum;
+ nnode->nbranch[j].offs = boffs;
+ boffs += bsz;
+ bcnt--;
+ } else {
+ nnode->nbranch[j].lnum = 0;
+ nnode->nbranch[j].offs = 0;
+ }
+ }
+ nnode->num = calc_nnode_num(row, i);
+ ubifs_pack_nnode(c, p, nnode);
+ p += c->nnode_sz;
+ len += c->nnode_sz;
+ }
+ /* Only 1 nnode at this level, so it is the root */
+ if (cnt == 1)
+ break;
+ /* Update the information about the level below */
+ bcnt = cnt;
+ bsz = c->nnode_sz;
+ row -= 1;
+ }
+
+ if (*big_lpt) {
+ /* Need to add LPT's save table */
+ if (len + c->lsave_sz > c->leb_size) {
+ alen = ALIGN(len, c->min_io_size);
+ set_ltab(c, lnum, c->leb_size - alen, alen - len);
+ memset(p, 0xff, alen - len);
+ err = ubi_leb_change(c->ubi, lnum++, buf, alen,
+ UBI_SHORTTERM);
+ if (err)
+ goto out;
+ p = buf;
+ len = 0;
+ }
+
+ c->lsave_lnum = lnum;
+ c->lsave_offs = len;
+
+ for (i = 0; i < c->lsave_cnt && i < *main_lebs; i++)
+ lsave[i] = c->main_first + i;
+ for (; i < c->lsave_cnt; i++)
+ lsave[i] = c->main_first;
+
+ ubifs_pack_lsave(c, p, lsave);
+ p += c->lsave_sz;
+ len += c->lsave_sz;
+ }
+
+ /* Need to add LPT's own LEB properties table */
+ if (len + c->ltab_sz > c->leb_size) {
+ alen = ALIGN(len, c->min_io_size);
+ set_ltab(c, lnum, c->leb_size - alen, alen - len);
+ memset(p, 0xff, alen - len);
+ err = ubi_leb_change(c->ubi, lnum++, buf, alen, UBI_SHORTTERM);
+ if (err)
+ goto out;
+ p = buf;
+ len = 0;
+ }
+
+ c->ltab_lnum = lnum;
+ c->ltab_offs = len;
+
+ /* Update ltab before packing it */
+ len += c->ltab_sz;
+ alen = ALIGN(len, c->min_io_size);
+ set_ltab(c, lnum, c->leb_size - alen, alen - len);
+
+ ubifs_pack_ltab(c, p, ltab);
+ p += c->ltab_sz;
+
+ /* Write remaining buffer */
+ memset(p, 0xff, alen - len);
+ err = ubi_leb_change(c->ubi, lnum, buf, alen, UBI_SHORTTERM);
+ if (err)
+ goto out;
+
+ c->nhead_lnum = lnum;
+ c->nhead_offs = ALIGN(len, c->min_io_size);
+
+ dbg_lp("space_bits %d", c->space_bits);
+ dbg_lp("lpt_lnum_bits %d", c->lpt_lnum_bits);
+ dbg_lp("lpt_offs_bits %d", c->lpt_offs_bits);
+ dbg_lp("lpt_spc_bits %d", c->lpt_spc_bits);
+ dbg_lp("pcnt_bits %d", c->pcnt_bits);
+ dbg_lp("lnum_bits %d", c->lnum_bits);
+ dbg_lp("pnode_sz %d", c->pnode_sz);
+ dbg_lp("nnode_sz %d", c->nnode_sz);
+ dbg_lp("ltab_sz %d", c->ltab_sz);
+ dbg_lp("lsave_sz %d", c->lsave_sz);
+ dbg_lp("lpt_hght %d", c->lpt_hght);
+ dbg_lp("big_lpt %d", c->big_lpt);
+ dbg_lp("LPT root is at %d:%d", c->lpt_lnum, c->lpt_offs);
+ dbg_lp("LPT head is at %d:%d", c->nhead_lnum, c->nhead_offs);
+ dbg_lp("LPT ltab is at %d:%d", c->ltab_lnum, c->ltab_offs);
+ if (c->big_lpt)
+ dbg_lp("LPT lsave is at %d:%d", c->lsave_lnum, c->lsave_offs);
+out:
+ c->ltab = NULL;
+ kfree(lsave);
+ vfree(ltab);
+ vfree(buf);
+ kfree(nnode);
+ kfree(pnode);
+ return err;
+}
+
+/**
+ * update_cats - add LEB properties of a pnode to LEB category lists and heaps.
+ * @c: UBIFS file-system description object
+ * @pnode: pnode
+ *
+ * When a pnode is loaded into memory, the LEB properties it contains are added,
+ * by this function, to the LEB category lists and heaps.
+ */
+static void update_cats(struct ubifs_info *c, struct ubifs_pnode *pnode)
+{
+ int i;
+
+ for (i = 0; i < UBIFS_LPT_FANOUT; i++) {
+ int cat = pnode->lprops[i].flags & LPROPS_CAT_MASK;
+ int lnum = pnode->lprops[i].lnum;
+
+ if (!lnum)
+ return;
+ ubifs_add_to_cat(c, &pnode->lprops[i], cat);
+ }
+}
+
+/**
+ * replace_cats - replace a pnode's LEB properties in category lists and heaps.
+ * @c: UBIFS file-system description object
+ * @old_pnode: pnode copied
+ * @new_pnode: pnode copy
+ *
+ * During commit it is sometimes necessary to copy a pnode
+ * (see dirty_cow_pnode). When that happens, references in
+ * category lists and heaps must be replaced. This function does that.
+ */
+static void replace_cats(struct ubifs_info *c, struct ubifs_pnode *old_pnode,
+ struct ubifs_pnode *new_pnode)
+{
+ int i;
+
+ for (i = 0; i < UBIFS_LPT_FANOUT; i++) {
+ if (!new_pnode->lprops[i].lnum)
+ return;
+ ubifs_replace_cat(c, &old_pnode->lprops[i],
+ &new_pnode->lprops[i]);
+ }
+}
+
+/**
+ * check_lpt_crc - check LPT node CRC is correct.
+ * @buf: buffer containing node
+ * @len: length of node
+ *
+ * This function returns %0 on success and a negative error code on failure.
+ */
+static int check_lpt_crc(void *buf, int len)
+{
+ int pos = 0;
+ uint8_t *addr = buf;
+ uint16_t crc, calc_crc;
+
+ crc = ubifs_unpack_bits(&addr, &pos, UBIFS_LPT_CRC_BITS);
+ calc_crc = crc16(-1, buf + UBIFS_LPT_CRC_BYTES,
+ len - UBIFS_LPT_CRC_BYTES);
+ if (crc != calc_crc) {
+ ubifs_err("invalid crc in LPT node: crc %hx calc %hx", crc,
+ calc_crc);
+ dbg_dump_stack();
+ return -EINVAL;
+ }
+ return 0;
+}
+
+/**
+ * check_lpt_type - check LPT node type is correct.
+ * @addr: address of type bit field is passed and returned updated here
+ * @pos: position of type bit field is passed and returned updated here
+ * @type: expected type
+ *
+ * This function returns %0 on success and a negative error code on failure.
+ */
+static int check_lpt_type(uint8_t **addr, int *pos, int type)
+{
+ int node_type;
+
+ node_type = ubifs_unpack_bits(addr, pos, UBIFS_LPT_TYPE_BITS);
+ if (node_type != type) {
+ ubifs_err("invalid type (%d) in LPT node type %d", node_type,
+ type);
+ dbg_dump_stack();
+ return -EINVAL;
+ }
+ return 0;
+}
+
+/**
+ * unpack_pnode - unpack a pnode.
+ * @c: UBIFS file-system description object
+ * @buf: buffer containing packed pnode to unpack
+ * @pnode: pnode structure to fill
+ *
+ * This function returns %0 on success and a negative error code on failure.
+ */
+static int unpack_pnode(struct ubifs_info *c, void *buf,
+ struct ubifs_pnode *pnode)
+{
+ uint8_t *addr = buf + UBIFS_LPT_CRC_BYTES;
+ int i, pos = 0, err;
+
+ err = check_lpt_type(&addr, &pos, UBIFS_LPT_PNODE);
+ if (err)
+ return err;
+ if (c->big_lpt)
+ pnode->num = ubifs_unpack_bits(&addr, &pos, c->pcnt_bits);
+ for (i = 0; i < UBIFS_LPT_FANOUT; i++) {
+ struct ubifs_lprops * const lprops = &pnode->lprops[i];
+
+ lprops->free = ubifs_unpack_bits(&addr, &pos, c->space_bits);
+ lprops->free <<= 3;
+ lprops->dirty = ubifs_unpack_bits(&addr, &pos, c->space_bits);
+ lprops->dirty <<= 3;
+
+ if (ubifs_unpack_bits(&addr, &pos, 1))
+ lprops->flags = LPROPS_INDEX;
+ else
+ lprops->flags = 0;
+ lprops->flags |= ubifs_categorize_lprops(c, lprops);
+ }
+ err = check_lpt_crc(buf, c->pnode_sz);
+ return err;
+}
+
+/**
+ * unpack_nnode - unpack a nnode.
+ * @c: UBIFS file-system description object
+ * @buf: buffer containing packed nnode to unpack
+ * @nnode: nnode structure to fill
+ *
+ * This function returns %0 on success and a negative error code on failure.
+ */
+static int unpack_nnode(struct ubifs_info *c, void *buf,
+ struct ubifs_nnode *nnode)
+{
+ uint8_t *addr = buf + UBIFS_LPT_CRC_BYTES;
+ int i, pos = 0, err;
+
+ err = check_lpt_type(&addr, &pos, UBIFS_LPT_NNODE);
+ if (err)
+ return err;
+ if (c->big_lpt)
+ nnode->num = ubifs_unpack_bits(&addr, &pos, c->pcnt_bits);
+ for (i = 0; i < UBIFS_LPT_FANOUT; i++) {
+ int lnum;
+
+ lnum = ubifs_unpack_bits(&addr, &pos, c->lpt_lnum_bits) +
+ c->lpt_first;
+ if (lnum == c->lpt_last + 1)
+ lnum = 0;
+ nnode->nbranch[i].lnum = lnum;
+ nnode->nbranch[i].offs = ubifs_unpack_bits(&addr, &pos,
+ c->lpt_offs_bits);
+ }
+ err = check_lpt_crc(buf, c->nnode_sz);
+ return err;
+}
+
+/**
+ * unpack_ltab - unpack the LPT's own lprops table.
+ * @c: UBIFS file-system description object
+ * @buf: buffer from which to unpack
+ *
+ * This function returns %0 on success and a negative error code on failure.
+ */
+static int unpack_ltab(struct ubifs_info *c, void *buf)
+{
+ uint8_t *addr = buf + UBIFS_LPT_CRC_BYTES;
+ int i, pos = 0, err;
+
+ err = check_lpt_type(&addr, &pos, UBIFS_LPT_LTAB);
+ if (err)
+ return err;
+ for (i = 0; i < c->lpt_lebs; i++) {
+ int free = ubifs_unpack_bits(&addr, &pos, c->lpt_spc_bits);
+ int dirty = ubifs_unpack_bits(&addr, &pos, c->lpt_spc_bits);
+
+ if (free < 0 || free > c->leb_size || dirty < 0 ||
+ dirty > c->leb_size || free + dirty > c->leb_size)
+ return -EINVAL;
+
+ c->ltab[i].free = free;
+ c->ltab[i].dirty = dirty;
+ c->ltab[i].tgc = 0;
+ c->ltab[i].cmt = 0;
+ }
+ err = check_lpt_crc(buf, c->ltab_sz);
+ return err;
+}
+
+/**
+ * unpack_lsave - unpack the LPT's save table.
+ * @c: UBIFS file-system description object
+ * @buf: buffer from which to unpack
+ *
+ * This function returns %0 on success and a negative error code on failure.
+ */
+static int unpack_lsave(struct ubifs_info *c, void *buf)
+{
+ uint8_t *addr = buf + UBIFS_LPT_CRC_BYTES;
+ int i, pos = 0, err;
+
+ err = check_lpt_type(&addr, &pos, UBIFS_LPT_LSAVE);
+ if (err)
+ return err;
+ for (i = 0; i < c->lsave_cnt; i++) {
+ int lnum = ubifs_unpack_bits(&addr, &pos, c->lnum_bits);
+
+ if (lnum < c->main_first || lnum >= c->leb_cnt)
+ return -EINVAL;
+ c->lsave[i] = lnum;
+ }
+ err = check_lpt_crc(buf, c->lsave_sz);
+ return err;
+}
+
+/**
+ * validate_nnode - validate a nnode.
+ * @c: UBIFS file-system description object
+ * @nnode: nnode to validate
+ * @parent: parent nnode (or NULL for the root nnode)
+ * @iip: index in parent
+ *
+ * This function returns %0 on success and a negative error code on failure.
+ */
+static int validate_nnode(struct ubifs_info *c, struct ubifs_nnode *nnode,
+ struct ubifs_nnode *parent, int iip)
+{
+ int i, lvl, max_offs;
+
+ if (c->big_lpt) {
+ int num = calc_nnode_num_from_parent(c, parent, iip);
+
+ if (nnode->num != num)
+ return -EINVAL;
+ }
+ lvl = parent ? parent->level - 1 : c->lpt_hght;
+ if (lvl < 1)
+ return -EINVAL;
+ if (lvl == 1)
+ max_offs = c->leb_size - c->pnode_sz;
+ else
+ max_offs = c->leb_size - c->nnode_sz;
+ for (i = 0; i < UBIFS_LPT_FANOUT; i++) {
+ int lnum = nnode->nbranch[i].lnum;
+ int offs = nnode->nbranch[i].offs;
+
+ if (lnum == 0) {
+ if (offs != 0)
+ return -EINVAL;
+ continue;
+ }
+ if (lnum < c->lpt_first || lnum > c->lpt_last)
+ return -EINVAL;
+ if (offs < 0 || offs > max_offs)
+ return -EINVAL;
+ }
+ return 0;
+}
+
+/**
+ * validate_pnode - validate a pnode.
+ * @c: UBIFS file-system description object
+ * @pnode: pnode to validate
+ * @parent: parent nnode
+ * @iip: index in parent
+ *
+ * This function returns %0 on success and a negative error code on failure.
+ */
+static int validate_pnode(struct ubifs_info *c, struct ubifs_pnode *pnode,
+ struct ubifs_nnode *parent, int iip)
+{
+ int i;
+
+ if (c->big_lpt) {
+ int num = calc_pnode_num_from_parent(c, parent, iip);
+
+ if (pnode->num != num)
+ return -EINVAL;
+ }
+ for (i = 0; i < UBIFS_LPT_FANOUT; i++) {
+ int free = pnode->lprops[i].free;
+ int dirty = pnode->lprops[i].dirty;
+
+ if (free < 0 || free > c->leb_size || free % c->min_io_size ||
+ (free & 7))
+ return -EINVAL;
+ if (dirty < 0 || dirty > c->leb_size || (dirty & 7))
+ return -EINVAL;
+ if (dirty + free > c->leb_size)
+ return -EINVAL;
+ }
+ return 0;
+}
+
+/**
+ * set_pnode_lnum - set LEB numbers on a pnode.
+ * @c: UBIFS file-system description object
+ * @pnode: pnode to update
+ *
+ * This function calculates the LEB numbers for the LEB properties it contains
+ * based on the pnode number.
+ */
+static void set_pnode_lnum(struct ubifs_info *c, struct ubifs_pnode *pnode)
+{
+ int i, lnum;
+
+ lnum = (pnode->num << UBIFS_LPT_FANOUT_SHIFT) + c->main_first;
+ for (i = 0; i < UBIFS_LPT_FANOUT; i++) {
+ if (lnum >= c->leb_cnt)
+ return;
+ pnode->lprops[i].lnum = lnum++;
+ }
+}
+
+/**
+ * ubifs_read_nnode - read a nnode from flash and link it to the tree in memory.
+ * @c: UBIFS file-system description object
+ * @parent: parent nnode (or NULL for the root)
+ * @iip: index in parent
+ *
+ * This function returns %0 on success and a negative error code on failure.
+ */
+int ubifs_read_nnode(struct ubifs_info *c, struct ubifs_nnode *parent, int iip)
+{
+ struct ubifs_nbranch *branch = NULL;
+ struct ubifs_nnode *nnode = NULL;
+ void *buf = c->lpt_nod_buf;
+ int err, lnum, offs;
+
+ if (parent) {
+ branch = &parent->nbranch[iip];
+ lnum = branch->lnum;
+ offs = branch->offs;
+ } else {
+ lnum = c->lpt_lnum;
+ offs = c->lpt_offs;
+ }
+ nnode = kzalloc(sizeof(struct ubifs_nnode), GFP_NOFS);
+ if (!nnode) {
+ err = -ENOMEM;
+ goto out;
+ }
+ if (lnum == 0) {
+ /*
+ * This nnode was not written which just means that the LEB
+ * properties in the subtree below it describe empty LEBs. We
+ * make the nnode as though we had read it, which in fact means
+ * doing almost nothing.
+ */
+ if (c->big_lpt)
+ nnode->num = calc_nnode_num_from_parent(c, parent, iip);
+ } else {
+ err = ubi_read(c->ubi, lnum, buf, offs, c->nnode_sz);
+ if (err)
+ goto out;
+ err = unpack_nnode(c, buf, nnode);
+ if (err)
+ goto out;
+ }
+ err = validate_nnode(c, nnode, parent, iip);
+ if (err)
+ goto out;
+ if (!c->big_lpt)
+ nnode->num = calc_nnode_num_from_parent(c, parent, iip);
+ if (parent) {
+ branch->nnode = nnode;
+ nnode->level = parent->level - 1;
+ } else {
+ c->nroot = nnode;
+ nnode->level = c->lpt_hght;
+ }
+ nnode->parent = parent;
+ nnode->iip = iip;
+ return 0;
+
+out:
+ ubifs_err("error %d reading nnode at %d:%d", err, lnum, offs);
+ kfree(nnode);
+ return err;
+}
+
+/**
+ * read_pnode - read a pnode from flash and link it to the tree in memory.
+ * @c: UBIFS file-system description object
+ * @parent: parent nnode
+ * @iip: index in parent
+ *
+ * This function returns %0 on success and a negative error code on failure.
+ */
+static int read_pnode(struct ubifs_info *c, struct ubifs_nnode *parent, int iip)
+{
+ struct ubifs_nbranch *branch;
+ struct ubifs_pnode *pnode = NULL;
+ void *buf = c->lpt_nod_buf;
+ int err, lnum, offs;
+
+ branch = &parent->nbranch[iip];
+ lnum = branch->lnum;
+ offs = branch->offs;
+ pnode = kzalloc(sizeof(struct ubifs_pnode), GFP_NOFS);
+ if (!pnode) {
+ err = -ENOMEM;
+ goto out;
+ }
+ if (lnum == 0) {
+ /*
+ * This pnode was not written which just means that the LEB
+ * properties in it describe empty LEBs. We make the pnode as
+ * though we had read it.
+ */
+ int i;
+
+ if (c->big_lpt)
+ pnode->num = calc_pnode_num_from_parent(c, parent, iip);
+ for (i = 0; i < UBIFS_LPT_FANOUT; i++) {
+ struct ubifs_lprops * const lprops = &pnode->lprops[i];
+
+ lprops->free = c->leb_size;
+ lprops->flags = ubifs_categorize_lprops(c, lprops);
+ }
+ } else {
+ err = ubi_read(c->ubi, lnum, buf, offs, c->pnode_sz);
+ if (err)
+ goto out;
+ err = unpack_pnode(c, buf, pnode);
+ if (err)
+ goto out;
+ }
+ err = validate_pnode(c, pnode, parent, iip);
+ if (err)
+ goto out;
+ if (!c->big_lpt)
+ pnode->num = calc_pnode_num_from_parent(c, parent, iip);
+ branch->pnode = pnode;
+ pnode->parent = parent;
+ pnode->iip = iip;
+ set_pnode_lnum(c, pnode);
+ c->pnodes_have += 1;
+ return 0;
+
+out:
+ ubifs_err("error %d reading pnode at %d:%d", err, lnum, offs);
+ dbg_dump_pnode(c, pnode, parent, iip);
+ dbg_msg("calc num: %d", calc_pnode_num_from_parent(c, parent, iip));
+ kfree(pnode);
+ return err;
+}
+
+/**
+ * read_ltab - read LPT's own lprops table.
+ * @c: UBIFS file-system description object
+ *
+ * This function returns %0 on success and a negative error code on failure.
+ */
+static int read_ltab(struct ubifs_info *c)
+{
+ int err;
+ void *buf;
+
+ buf = vmalloc(c->ltab_sz);
+ if (!buf)
+ return -ENOMEM;
+ err = ubi_read(c->ubi, c->ltab_lnum, buf, c->ltab_offs, c->ltab_sz);
+ if (err)
+ goto out;
+ err = unpack_ltab(c, buf);
+out:
+ vfree(buf);
+ return err;
+}
+
+/**
+ * read_lsave - read LPT's save table.
+ * @c: UBIFS file-system description object
+ *
+ * This function returns %0 on success and a negative error code on failure.
+ */
+static int read_lsave(struct ubifs_info *c)
+{
+ int err, i;
+ void *buf;
+
+ buf = vmalloc(c->lsave_sz);
+ if (!buf)
+ return -ENOMEM;
+ err = ubi_read(c->ubi, c->lsave_lnum, buf, c->lsave_offs, c->lsave_sz);
+ if (err)
+ goto out;
+ err = unpack_lsave(c, buf);
+ if (err)
+ goto out;
+ for (i = 0; i < c->lsave_cnt; i++) {
+ int lnum = c->lsave[i];
+
+ /*
+ * Due to automatic resizing, the values in the lsave table
+ * could be beyond the volume size - just ignore them.
+ */
+ if (lnum >= c->leb_cnt)
+ continue;
+ ubifs_lpt_lookup(c, lnum);
+ }
+out:
+ vfree(buf);
+ return err;
+}
+
+/**
+ * ubifs_get_nnode - get a nnode.
+ * @c: UBIFS file-system description object
+ * @parent: parent nnode (or NULL for the root)
+ * @iip: index in parent
+ *
+ * This function returns a pointer to the nnode on success or a negative error
+ * code on failure.
+ */
+struct ubifs_nnode *ubifs_get_nnode(struct ubifs_info *c,
+ struct ubifs_nnode *parent, int iip)
+{
+ struct ubifs_nbranch *branch;
+ struct ubifs_nnode *nnode;
+ int err;
+
+ branch = &parent->nbranch[iip];
+ nnode = branch->nnode;
+ if (nnode)
+ return nnode;
+ err = ubifs_read_nnode(c, parent, iip);
+ if (err)
+ return ERR_PTR(err);
+ return branch->nnode;
+}
+
+/**
+ * ubifs_get_pnode - get a pnode.
+ * @c: UBIFS file-system description object
+ * @parent: parent nnode
+ * @iip: index in parent
+ *
+ * This function returns a pointer to the pnode on success or a negative error
+ * code on failure.
+ */
+struct ubifs_pnode *ubifs_get_pnode(struct ubifs_info *c,
+ struct ubifs_nnode *parent, int iip)
+{
+ struct ubifs_nbranch *branch;
+ struct ubifs_pnode *pnode;
+ int err;
+
+ branch = &parent->nbranch[iip];
+ pnode = branch->pnode;
+ if (pnode)
+ return pnode;
+ err = read_pnode(c, parent, iip);
+ if (err)
+ return ERR_PTR(err);
+ update_cats(c, branch->pnode);
+ return branch->pnode;
+}
+
+/**
+ * ubifs_lpt_lookup - lookup LEB properties in the LPT.
+ * @c: UBIFS file-system description object
+ * @lnum: LEB number to lookup
+ *
+ * This function returns a pointer to the LEB properties on success or a
+ * negative error code on failure.
+ */
+struct ubifs_lprops *ubifs_lpt_lookup(struct ubifs_info *c, int lnum)
+{
+ int err, i, h, iip, shft;
+ struct ubifs_nnode *nnode;
+ struct ubifs_pnode *pnode;
+
+ if (!c->nroot) {
+ err = ubifs_read_nnode(c, NULL, 0);
+ if (err)
+ return ERR_PTR(err);
+ }
+ nnode = c->nroot;
+ i = lnum - c->main_first;
+ shft = c->lpt_hght * UBIFS_LPT_FANOUT_SHIFT;
+ for (h = 1; h < c->lpt_hght; h++) {
+ iip = ((i >> shft) & (UBIFS_LPT_FANOUT - 1));
+ shft -= UBIFS_LPT_FANOUT_SHIFT;
+ nnode = ubifs_get_nnode(c, nnode, iip);
+ if (IS_ERR(nnode))
+ return ERR_PTR(PTR_ERR(nnode));
+ }
+ iip = ((i >> shft) & (UBIFS_LPT_FANOUT - 1));
+ shft -= UBIFS_LPT_FANOUT_SHIFT;
+ pnode = ubifs_get_pnode(c, nnode, iip);
+ if (IS_ERR(pnode))
+ return ERR_PTR(PTR_ERR(pnode));
+ iip = (i & (UBIFS_LPT_FANOUT - 1));
+ dbg_lp("LEB %d, free %d, dirty %d, flags %d", lnum,
+ pnode->lprops[iip].free, pnode->lprops[iip].dirty,
+ pnode->lprops[iip].flags);
+ return &pnode->lprops[iip];
+}
+
+/**
+ * dirty_cow_nnode - ensure a nnode is not being committed.
+ * @c: UBIFS file-system description object
+ * @nnode: nnode to check
+ *
+ * Returns dirtied nnode on success or negative error code on failure.
+ */
+static struct ubifs_nnode *dirty_cow_nnode(struct ubifs_info *c,
+ struct ubifs_nnode *nnode)
+{
+ struct ubifs_nnode *n;
+ int i;
+
+ if (!test_bit(COW_CNODE, &nnode->flags)) {
+ /* nnode is not being committed */
+ if (!test_and_set_bit(DIRTY_CNODE, &nnode->flags)) {
+ c->dirty_nn_cnt += 1;
+ ubifs_add_nnode_dirt(c, nnode);
+ }
+ return nnode;
+ }
+
+ /* nnode is being committed, so copy it */
+ n = kzalloc(sizeof(struct ubifs_nnode), GFP_NOFS);
+ if (!n)
+ return ERR_PTR(-ENOMEM);
+
+ memcpy(n, nnode, sizeof(struct ubifs_nnode));
+
+ /* The children now have new parent */
+ for (i = 0; i < UBIFS_LPT_FANOUT; i++) {
+ struct ubifs_nbranch *branch = &n->nbranch[i];
+
+ if (branch->cnode)
+ branch->cnode->parent = n;
+ }
+
+ ubifs_assert(!test_bit(OBSOLETE_CNODE, &nnode->flags));
+ set_bit(OBSOLETE_CNODE, &nnode->flags);
+
+ n->cnext = NULL;
+ set_bit(DIRTY_CNODE, &n->flags);
+ clear_bit(COW_CNODE, &n->flags);
+ c->dirty_nn_cnt += 1;
+ ubifs_add_nnode_dirt(c, nnode);
+ if (nnode->parent)
+ nnode->parent->nbranch[n->iip].nnode = n;
+ else
+ c->nroot = n;
+
+ return n;
+}
+
+/**
+ * dirty_cow_pnode - ensure a pnode is not being committed.
+ * @c: UBIFS file-system description object
+ * @pnode: pnode to check
+ *
+ * Returns dirtied pnode on success or negative error code on failure.
+ */
+static struct ubifs_pnode *dirty_cow_pnode(struct ubifs_info *c,
+ struct ubifs_pnode *pnode)
+{
+ struct ubifs_pnode *p;
+
+ if (!test_bit(COW_CNODE, &pnode->flags)) {
+ /* pnode is not being committed */
+ if (!test_and_set_bit(DIRTY_CNODE, &pnode->flags)) {
+ c->dirty_pn_cnt += 1;
+ add_pnode_dirt(c, pnode);
+ }
+ return pnode;
+ }
+
+ /* pnode is being committed, so copy it */
+ p = kzalloc(sizeof(struct ubifs_pnode), GFP_NOFS);
+ if (!p)
+ return ERR_PTR(-ENOMEM);
+
+ memcpy(p, pnode, sizeof(struct ubifs_pnode));
+ replace_cats(c, pnode, p);
+
+ ubifs_assert(!test_bit(OBSOLETE_CNODE, &pnode->flags));
+ set_bit(OBSOLETE_CNODE, &pnode->flags);
+
+ p->cnext = NULL;
+ set_bit(DIRTY_CNODE, &p->flags);
+ clear_bit(COW_CNODE, &p->flags);
+ c->dirty_pn_cnt += 1;
+ add_pnode_dirt(c, pnode);
+ pnode->parent->nbranch[p->iip].pnode = p;
+
+ return p;
+}
+
+/**
+ * ubifs_lpt_lookup_dirty - lookup LEB properties in the LPT.
+ * @c: UBIFS file-system description object
+ * @lnum: LEB number to lookup
+ *
+ * This function returns a pointer to the LEB properties on success or a
+ * negative error code on failure.
+ */
+struct ubifs_lprops *ubifs_lpt_lookup_dirty(struct ubifs_info *c, int lnum)
+{
+ int err, i, h, iip, shft;
+ struct ubifs_nnode *nnode;
+ struct ubifs_pnode *pnode;
+
+ if (!c->nroot) {
+ err = ubifs_read_nnode(c, NULL, 0);
+ if (err)
+ return ERR_PTR(err);
+ }
+ nnode = c->nroot;
+ nnode = dirty_cow_nnode(c, nnode);
+ if (IS_ERR(nnode))
+ return ERR_PTR(PTR_ERR(nnode));
+ i = lnum - c->main_first;
+ shft = c->lpt_hght * UBIFS_LPT_FANOUT_SHIFT;
+ for (h = 1; h < c->lpt_hght; h++) {
+ iip = ((i >> shft) & (UBIFS_LPT_FANOUT - 1));
+ shft -= UBIFS_LPT_FANOUT_SHIFT;
+ nnode = ubifs_get_nnode(c, nnode, iip);
+ if (IS_ERR(nnode))
+ return ERR_PTR(PTR_ERR(nnode));
+ nnode = dirty_cow_nnode(c, nnode);
+ if (IS_ERR(nnode))
+ return ERR_PTR(PTR_ERR(nnode));
+ }
+ iip = ((i >> shft) & (UBIFS_LPT_FANOUT - 1));
+ shft -= UBIFS_LPT_FANOUT_SHIFT;
+ pnode = ubifs_get_pnode(c, nnode, iip);
+ if (IS_ERR(pnode))
+ return ERR_PTR(PTR_ERR(pnode));
+ pnode = dirty_cow_pnode(c, pnode);
+ if (IS_ERR(pnode))
+ return ERR_PTR(PTR_ERR(pnode));
+ iip = (i & (UBIFS_LPT_FANOUT - 1));
+ dbg_lp("LEB %d, free %d, dirty %d, flags %d", lnum,
+ pnode->lprops[iip].free, pnode->lprops[iip].dirty,
+ pnode->lprops[iip].flags);
+ ubifs_assert(test_bit(DIRTY_CNODE, &pnode->flags));
+ return &pnode->lprops[iip];
+}
+
+/**
+ * lpt_init_rd - initialize the LPT for reading.
+ * @c: UBIFS file-system description object
+ *
+ * This function returns %0 on success and a negative error code on failure.
+ */
+static int lpt_init_rd(struct ubifs_info *c)
+{
+ int err, i;
+
+ c->ltab = vmalloc(sizeof(struct ubifs_lpt_lprops) * c->lpt_lebs);
+ if (!c->ltab)
+ return -ENOMEM;
+
+ i = max_t(int, c->nnode_sz, c->pnode_sz);
+ c->lpt_nod_buf = kmalloc(i, GFP_KERNEL);
+ if (!c->lpt_nod_buf)
+ return -ENOMEM;
+
+ for (i = 0; i < LPROPS_HEAP_CNT; i++) {
+ c->lpt_heap[i].arr = kmalloc(sizeof(void *) * LPT_HEAP_SZ,
+ GFP_KERNEL);
+ if (!c->lpt_heap[i].arr)
+ return -ENOMEM;
+ c->lpt_heap[i].cnt = 0;
+ c->lpt_heap[i].max_cnt = LPT_HEAP_SZ;
+ }
+
+ c->dirty_idx.arr = kmalloc(sizeof(void *) * LPT_HEAP_SZ, GFP_KERNEL);
+ if (!c->dirty_idx.arr)
+ return -ENOMEM;
+ c->dirty_idx.cnt = 0;
+ c->dirty_idx.max_cnt = LPT_HEAP_SZ;
+
+ err = read_ltab(c);
+ if (err)
+ return err;
+
+ dbg_lp("space_bits %d", c->space_bits);
+ dbg_lp("lpt_lnum_bits %d", c->lpt_lnum_bits);
+ dbg_lp("lpt_offs_bits %d", c->lpt_offs_bits);
+ dbg_lp("lpt_spc_bits %d", c->lpt_spc_bits);
+ dbg_lp("pcnt_bits %d", c->pcnt_bits);
+ dbg_lp("lnum_bits %d", c->lnum_bits);
+ dbg_lp("pnode_sz %d", c->pnode_sz);
+ dbg_lp("nnode_sz %d", c->nnode_sz);
+ dbg_lp("ltab_sz %d", c->ltab_sz);
+ dbg_lp("lsave_sz %d", c->lsave_sz);
+ dbg_lp("lsave_cnt %d", c->lsave_cnt);
+ dbg_lp("lpt_hght %d", c->lpt_hght);
+ dbg_lp("big_lpt %d", c->big_lpt);
+ dbg_lp("LPT root is at %d:%d", c->lpt_lnum, c->lpt_offs);
+ dbg_lp("LPT head is at %d:%d", c->nhead_lnum, c->nhead_offs);
+ dbg_lp("LPT ltab is at %d:%d", c->ltab_lnum, c->ltab_offs);
+ if (c->big_lpt)
+ dbg_lp("LPT lsave is at %d:%d", c->lsave_lnum, c->lsave_offs);
+
+ return 0;
+}
+
+/**
+ * lpt_init_wr - initialize the LPT for writing.
+ * @c: UBIFS file-system description object
+ *
+ * 'lpt_init_rd()' must have been called already.
+ *
+ * This function returns %0 on success and a negative error code on failure.
+ */
+static int lpt_init_wr(struct ubifs_info *c)
+{
+ int err, i;
+
+ c->ltab_cmt = vmalloc(sizeof(struct ubifs_lpt_lprops) * c->lpt_lebs);
+ if (!c->ltab_cmt)
+ return -ENOMEM;
+
+ c->lpt_buf = vmalloc(c->leb_size);
+ if (!c->lpt_buf)
+ return -ENOMEM;
+
+ if (c->big_lpt) {
+ c->lsave = kmalloc(sizeof(int) * c->lsave_cnt, GFP_NOFS);
+ if (!c->lsave)
+ return -ENOMEM;
+ err = read_lsave(c);
+ if (err)
+ return err;
+ }
+
+ for (i = 0; i < c->lpt_lebs; i++)
+ if (c->ltab[i].free == c->leb_size) {
+ err = ubifs_leb_unmap(c, i + c->lpt_first);
+ if (err)
+ return err;
+ }
+
+ return 0;
+}
+
+/**
+ * ubifs_lpt_init - initialize the LPT.
+ * @c: UBIFS file-system description object
+ * @rd: whether to initialize lpt for reading
+ * @wr: whether to initialize lpt for writing
+ *
+ * For mounting 'rw', @rd and @wr are both true. For mounting 'ro', @rd is true
+ * and @wr is false. For remounting from 'ro' to 'rw', @rd is false and @wr is
+ * true.
+ *
+ * This function returns %0 on success and a negative error code on failure.
+ */
+int ubifs_lpt_init(struct ubifs_info *c, int rd, int wr)
+{
+ int err;
+
+ if (rd) {
+ err = lpt_init_rd(c);
+ if (err)
+ return err;
+ }
+
+ if (wr) {
+ err = lpt_init_wr(c);
+ if (err)
+ return err;
+ }
+
+ return 0;
+}
+
+/**
+ * struct lpt_scan_node - somewhere to put nodes while we scan LPT.
+ * @nnode: where to keep a nnode
+ * @pnode: where to keep a pnode
+ * @cnode: where to keep a cnode
+ * @in_tree: is the node in the tree in memory
+ * @ptr.nnode: pointer to the nnode (if it is an nnode) which may be here or in
+ * the tree
+ * @ptr.pnode: ditto for pnode
+ * @ptr.cnode: ditto for cnode
+ */
+struct lpt_scan_node {
+ union {
+ struct ubifs_nnode nnode;
+ struct ubifs_pnode pnode;
+ struct ubifs_cnode cnode;
+ };
+ int in_tree;
+ union {
+ struct ubifs_nnode *nnode;
+ struct ubifs_pnode *pnode;
+ struct ubifs_cnode *cnode;
+ } ptr;
+};
+
+/**
+ * scan_get_nnode - for the scan, get a nnode from either the tree or flash.
+ * @c: the UBIFS file-system description object
+ * @path: where to put the nnode
+ * @parent: parent of the nnode
+ * @iip: index in parent of the nnode
+ *
+ * This function returns a pointer to the nnode on success or a negative error
+ * code on failure.
+ */
+static struct ubifs_nnode *scan_get_nnode(struct ubifs_info *c,
+ struct lpt_scan_node *path,
+ struct ubifs_nnode *parent, int iip)
+{
+ struct ubifs_nbranch *branch;
+ struct ubifs_nnode *nnode;
+ void *buf = c->lpt_nod_buf;
+ int err;
+
+ branch = &parent->nbranch[iip];
+ nnode = branch->nnode;
+ if (nnode) {
+ path->in_tree = 1;
+ path->ptr.nnode = nnode;
+ return nnode;
+ }
+ nnode = &path->nnode;
+ path->in_tree = 0;
+ path->ptr.nnode = nnode;
+ memset(nnode, 0, sizeof(struct ubifs_nnode));
+ if (branch->lnum == 0) {
+ /*
+ * This nnode was not written which just means that the LEB
+ * properties in the subtree below it describe empty LEBs. We
+ * make the nnode as though we had read it, which in fact means
+ * doing almost nothing.
+ */
+ if (c->big_lpt)
+ nnode->num = calc_nnode_num_from_parent(c, parent, iip);
+ } else {
+ err = ubi_read(c->ubi, branch->lnum, buf, branch->offs,
+ c->nnode_sz);
+ if (err)
+ return ERR_PTR(err);
+ err = unpack_nnode(c, buf, nnode);
+ if (err)
+ return ERR_PTR(err);
+ }
+ err = validate_nnode(c, nnode, parent, iip);
+ if (err)
+ return ERR_PTR(err);
+ if (!c->big_lpt)
+ nnode->num = calc_nnode_num_from_parent(c, parent, iip);
+ nnode->level = parent->level - 1;
+ nnode->parent = parent;
+ nnode->iip = iip;
+ return nnode;
+}
+
+/**
+ * scan_get_pnode - for the scan, get a pnode from either the tree or flash.
+ * @c: the UBIFS file-system description object
+ * @path: where to put the pnode
+ * @parent: parent of the pnode
+ * @iip: index in parent of the pnode
+ *
+ * This function returns a pointer to the pnode on success or a negative error
+ * code on failure.
+ */
+static struct ubifs_pnode *scan_get_pnode(struct ubifs_info *c,
+ struct lpt_scan_node *path,
+ struct ubifs_nnode *parent, int iip)
+{
+ struct ubifs_nbranch *branch;
+ struct ubifs_pnode *pnode;
+ void *buf = c->lpt_nod_buf;
+ int err;
+
+ branch = &parent->nbranch[iip];
+ pnode = branch->pnode;
+ if (pnode) {
+ path->in_tree = 1;
+ path->ptr.pnode = pnode;
+ return pnode;
+ }
+ pnode = &path->pnode;
+ path->in_tree = 0;
+ path->ptr.pnode = pnode;
+ memset(pnode, 0, sizeof(struct ubifs_pnode));
+ if (branch->lnum == 0) {
+ /*
+ * This pnode was not written which just means that the LEB
+ * properties in it describe empty LEBs. We make the pnode as
+ * though we had read it.
+ */
+ int i;
+
+ if (c->big_lpt)
+ pnode->num = calc_pnode_num_from_parent(c, parent, iip);
+ for (i = 0; i < UBIFS_LPT_FANOUT; i++) {
+ struct ubifs_lprops * const lprops = &pnode->lprops[i];
+
+ lprops->free = c->leb_size;
+ lprops->flags = ubifs_categorize_lprops(c, lprops);
+ }
+ } else {
+ ubifs_assert(branch->lnum >= c->lpt_first &&
+ branch->lnum <= c->lpt_last);
+ ubifs_assert(branch->offs >= 0 && branch->offs < c->leb_size);
+ err = ubi_read(c->ubi, branch->lnum, buf, branch->offs,
+ c->pnode_sz);
+ if (err)
+ return ERR_PTR(err);
+ err = unpack_pnode(c, buf, pnode);
+ if (err)
+ return ERR_PTR(err);
+ }
+ err = validate_pnode(c, pnode, parent, iip);
+ if (err)
+ return ERR_PTR(err);
+ if (!c->big_lpt)
+ pnode->num = calc_pnode_num_from_parent(c, parent, iip);
+ pnode->parent = parent;
+ pnode->iip = iip;
+ set_pnode_lnum(c, pnode);
+ return pnode;
+}
+
+/**
+ * ubifs_lpt_scan_nolock - scan the LPT.
+ * @c: the UBIFS file-system description object
+ * @start_lnum: LEB number from which to start scanning
+ * @end_lnum: LEB number at which to stop scanning
+ * @scan_cb: callback function called for each lprops
+ * @data: data to be passed to the callback function
+ *
+ * This function returns %0 on success and a negative error code on failure.
+ */
+int ubifs_lpt_scan_nolock(struct ubifs_info *c, int start_lnum, int end_lnum,
+ ubifs_lpt_scan_callback scan_cb, void *data)
+{
+ int err = 0, i, h, iip, shft;
+ struct ubifs_nnode *nnode;
+ struct ubifs_pnode *pnode;
+ struct lpt_scan_node *path;
+
+ if (start_lnum == -1) {
+ start_lnum = end_lnum + 1;
+ if (start_lnum >= c->leb_cnt)
+ start_lnum = c->main_first;
+ }
+
+ ubifs_assert(start_lnum >= c->main_first && start_lnum < c->leb_cnt);
+ ubifs_assert(end_lnum >= c->main_first && end_lnum < c->leb_cnt);
+
+ if (!c->nroot) {
+ err = ubifs_read_nnode(c, NULL, 0);
+ if (err)
+ return err;
+ }
+
+ path = kmalloc(sizeof(struct lpt_scan_node) * (c->lpt_hght + 1),
+ GFP_NOFS);
+ if (!path)
+ return -ENOMEM;
+
+ path[0].ptr.nnode = c->nroot;
+ path[0].in_tree = 1;
+again:
+ /* Descend to the pnode containing start_lnum */
+ nnode = c->nroot;
+ i = start_lnum - c->main_first;
+ shft = c->lpt_hght * UBIFS_LPT_FANOUT_SHIFT;
+ for (h = 1; h < c->lpt_hght; h++) {
+ iip = ((i >> shft) & (UBIFS_LPT_FANOUT - 1));
+ shft -= UBIFS_LPT_FANOUT_SHIFT;
+ nnode = scan_get_nnode(c, path + h, nnode, iip);
+ if (IS_ERR(nnode)) {
+ err = PTR_ERR(nnode);
+ goto out;
+ }
+ }
+ iip = ((i >> shft) & (UBIFS_LPT_FANOUT - 1));
+ shft -= UBIFS_LPT_FANOUT_SHIFT;
+ pnode = scan_get_pnode(c, path + h, nnode, iip);
+ if (IS_ERR(pnode)) {
+ err = PTR_ERR(pnode);
+ goto out;
+ }
+ iip = (i & (UBIFS_LPT_FANOUT - 1));
+
+ /* Loop for each lprops */
+ while (1) {
+ struct ubifs_lprops *lprops = &pnode->lprops[iip];
+ int ret, lnum = lprops->lnum;
+
+ ret = scan_cb(c, lprops, path[h].in_tree, data);
+ if (ret < 0) {
+ err = ret;
+ goto out;
+ }
+ if (ret & LPT_SCAN_ADD) {
+ /* Add all the nodes in path to the tree in memory */
+ for (h = 1; h < c->lpt_hght; h++) {
+ const size_t sz = sizeof(struct ubifs_nnode);
+ struct ubifs_nnode *parent;
+
+ if (path[h].in_tree)
+ continue;
+ nnode = kmalloc(sz, GFP_NOFS);
+ if (!nnode) {
+ err = -ENOMEM;
+ goto out;
+ }
+ memcpy(nnode, &path[h].nnode, sz);
+ parent = nnode->parent;
+ parent->nbranch[nnode->iip].nnode = nnode;
+ path[h].ptr.nnode = nnode;
+ path[h].in_tree = 1;
+ path[h + 1].cnode.parent = nnode;
+ }
+ if (path[h].in_tree)
+ ubifs_ensure_cat(c, lprops);
+ else {
+ const size_t sz = sizeof(struct ubifs_pnode);
+ struct ubifs_nnode *parent;
+
+ pnode = kmalloc(sz, GFP_NOFS);
+ if (!pnode) {
+ err = -ENOMEM;
+ goto out;
+ }
+ memcpy(pnode, &path[h].pnode, sz);
+ parent = pnode->parent;
+ parent->nbranch[pnode->iip].pnode = pnode;
+ path[h].ptr.pnode = pnode;
+ path[h].in_tree = 1;
+ update_cats(c, pnode);
+ c->pnodes_have += 1;
+ }
+ err = dbg_check_lpt_nodes(c, (struct ubifs_cnode *)
+ c->nroot, 0, 0);
+ if (err)
+ goto out;
+ err = dbg_check_cats(c);
+ if (err)
+ goto out;
+ }
+ if (ret & LPT_SCAN_STOP) {
+ err = 0;
+ break;
+ }
+ /* Get the next lprops */
+ if (lnum == end_lnum) {
+ /*
+ * We got to the end without finding what we were
+ * looking for
+ */
+ err = -ENOSPC;
+ goto out;
+ }
+ if (lnum + 1 >= c->leb_cnt) {
+ /* Wrap-around to the beginning */
+ start_lnum = c->main_first;
+ goto again;
+ }
+ if (iip + 1 < UBIFS_LPT_FANOUT) {
+ /* Next lprops is in the same pnode */
+ iip += 1;
+ continue;
+ }
+ /* We need to get the next pnode. Go up until we can go right */
+ iip = pnode->iip;
+ while (1) {
+ h -= 1;
+ ubifs_assert(h >= 0);
+ nnode = path[h].ptr.nnode;
+ if (iip + 1 < UBIFS_LPT_FANOUT)
+ break;
+ iip = nnode->iip;
+ }
+ /* Go right */
+ iip += 1;
+ /* Descend to the pnode */
+ h += 1;
+ for (; h < c->lpt_hght; h++) {
+ nnode = scan_get_nnode(c, path + h, nnode, iip);
+ if (IS_ERR(nnode)) {
+ err = PTR_ERR(nnode);
+ goto out;
+ }
+ iip = 0;
+ }
+ pnode = scan_get_pnode(c, path + h, nnode, iip);
+ if (IS_ERR(pnode)) {
+ err = PTR_ERR(pnode);
+ goto out;
+ }
+ iip = 0;
+ }
+out:
+ kfree(path);
+ return err;
+}
+
+#if defined(CONFIG_UBIFS_FS_DEBUG_CHK_LPROPS)
+
+/**
+ * dbg_chk_pnode - check a pnode.
+ * @c: the UBIFS file-system description object
+ * @pnode: pnode to check
+ * @col: pnode column
+ *
+ * This function returns %0 on success and a negative error code on failure.
+ */
+static int dbg_chk_pnode(struct ubifs_info *c, struct ubifs_pnode *pnode,
+ int col)
+{
+ int i;
+
+ if (pnode->num != col) {
+ dbg_err("pnode num %d expected %d parent num %d iip %d",
+ pnode->num, col, pnode->parent->num, pnode->iip);
+ return -EINVAL;
+ }
+ for (i = 0; i < UBIFS_LPT_FANOUT; i++) {
+ struct ubifs_lprops *lp, *lprops = &pnode->lprops[i];
+ int lnum = (pnode->num << UBIFS_LPT_FANOUT_SHIFT) + i +
+ c->main_first;
+ int found, cat = lprops->flags & LPROPS_CAT_MASK;
+ struct ubifs_lpt_heap *heap;
+ struct list_head *list = NULL;
+
+ if (lnum >= c->leb_cnt)
+ continue;
+ if (lprops->lnum != lnum) {
+ dbg_err("bad LEB number %d expected %d",
+ lprops->lnum, lnum);
+ return -EINVAL;
+ }
+ if (lprops->flags & LPROPS_TAKEN) {
+ if (cat != LPROPS_UNCAT) {
+ dbg_err("LEB %d taken but not uncat %d",
+ lprops->lnum, cat);
+ return -EINVAL;
+ }
+ continue;
+ }
+ if (lprops->flags & LPROPS_INDEX) {
+ switch (cat) {
+ case LPROPS_UNCAT:
+ case LPROPS_DIRTY_IDX:
+ case LPROPS_FRDI_IDX:
+ break;
+ default:
+ dbg_err("LEB %d index but cat %d",
+ lprops->lnum, cat);
+ return -EINVAL;
+ }
+ } else {
+ switch (cat) {
+ case LPROPS_UNCAT:
+ case LPROPS_DIRTY:
+ case LPROPS_FREE:
+ case LPROPS_EMPTY:
+ case LPROPS_FREEABLE:
+ break;
+ default:
+ dbg_err("LEB %d not index but cat %d",
+ lprops->lnum, cat);
+ return -EINVAL;
+ }
+ }
+ switch (cat) {
+ case LPROPS_UNCAT:
+ list = &c->uncat_list;
+ break;
+ case LPROPS_EMPTY:
+ list = &c->empty_list;
+ break;
+ case LPROPS_FREEABLE:
+ list = &c->freeable_list;
+ break;
+ case LPROPS_FRDI_IDX:
+ list = &c->frdi_idx_list;
+ break;
+ }
+ found = 0;
+ switch (cat) {
+ case LPROPS_DIRTY:
+ case LPROPS_DIRTY_IDX:
+ case LPROPS_FREE:
+ heap = &c->lpt_heap[cat - 1];
+ if (lprops->hpos < heap->cnt &&
+ heap->arr[lprops->hpos] == lprops)
+ found = 1;
+ break;
+ case LPROPS_UNCAT:
+ case LPROPS_EMPTY:
+ case LPROPS_FREEABLE:
+ case LPROPS_FRDI_IDX:
+ list_for_each_entry(lp, list, list)
+ if (lprops == lp) {
+ found = 1;
+ break;
+ }
+ break;
+ }
+ if (!found) {
+ dbg_err("LEB %d cat %d not found in cat heap/list",
+ lprops->lnum, cat);
+ return -EINVAL;
+ }
+ switch (cat) {
+ case LPROPS_EMPTY:
+ if (lprops->free != c->leb_size) {
+ dbg_err("LEB %d cat %d free %d dirty %d",
+ lprops->lnum, cat, lprops->free,
+ lprops->dirty);
+ return -EINVAL;
+ }
+ case LPROPS_FREEABLE:
+ case LPROPS_FRDI_IDX:
+ if (lprops->free + lprops->dirty != c->leb_size) {
+ dbg_err("LEB %d cat %d free %d dirty %d",
+ lprops->lnum, cat, lprops->free,
+ lprops->dirty);
+ return -EINVAL;
+ }
+ }
+ }
+ return 0;
+}
+
+/**
+ * dbg_check_lpt_nodes - check nnodes and pnodes.
+ * @c: the UBIFS file-system description object
+ * @cnode: next cnode (nnode or pnode) to check
+ * @row: row of cnode (root is zero)
+ * @col: column of cnode (leftmost is zero)
+ *
+ * This function returns %0 on success and a negative error code on failure.
+ */
+int dbg_check_lpt_nodes(struct ubifs_info *c, struct ubifs_cnode *cnode,
+ int row, int col)
+{
+ struct ubifs_nnode *nnode, *nn;
+ struct ubifs_cnode *cn;
+ int num, iip = 0, err;
+
+ while (cnode) {
+ ubifs_assert(row >= 0);
+ nnode = cnode->parent;
+ if (cnode->level) {
+ /* cnode is a nnode */
+ num = calc_nnode_num(row, col);
+ if (cnode->num != num) {
+ dbg_err("nnode num %d expected %d "
+ "parent num %d iip %d", cnode->num, num,
+ (nnode ? nnode->num : 0), cnode->iip);
+ return -EINVAL;
+ }
+ nn = (struct ubifs_nnode *)cnode;
+ while (iip < UBIFS_LPT_FANOUT) {
+ cn = nn->nbranch[iip].cnode;
+ if (cn) {
+ /* Go down */
+ row += 1;
+ col <<= UBIFS_LPT_FANOUT_SHIFT;
+ col += iip;
+ iip = 0;
+ cnode = cn;
+ break;
+ }
+ /* Go right */
+ iip += 1;
+ }
+ if (iip < UBIFS_LPT_FANOUT)
+ continue;
+ } else {
+ struct ubifs_pnode *pnode;
+
+ /* cnode is a pnode */
+ pnode = (struct ubifs_pnode *)cnode;
+ err = dbg_chk_pnode(c, pnode, col);
+ if (err)
+ return err;
+ }
+ /* Go up and to the right */
+ row -= 1;
+ col >>= UBIFS_LPT_FANOUT_SHIFT;
+ iip = cnode->iip + 1;
+ cnode = (struct ubifs_cnode *)nnode;
+ }
+ return 0;
+}
+
+#endif /* CONFIG_UBIFS_FS_DEBUG_CHK_LPROPS */
--
1.5.4.1
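
The descent loops in ubifs_lpt_lookup() and ubifs_lpt_scan_nolock() above
derive the branch index at each tree level from the LEB number by shifting
and masking. A minimal standalone sketch of that arithmetic follows. The
helper name lpt_path() is hypothetical, and the fanout of 4 (shift of 2) is
an assumption, since this patch only uses the symbolic names
UBIFS_LPT_FANOUT and UBIFS_LPT_FANOUT_SHIFT.

```c
#include <assert.h>

/* Assumed values; the patch itself only uses the symbolic names. */
#define LPT_FANOUT_SHIFT 2
#define LPT_FANOUT (1 << LPT_FANOUT_SHIFT)

/*
 * Hypothetical helper: fill path[] with the branch index taken at each
 * step of the descent. path[0..hght-2] select child nnodes, path[hght-1]
 * selects the pnode, and path[hght] selects the lprops slot inside that
 * pnode. Returns the number of entries written (hght + 1).
 */
static int lpt_path(int lnum, int main_first, int hght, int *path)
{
	int i = lnum - main_first;	/* offset into the main area */
	int shft = hght * LPT_FANOUT_SHIFT;
	int h, n = 0;

	assert(hght >= 1 && i >= 0);
	for (h = 1; h < hght; h++) {
		path[n++] = (i >> shft) & (LPT_FANOUT - 1);
		shft -= LPT_FANOUT_SHIFT;
	}
	path[n++] = (i >> shft) & (LPT_FANOUT - 1);	/* pnode index */
	path[n++] = i & (LPT_FANOUT - 1);		/* lprops index */
	return n;
}
```

For instance, with main_first = 10 and a tree of height 2, LEB 37 has
offset 27 = 1*16 + 2*4 + 3, so the indices are 1, 2 and 3.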

2008-03-27 13:15:28

by Artem Bityutskiy

Subject: [RFC PATCH 18/26] UBIFS: add LEB find subsystem

The LEB find sub-system is responsible for maintaining lists of
eraseblocks with free and dirty space. For example, when UBIFS has
to do garbage collection, it needs to find the dirtiest eraseblock,
because the dirtiest eraseblock is the fastest to garbage-collect.
It asks the LEB find sub-system to do this, which usually has an
immediate answer.
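
As a rough illustration of why the answer is usually immediate, the idea
can be sketched as a bounded max-heap keyed on dirty space, with a linear
scan as the fallback. This is a simplified sketch, not the code in this
patch: struct leb_props, struct dirty_heap, heap_add(), find_dirtiest()
and HEAP_SZ are all hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct ubifs_lprops. */
struct leb_props {
	int lnum;	/* LEB number */
	int free;	/* free bytes */
	int dirty;	/* dirty (reclaimable) bytes */
};

#define HEAP_SZ 8	/* illustrative capacity only */

struct dirty_heap {
	const struct leb_props *arr[HEAP_SZ];
	int cnt;
};

/* Add to a max-heap keyed on dirty space; returns 0 if the heap is full. */
static int heap_add(struct dirty_heap *h, const struct leb_props *lp)
{
	int pos;

	assert(h != NULL && lp != NULL);
	if (h->cnt >= HEAP_SZ)
		return 0;
	pos = h->cnt++;
	/* Sift up: pull down every parent that has less dirty space */
	while (pos > 0) {
		int parent = (pos - 1) / 2;

		if (h->arr[parent]->dirty >= lp->dirty)
			break;
		h->arr[pos] = h->arr[parent];
		pos = parent;
	}
	h->arr[pos] = lp;
	return 1;
}

/*
 * Fast path: the heap root is the dirtiest known LEB. Slow path: scan
 * every LEB, standing in for the scan of the LPT on flash.
 */
static const struct leb_props *find_dirtiest(const struct dirty_heap *h,
					     const struct leb_props *all,
					     int n, int min_dirty)
{
	const struct leb_props *best = NULL;
	int i;

	if (h->cnt > 0 && h->arr[0]->dirty >= min_dirty)
		return h->arr[0];
	for (i = 0; i < n; i++) {
		if (all[i].dirty < min_dirty)
			continue;
		if (!best || all[i].dirty > best->dirty)
			best = &all[i];
	}
	return best;
}
```

The heap is deliberately bounded: most requests are answered from the
root in constant time, and the expensive scan is only needed when nothing
suitable is cached in memory.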

Signed-off-by: Artem Bityutskiy <[email protected]>
Signed-off-by: Adrian Hunter <[email protected]>
---
fs/ubifs/find.c | 951 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 951 insertions(+), 0 deletions(-)

diff --git a/fs/ubifs/find.c b/fs/ubifs/find.c
new file mode 100644
index 0000000..fc601e5
--- /dev/null
+++ b/fs/ubifs/find.c
@@ -0,0 +1,951 @@
+/*
+ * This file is part of UBIFS.
+ *
+ * Copyright (C) 2006-2008 Nokia Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 51
+ * Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Authors: Artem Bityutskiy (Битюцкий Артём)
+ * Adrian Hunter
+ */
+
+/*
+ * This file contains functions for finding LEBs for various purposes e.g.
+ * garbage collection. In general, lprops category heaps and lists are used
+ * for fast access, falling back on scanning the LPT as a last resort.
+ */
+
+#include <linux/sort.h>
+#include "ubifs.h"
+
+/**
+ * struct scan_data - data provided to scan callback functions
+ * @min_space: minimum number of bytes for which to scan
+ * @pick_free: whether it is OK to scan for empty LEBs
+ * @lnum: LEB number found is returned here
+ * @exclude_index: whether to exclude index LEBs
+ */
+struct scan_data {
+ int min_space;
+ int pick_free;
+ int lnum;
+ int exclude_index;
+};
+
+/**
+ * valuable - determine whether LEB properties are valuable.
+ * @c: the UBIFS file-system description object
+ * @lprops: LEB properties
+ *
+ * This function returns %1 if the LEB properties should be added to the LEB
+ * properties tree in memory. Otherwise %0 is returned.
+ */
+static int valuable(struct ubifs_info *c, const struct ubifs_lprops *lprops)
+{
+ int n, cat = lprops->flags & LPROPS_CAT_MASK;
+ struct ubifs_lpt_heap *heap;
+
+ switch (cat) {
+ case LPROPS_DIRTY:
+ case LPROPS_DIRTY_IDX:
+ case LPROPS_FREE:
+ heap = &c->lpt_heap[cat - 1];
+ if (heap->cnt < heap->max_cnt)
+ return 1;
+ if (lprops->free + lprops->dirty >= c->dark_wm)
+ return 1;
+ return 0;
+ case LPROPS_EMPTY:
+ n = c->lst.empty_lebs + c->freeable_cnt -
+ c->lst.taken_empty_lebs;
+ if (n < c->lsave_cnt)
+ return 1;
+ return 0;
+ case LPROPS_FREEABLE:
+ return 1;
+ case LPROPS_FRDI_IDX:
+ return 1;
+ }
+ return 0;
+}
+
+/**
+ * scan_for_dirty_cb - dirty space scan callback.
+ * @c: the UBIFS file-system description object
+ * @lprops: LEB properties to scan
+ * @in_tree: whether the LEB properties are in main memory
+ * @data: information passed to and from the caller of the scan
+ *
+ * This function returns a code that indicates whether the scan should continue
+ * (%LPT_SCAN_CONTINUE), whether the LEB properties should be added to the tree
+ * in main memory (%LPT_SCAN_ADD), or whether the scan should stop
+ * (%LPT_SCAN_STOP).
+ */
+static int scan_for_dirty_cb(struct ubifs_info *c,
+ const struct ubifs_lprops *lprops, int in_tree,
+ struct scan_data *data)
+{
+ int ret = LPT_SCAN_CONTINUE;
+
+ /* Exclude LEBs that are currently in use */
+ if (lprops->flags & LPROPS_TAKEN)
+ return LPT_SCAN_CONTINUE;
+ /* Determine whether to add these LEB properties to the tree */
+ if (!in_tree && valuable(c, lprops))
+ ret |= LPT_SCAN_ADD;
+ /* Exclude LEBs with too little space */
+ if (lprops->free + lprops->dirty < data->min_space)
+ return ret;
+ /* If specified, exclude index LEBs */
+ if (data->exclude_index && lprops->flags & LPROPS_INDEX)
+ return ret;
+ /* If specified, exclude empty or freeable LEBs */
+ if (!data->pick_free && lprops->free + lprops->dirty == c->leb_size)
+ return ret;
+ /* Exclude LEBs with too little dirty space (unless it is empty) */
+ if (lprops->dirty < c->dead_wm && lprops->free != c->leb_size)
+ return ret;
+ /* Finally we found space */
+ data->lnum = lprops->lnum;
+ return LPT_SCAN_ADD | LPT_SCAN_STOP;
+}
+
+/**
+ * scan_for_dirty - find a data LEB with dirty space.
+ * @c: the UBIFS file-system description object
+ * @min_space: minimum amount free plus dirty space the returned LEB has to
+ * have
+ * @pick_free: if it is ok to return a free or freeable LEB
+ * @exclude_index: whether to exclude index LEBs
+ *
+ * This function returns a pointer to the LEB properties found or a negative
+ * error code.
+ */
+static const struct ubifs_lprops *scan_for_dirty(struct ubifs_info *c,
+ int min_space, int pick_free,
+ int exclude_index)
+{
+ const struct ubifs_lprops *lprops;
+ struct ubifs_lpt_heap *heap;
+ struct scan_data data;
+ int err, i;
+
+ /* There may be an LEB with enough dirty space on the free heap */
+ heap = &c->lpt_heap[LPROPS_FREE - 1];
+ for (i = 0; i < heap->cnt; i++) {
+ lprops = heap->arr[i];
+ if (lprops->free + lprops->dirty < min_space)
+ continue;
+ if (lprops->dirty < c->dead_wm)
+ continue;
+ return lprops;
+ }
+ /*
+ * A LEB may have fallen off of the bottom of the dirty heap, and ended
+ * up as uncategorized even though it has enough dirty space for us now,
+ * so check the uncategorized list. N.B. neither empty nor freeable LEBs
+ * can end up as uncategorized because they are kept on lists not
+ * finite-sized heaps.
+ */
+ list_for_each_entry(lprops, &c->uncat_list, list) {
+ if (lprops->flags & LPROPS_TAKEN)
+ continue;
+ if (lprops->free + lprops->dirty < min_space)
+ continue;
+ if (exclude_index && (lprops->flags & LPROPS_INDEX))
+ continue;
+ if (lprops->dirty < c->dead_wm)
+ continue;
+ return lprops;
+ }
+ /* We have looked everywhere in main memory, now scan the flash */
+ if (c->pnodes_have >= c->pnode_cnt)
+ /* All pnodes are in memory, so skip scan */
+ return ERR_PTR(-ENOSPC);
+ data.min_space = min_space;
+ data.pick_free = pick_free;
+ data.lnum = -1;
+ data.exclude_index = exclude_index;
+ err = ubifs_lpt_scan_nolock(c, -1, c->lscan_lnum,
+ (ubifs_lpt_scan_callback)scan_for_dirty_cb,
+ &data);
+ if (err)
+ return ERR_PTR(err);
+ ubifs_assert(data.lnum >= c->main_first && data.lnum < c->leb_cnt);
+ c->lscan_lnum = data.lnum;
+ lprops = ubifs_lpt_lookup_dirty(c, data.lnum);
+ if (IS_ERR(lprops))
+ return lprops;
+ ubifs_assert(lprops->lnum == data.lnum);
+ ubifs_assert(lprops->free + lprops->dirty >= min_space);
+ ubifs_assert(lprops->dirty >= c->dead_wm);
+ ubifs_assert(!(lprops->flags & LPROPS_TAKEN));
+ ubifs_assert(!exclude_index || !(lprops->flags & LPROPS_INDEX));
+ return lprops;
+}
+
+/**
+ * ubifs_find_dirty_leb - find a dirty LEB for the Garbage Collector.
+ * @c: the UBIFS file-system description object
+ * @ret_lp: LEB properties are returned here on exit
+ * @min_space: minimum amount of free plus dirty space the returned LEB has to
+ * have
+ * @pick_free: if it is ok to return a free or freeable LEB
+ *
+ * This function tries to find a dirty logical eraseblock which has at least
+ * @min_space free and dirty space. It prefers to take an LEB from the dirty or
+ * dirty index heap, and it falls back to LPT scanning if the heaps are empty
+ * or do not have an LEB which satisfies the @min_space criterion.
+ *
+ * Note:
+ * o LEBs which have less than the dead watermark of dirty space are never
+ * picked by this function.
+ *
+ * Returns zero and the LEB properties of the
+ * found dirty LEB in case of success, %-ENOSPC if no dirty LEB was found and a
+ * negative error code in case of other failures. The returned LEB is marked as
+ * "taken".
+ *
+ * The additional @pick_free argument controls if this function has to return a
+ * free or freeable LEB if one is present. E.g., it is convenient for GC to set
+ * it to %1 when GC is called from the journal space reservation function - it
+ * has to find an LEB as soon as possible. This means that if a free or
+ * freeable LEB comes up in the middle of garbage collection, it has to be
+ * erased and used.
+ *
+ * Conversely, if the Garbage Collector is called from the budgeting, it
+ * should just make free space, not returning LEBs which are already free (or
+ * freeable, which is basically the same, but freeable will become available
+ * only after the commit).
+ */
+int ubifs_find_dirty_leb(struct ubifs_info *c, struct ubifs_lprops *ret_lp,
+ int min_space, int pick_free)
+{
+ int err = 0, sum, exclude_index = 0;
+ const struct ubifs_lprops *lp = NULL, *idx_lp = NULL;
+ struct ubifs_lpt_heap *heap, *idx_heap;
+
+ ubifs_get_lprops(c);
+
+ if (pick_free) {
+ int lebs, rsvd_idx_lebs = 0;
+
+ spin_lock(&c->space_lock);
+ lebs = c->lst.empty_lebs;
+ lebs += c->freeable_cnt - c->lst.taken_empty_lebs;
+
+ /*
+ * Note, the index may consume more LEBs than have been
+ * reserved for it. It is OK because it might be consolidated
+ * by the "in-the-gaps" index commit method if needed. But if
+ * the index takes fewer LEBs than are reserved for it, this
+ * function should anyway avoid picking those reserved LEBs.
+ */
+ if (c->min_idx_lebs >= c->lst.idx_lebs) {
+ rsvd_idx_lebs = c->min_idx_lebs - c->lst.idx_lebs;
+ exclude_index = 1;
+ }
+ spin_unlock(&c->space_lock);
+
+ /* Check if there are enough free LEBs for the index */
+ if (rsvd_idx_lebs < lebs) {
+ /* OK, try to find an empty LEB */
+ lp = ubifs_fast_find_empty(c);
+ if (lp)
+ goto found;
+
+ /* Or a freeable LEB */
+ lp = ubifs_fast_find_freeable(c);
+ if (lp)
+ goto found;
+ } else
+ /*
+ * We cannot pick free/freeable LEBs in the below code.
+ */
+ pick_free = 0;
+ } else {
+ spin_lock(&c->space_lock);
+ exclude_index = (c->min_idx_lebs >= c->lst.idx_lebs);
+ spin_unlock(&c->space_lock);
+ }
+
+ /* Look on the dirty and dirty index heaps */
+ heap = &c->lpt_heap[LPROPS_DIRTY - 1];
+ idx_heap = &c->lpt_heap[LPROPS_DIRTY_IDX - 1];
+
+ if (idx_heap->cnt && !exclude_index) {
+ idx_lp = idx_heap->arr[0];
+ sum = idx_lp->free + idx_lp->dirty;
+ /*
+ * Since we reserve twice as much space for the index as it
+ * actually takes, it does not make sense to pick indexing LEBs
+ * with less than half a LEB of dirty space.
+ */
+ if (sum < min_space || sum < c->half_leb_size)
+ idx_lp = NULL;
+ }
+
+ if (heap->cnt) {
+ lp = heap->arr[0];
+ if (lp->dirty + lp->free < min_space)
+ lp = NULL;
+ }
+
+ /* Pick the LEB with most space */
+ if (idx_lp && lp) {
+ if (idx_lp->free + idx_lp->dirty >= lp->free + lp->dirty)
+ lp = idx_lp;
+ } else if (idx_lp && !lp)
+ lp = idx_lp;
+
+ if (lp) {
+ ubifs_assert(lp->dirty >= c->dead_wm);
+ goto found;
+ }
+
+ /* Did not find a dirty LEB on the dirty heaps, have to scan */
+ dbg_find("scanning LPT for a dirty LEB");
+ lp = scan_for_dirty(c, min_space, pick_free, exclude_index);
+ if (IS_ERR(lp)) {
+ err = PTR_ERR(lp);
+ goto out;
+ }
+ ubifs_assert(lp->dirty >= c->dead_wm);
+
+found:
+ dbg_find("found LEB %d, free %d, dirty %d, flags %#x",
+ lp->lnum, lp->free, lp->dirty, lp->flags);
+
+ lp = ubifs_change_lp(c, lp, -1, -1, lp->flags | LPROPS_TAKEN, 0);
+ if (IS_ERR(lp)) {
+ err = PTR_ERR(lp);
+ goto out;
+ }
+
+ memcpy(ret_lp, lp, sizeof(struct ubifs_lprops));
+
+out:
+ ubifs_release_lprops(c);
+ return err;
+}
+
+/**
+ * scan_for_free_cb - free space scan callback.
+ * @c: the UBIFS file-system description object
+ * @lprops: LEB properties to scan
+ * @in_tree: whether the LEB properties are in main memory
+ * @data: information passed to and from the caller of the scan
+ *
+ * This function returns a code that indicates whether the scan should continue
+ * (%LPT_SCAN_CONTINUE), whether the LEB properties should be added to the tree
+ * in main memory (%LPT_SCAN_ADD), or whether the scan should stop
+ * (%LPT_SCAN_STOP).
+ */
+static int scan_for_free_cb(struct ubifs_info *c,
+ const struct ubifs_lprops *lprops, int in_tree,
+ struct scan_data *data)
+{
+ int ret = LPT_SCAN_CONTINUE;
+
+ /* Exclude LEBs that are currently in use */
+ if (lprops->flags & LPROPS_TAKEN)
+ return LPT_SCAN_CONTINUE;
+ /* Determine whether to add these LEB properties to the tree */
+ if (!in_tree && valuable(c, lprops))
+ ret |= LPT_SCAN_ADD;
+ /* Exclude index LEBs */
+ if (lprops->flags & LPROPS_INDEX)
+ return ret;
+ /* Exclude LEBs with too little space */
+ if (lprops->free < data->min_space)
+ return ret;
+ /* If specified, exclude empty LEBs */
+ if (!data->pick_free && lprops->free == c->leb_size)
+ return ret;
+ /*
+ * LEBs that have only free and dirty space must not be allocated
+ * because they may have been unmapped already or they may have data
+ * that is obsolete only because of nodes that are still sitting in a
+ * wbuf.
+ */
+ if (lprops->free + lprops->dirty == c->leb_size && lprops->dirty > 0)
+ return ret;
+ /* Finally we found space */
+ data->lnum = lprops->lnum;
+ return LPT_SCAN_ADD | LPT_SCAN_STOP;
+}
+
+/**
+ * do_find_free_space - find a data LEB with free space.
+ * @c: the UBIFS file-system description object
+ * @min_space: minimum amount of free space required
+ * @pick_free: whether it is OK to scan for empty LEBs
+ * @squeeze: whether to try to find space in a non-empty LEB first
+ *
+ * This function returns a pointer to the LEB properties found or a negative
+ * error code.
+ */
+static
+const struct ubifs_lprops *do_find_free_space(struct ubifs_info *c,
+ int min_space, int pick_free,
+ int squeeze)
+{
+ const struct ubifs_lprops *lprops;
+ struct ubifs_lpt_heap *heap;
+ struct scan_data data;
+ int err, i;
+
+ if (squeeze) {
+ lprops = ubifs_fast_find_free(c);
+ if (lprops && lprops->free >= min_space)
+ return lprops;
+ }
+ if (pick_free) {
+ lprops = ubifs_fast_find_empty(c);
+ if (lprops)
+ return lprops;
+ }
+ if (!squeeze) {
+ lprops = ubifs_fast_find_free(c);
+ if (lprops && lprops->free >= min_space)
+ return lprops;
+ }
+ /* There may be an LEB with enough free space on the dirty heap */
+ heap = &c->lpt_heap[LPROPS_DIRTY - 1];
+ for (i = 0; i < heap->cnt; i++) {
+ lprops = heap->arr[i];
+ if (lprops->free >= min_space)
+ return lprops;
+ }
+ /*
+ * A LEB may have fallen off of the bottom of the free heap, and ended
+ * up as uncategorized even though it has enough free space for us now,
+ * so check the uncategorized list. N.B. neither empty nor freeable LEBs
+ * can end up as uncategorized because they are kept on lists not
+ * finite-sized heaps.
+ */
+ list_for_each_entry(lprops, &c->uncat_list, list) {
+ if (lprops->flags & LPROPS_TAKEN)
+ continue;
+ if (lprops->flags & LPROPS_INDEX)
+ continue;
+ if (lprops->free >= min_space)
+ return lprops;
+ }
+ /* We have looked everywhere in main memory, now scan the flash */
+ if (c->pnodes_have >= c->pnode_cnt)
+ /* All pnodes are in memory, so skip scan */
+ return ERR_PTR(-ENOSPC);
+ data.min_space = min_space;
+ data.pick_free = pick_free;
+ data.lnum = -1;
+ err = ubifs_lpt_scan_nolock(c, -1, c->lscan_lnum,
+ (ubifs_lpt_scan_callback)scan_for_free_cb,
+ &data);
+ if (err)
+ return ERR_PTR(err);
+ ubifs_assert(data.lnum >= c->main_first && data.lnum < c->leb_cnt);
+ c->lscan_lnum = data.lnum;
+ lprops = ubifs_lpt_lookup_dirty(c, data.lnum);
+ if (IS_ERR(lprops))
+ return lprops;
+ ubifs_assert(lprops->lnum == data.lnum);
+ ubifs_assert(lprops->free >= min_space);
+ ubifs_assert(!(lprops->flags & LPROPS_TAKEN));
+ ubifs_assert(!(lprops->flags & LPROPS_INDEX));
+ return lprops;
+}
+
+/**
+ * ubifs_find_free_space - find a data LEB with free space.
+ * @c: the UBIFS file-system description object
+ * @min_space: minimum amount of required free space
+ * @free: contains amount of free space in the LEB on exit
+ * @squeeze: whether to try to find space in a non-empty LEB first
+ *
+ * This function looks for an LEB with at least @min_space bytes of free space.
+ * It tries to find an empty LEB if possible. If no empty LEBs are available,
+ * this function searches for a non-empty data LEB. The returned LEB is marked
+ * as "taken".
+ *
+ * This function returns the found LEB number in case of success, %-ENOSPC if
+ * it failed to find a LEB with @min_space bytes of free space, and another
+ * negative error code in case of other failures.
+ */
+int ubifs_find_free_space(struct ubifs_info *c, int min_space, int *free,
+ int squeeze)
+{
+ const struct ubifs_lprops *lprops;
+ int lebs, rsvd_idx_lebs, pick_free = 0, err, lnum, flags;
+
+ dbg_find("min_space %d", min_space);
+ ubifs_assert(min_space > 0 && min_space <= c->dark_wm);
+
+ ubifs_get_lprops(c);
+
+ /* Check if there are enough empty LEBs for commit */
+ spin_lock(&c->space_lock);
+ if (c->min_idx_lebs > c->lst.idx_lebs)
+ rsvd_idx_lebs = c->min_idx_lebs - c->lst.idx_lebs;
+ else
+ rsvd_idx_lebs = 0;
+ lebs = c->lst.empty_lebs + c->freeable_cnt + c->idx_gc_cnt -
+ c->lst.taken_empty_lebs;
+ if (rsvd_idx_lebs < lebs)
+ /*
+ * OK to allocate an empty LEB, but we still don't want to go
+ * looking for one if there aren't any.
+ */
+ if (c->lst.empty_lebs - c->lst.taken_empty_lebs > 0) {
+ pick_free = 1;
+ /*
+ * Because we release the space lock, we must account
+ * for this allocation here. After the LEB properties
+ * flags have been updated, we subtract one.
+ */
+ c->lst.taken_empty_lebs += 1;
+ }
+ spin_unlock(&c->space_lock);
+
+ lprops = do_find_free_space(c, min_space, pick_free, squeeze);
+ if (IS_ERR(lprops)) {
+ err = PTR_ERR(lprops);
+ goto out;
+ }
+
+ lnum = lprops->lnum;
+ flags = lprops->flags | LPROPS_TAKEN;
+
+ lprops = ubifs_change_lp(c, lprops, -1, -1, flags, 0);
+ if (IS_ERR(lprops)) {
+ err = PTR_ERR(lprops);
+ goto out;
+ }
+
+ if (pick_free) {
+ spin_lock(&c->space_lock);
+ c->lst.taken_empty_lebs -= 1;
+ spin_unlock(&c->space_lock);
+ }
+
+ *free = lprops->free;
+ ubifs_release_lprops(c);
+
+ if (*free == c->leb_size) {
+ /*
+ * Ensure that empty LEBs have been unmapped. They may not have
+ * been, for example, because of an unclean unmount. Also
+ * LEBs that were freeable LEBs (free + dirty == leb_size) will
+ * not have been unmapped.
+ */
+ err = ubifs_leb_unmap(c, lnum);
+ if (err)
+ return err;
+ }
+
+ dbg_find("found LEB %d, free %d", lnum, *free);
+ ubifs_assert(*free >= min_space);
+ return lnum;
+
+out:
+ if (pick_free) {
+ spin_lock(&c->space_lock);
+ c->lst.taken_empty_lebs -= 1;
+ spin_unlock(&c->space_lock);
+ }
+ ubifs_release_lprops(c);
+ return err;
+}
+
+/**
+ * scan_for_idx_cb - callback used by the scan for a free LEB for the index.
+ * @c: the UBIFS file-system description object
+ * @lprops: LEB properties to scan
+ * @in_tree: whether the LEB properties are in main memory
+ * @data: information passed to and from the caller of the scan
+ *
+ * This function returns a code that indicates whether the scan should continue
+ * (%LPT_SCAN_CONTINUE), whether the LEB properties should be added to the tree
+ * in main memory (%LPT_SCAN_ADD), or whether the scan should stop
+ * (%LPT_SCAN_STOP).
+ */
+static int scan_for_idx_cb(struct ubifs_info *c,
+ const struct ubifs_lprops *lprops, int in_tree,
+ struct scan_data *data)
+{
+ int ret = LPT_SCAN_CONTINUE;
+
+ /* Exclude LEBs that are currently in use */
+ if (lprops->flags & LPROPS_TAKEN)
+ return LPT_SCAN_CONTINUE;
+ /* Determine whether to add these LEB properties to the tree */
+ if (!in_tree && valuable(c, lprops))
+ ret |= LPT_SCAN_ADD;
+ /* Exclude index LEBs */
+ if (lprops->flags & LPROPS_INDEX)
+ return ret;
+ /* Exclude LEBs that cannot be made empty */
+ if (lprops->free + lprops->dirty != c->leb_size)
+ return ret;
+ /*
+ * We are allocating for the index so it is safe to allocate LEBs with
+ * only free and dirty space, because write buffers are sync'd at commit
+ * start.
+ */
+ data->lnum = lprops->lnum;
+ return LPT_SCAN_ADD | LPT_SCAN_STOP;
+}
+
+/**
+ * scan_for_leb_for_idx - scan for a free LEB for the index.
+ * @c: the UBIFS file-system description object
+ */
+static const struct ubifs_lprops *scan_for_leb_for_idx(struct ubifs_info *c)
+{
+ struct ubifs_lprops *lprops;
+ struct scan_data data;
+ int err;
+
+ data.lnum = -1;
+ err = ubifs_lpt_scan_nolock(c, -1, c->lscan_lnum,
+ (ubifs_lpt_scan_callback)scan_for_idx_cb,
+ &data);
+ if (err)
+ return ERR_PTR(err);
+ ubifs_assert(data.lnum >= c->main_first && data.lnum < c->leb_cnt);
+ c->lscan_lnum = data.lnum;
+ lprops = ubifs_lpt_lookup_dirty(c, data.lnum);
+ if (IS_ERR(lprops))
+ return lprops;
+ ubifs_assert(lprops->lnum == data.lnum);
+ ubifs_assert(lprops->free + lprops->dirty == c->leb_size);
+ ubifs_assert(!(lprops->flags & LPROPS_TAKEN));
+ ubifs_assert(!(lprops->flags & LPROPS_INDEX));
+ return lprops;
+}
+
+/**
+ * ubifs_find_free_leb_for_idx - find a free LEB for the index.
+ * @c: the UBIFS file-system description object
+ *
+ * This function looks for a free LEB and returns that LEB number. The returned
+ * LEB is marked as "taken", "index".
+ *
+ * If no LEB is found %-ENOSPC is returned. For other failures another negative
+ * error code is returned.
+ */
+int ubifs_find_free_leb_for_idx(struct ubifs_info *c)
+{
+ const struct ubifs_lprops *lprops;
+ int lnum = -1, err, flags;
+
+ ubifs_get_lprops(c);
+
+ lprops = ubifs_fast_find_empty(c);
+ if (!lprops) {
+ lprops = ubifs_fast_find_freeable(c);
+ if (!lprops) {
+ ubifs_assert(c->freeable_cnt == 0);
+ if (c->lst.empty_lebs - c->lst.taken_empty_lebs > 0) {
+ lprops = scan_for_leb_for_idx(c);
+ if (IS_ERR(lprops)) {
+ err = PTR_ERR(lprops);
+ goto out;
+ }
+ }
+ }
+ }
+
+ if (!lprops) {
+ err = -ENOSPC;
+ goto out;
+ }
+
+ lnum = lprops->lnum;
+
+ dbg_find("found LEB %d, free %d, dirty %d, flags %#x",
+ lnum, lprops->free, lprops->dirty, lprops->flags);
+
+ flags = lprops->flags | LPROPS_TAKEN | LPROPS_INDEX;
+ lprops = ubifs_change_lp(c, lprops, c->leb_size, 0, flags, 0);
+ if (IS_ERR(lprops)) {
+ err = PTR_ERR(lprops);
+ goto out;
+ }
+
+ ubifs_release_lprops(c);
+
+ /*
+ * Ensure that empty LEBs have been unmapped. They may not have been,
+ * for example, because of an unclean unmount. Also LEBs that were
+ * freeable LEBs (free + dirty == leb_size) will not have been unmapped.
+ */
+ err = ubifs_leb_unmap(c, lnum);
+ if (err) {
+ ubifs_change_one_lp(c, lnum, -1, -1, 0,
+ LPROPS_TAKEN | LPROPS_INDEX, 0);
+ return err;
+ }
+
+ return lnum;
+
+out:
+ ubifs_release_lprops(c);
+ return err;
+}
+
+static int cmp_dirty_idx(const struct ubifs_lprops **a,
+ const struct ubifs_lprops **b)
+{
+ const struct ubifs_lprops *lpa = *a;
+ const struct ubifs_lprops *lpb = *b;
+
+ return lpa->dirty + lpa->free - lpb->dirty - lpb->free;
+}
+
+static void swap_dirty_idx(struct ubifs_lprops **a, struct ubifs_lprops **b,
+ int size)
+{
+ struct ubifs_lprops *t = *a;
+
+ *a = *b;
+ *b = t;
+}
+
+/**
+ * ubifs_save_dirty_idx_lnums - save an array of the most dirty index LEB nos.
+ * @c: the UBIFS file-system description object
+ *
+ * This function is called each commit to create an array of LEB numbers of
+ * dirty index LEBs sorted in order of dirty and free space. This is used by
+ * the in-the-gaps method of TNC commit.
+ */
+int ubifs_save_dirty_idx_lnums(struct ubifs_info *c)
+{
+ int i;
+
+ ubifs_get_lprops(c);
+ /* Copy the LPROPS_DIRTY_IDX heap */
+ c->dirty_idx.cnt = c->lpt_heap[LPROPS_DIRTY_IDX - 1].cnt;
+ memcpy(c->dirty_idx.arr, c->lpt_heap[LPROPS_DIRTY_IDX - 1].arr,
+ sizeof(void *) * c->dirty_idx.cnt);
+ /* Sort it so that the dirtiest is now at the end */
+ sort(c->dirty_idx.arr, c->dirty_idx.cnt, sizeof(void *),
+ (int (*)(const void *, const void *))cmp_dirty_idx,
+ (void (*)(void *, void *, int))swap_dirty_idx);
+ dbg_find("found %d dirty index LEBs", c->dirty_idx.cnt);
+ if (c->dirty_idx.cnt)
+ dbg_find("dirtiest index LEB is %d with dirty %d and free %d",
+ c->dirty_idx.arr[c->dirty_idx.cnt - 1]->lnum,
+ c->dirty_idx.arr[c->dirty_idx.cnt - 1]->dirty,
+ c->dirty_idx.arr[c->dirty_idx.cnt - 1]->free);
+ /* Replace the lprops pointers with LEB numbers */
+ for (i = 0; i < c->dirty_idx.cnt; i++)
+ c->dirty_idx.arr[i] = (void *)(size_t)c->dirty_idx.arr[i]->lnum;
+ ubifs_release_lprops(c);
+ return 0;
+}
+
+/**
+ * scan_dirty_idx_cb - callback used by the scan for a dirty index LEB.
+ * @c: the UBIFS file-system description object
+ * @lprops: LEB properties to scan
+ * @in_tree: whether the LEB properties are in main memory
+ * @data: information passed to and from the caller of the scan
+ *
+ * This function returns a code that indicates whether the scan should continue
+ * (%LPT_SCAN_CONTINUE), whether the LEB properties should be added to the tree
+ * in main memory (%LPT_SCAN_ADD), or whether the scan should stop
+ * (%LPT_SCAN_STOP).
+ */
+static int scan_dirty_idx_cb(struct ubifs_info *c,
+ const struct ubifs_lprops *lprops, int in_tree,
+ struct scan_data *data)
+{
+ int ret = LPT_SCAN_CONTINUE;
+
+ /* Exclude LEBs that are currently in use */
+ if (lprops->flags & LPROPS_TAKEN)
+ return LPT_SCAN_CONTINUE;
+ /* Determine whether to add these LEB properties to the tree */
+ if (!in_tree && valuable(c, lprops))
+ ret |= LPT_SCAN_ADD;
+ /* Exclude non-index LEBs */
+ if (!(lprops->flags & LPROPS_INDEX))
+ return ret;
+ /* Exclude LEBs with too little space */
+ if (lprops->free + lprops->dirty < c->min_idx_node_sz)
+ return ret;
+ /* Finally we found space */
+ data->lnum = lprops->lnum;
+ return LPT_SCAN_ADD | LPT_SCAN_STOP;
+}
+
+/**
+ * find_dirty_idx_leb - find a dirty index LEB.
+ * @c: the UBIFS file-system description object
+ *
+ * This function returns LEB number upon success and a negative error code upon
+ * failure. In particular, -ENOSPC is returned if a dirty index LEB is not
+ * found.
+ *
+ * Note that this function scans the entire LPT but it is called very rarely.
+ */
+static int find_dirty_idx_leb(struct ubifs_info *c)
+{
+ const struct ubifs_lprops *lprops;
+ struct ubifs_lpt_heap *heap;
+ struct scan_data data;
+ int err, i, ret;
+
+ /* Check all structures in memory first */
+ data.lnum = -1;
+ heap = &c->lpt_heap[LPROPS_DIRTY_IDX - 1];
+ for (i = 0; i < heap->cnt; i++) {
+ lprops = heap->arr[i];
+ ret = scan_dirty_idx_cb(c, lprops, 1, &data);
+ if (ret & LPT_SCAN_STOP)
+ goto found;
+ }
+ list_for_each_entry(lprops, &c->frdi_idx_list, list) {
+ ret = scan_dirty_idx_cb(c, lprops, 1, &data);
+ if (ret & LPT_SCAN_STOP)
+ goto found;
+ }
+ list_for_each_entry(lprops, &c->uncat_list, list) {
+ ret = scan_dirty_idx_cb(c, lprops, 1, &data);
+ if (ret & LPT_SCAN_STOP)
+ goto found;
+ }
+ if (c->pnodes_have >= c->pnode_cnt)
+ /* All pnodes are in memory, so skip scan */
+ return -ENOSPC;
+ err = ubifs_lpt_scan_nolock(c, -1, c->lscan_lnum,
+ (ubifs_lpt_scan_callback)scan_dirty_idx_cb,
+ &data);
+ if (err)
+ return err;
+found:
+ ubifs_assert(data.lnum >= c->main_first && data.lnum < c->leb_cnt);
+ c->lscan_lnum = data.lnum;
+ lprops = ubifs_lpt_lookup_dirty(c, data.lnum);
+ if (IS_ERR(lprops))
+ return PTR_ERR(lprops);
+ ubifs_assert(lprops->lnum == data.lnum);
+ ubifs_assert(lprops->free + lprops->dirty >= c->min_idx_node_sz);
+ ubifs_assert(!(lprops->flags & LPROPS_TAKEN));
+ ubifs_assert((lprops->flags & LPROPS_INDEX));
+
+ dbg_find("found dirty LEB %d, free %d, dirty %d, flags %#x",
+ lprops->lnum, lprops->free, lprops->dirty, lprops->flags);
+
+ lprops = ubifs_change_lp(c, lprops, -1, -1,
+ lprops->flags | LPROPS_TAKEN, 0);
+ if (IS_ERR(lprops))
+ return PTR_ERR(lprops);
+
+ return lprops->lnum;
+}
+
+/**
+ * get_idx_gc_leb - try to get a LEB number from trivial GC.
+ * @c: the UBIFS file-system description object
+ */
+static int get_idx_gc_leb(struct ubifs_info *c)
+{
+ const struct ubifs_lprops *lp;
+ int err, lnum;
+
+ err = ubifs_get_idx_gc_leb(c);
+ if (err < 0)
+ return err;
+ lnum = err;
+ /*
+ * The LEB was due to be unmapped after the commit but
+ * it is needed now for this commit.
+ */
+ lp = ubifs_lpt_lookup_dirty(c, lnum);
+ if (unlikely(IS_ERR(lp)))
+ return PTR_ERR(lp);
+ lp = ubifs_change_lp(c, lp, -1, -1, lp->flags | LPROPS_INDEX, -1);
+ if (unlikely(IS_ERR(lp)))
+ return PTR_ERR(lp);
+ dbg_find("LEB %d, dirty %d and free %d flags %#x",
+ lp->lnum, lp->dirty, lp->free, lp->flags);
+ return lnum;
+}
+
+/**
+ * find_dirtiest_idx_leb - find dirtiest index LEB from dirtiest array.
+ * @c: the UBIFS file-system description object
+ */
+static int find_dirtiest_idx_leb(struct ubifs_info *c)
+{
+ const struct ubifs_lprops *lp;
+ int lnum;
+
+ while (1) {
+ if (!c->dirty_idx.cnt)
+ return -ENOSPC;
+ /* The lprops pointers were replaced by LEB numbers */
+ lnum = (size_t)c->dirty_idx.arr[--c->dirty_idx.cnt];
+ lp = ubifs_lpt_lookup(c, lnum);
+ if (IS_ERR(lp))
+ return PTR_ERR(lp);
+ if ((lp->flags & LPROPS_TAKEN) || !(lp->flags & LPROPS_INDEX))
+ continue;
+ lp = ubifs_change_lp(c, lp, -1, -1,
+ lp->flags | LPROPS_TAKEN, 0);
+ if (IS_ERR(lp))
+ return PTR_ERR(lp);
+ break;
+ }
+ dbg_find("LEB %d, dirty %d and free %d flags %#x", lp->lnum, lp->dirty,
+ lp->free, lp->flags);
+ ubifs_assert(lp->flags & LPROPS_TAKEN);
+ ubifs_assert(lp->flags & LPROPS_INDEX);
+ return lnum;
+}
+
+/**
+ * ubifs_find_dirty_idx_leb - try to find dirtiest index LEB as at last commit.
+ * @c: the UBIFS file-system description object
+ *
+ * This function attempts to find an untaken index LEB with the most free and
+ * dirty space that can be used without overwriting index nodes that were in the
+ * last index committed.
+ */
+int ubifs_find_dirty_idx_leb(struct ubifs_info *c)
+{
+ int err;
+
+ ubifs_get_lprops(c);
+
+ /*
+ * We made an array of the dirtiest index LEB numbers as at the start of
+ * last commit. Try that array first.
+ */
+ err = find_dirtiest_idx_leb(c);
+
+ /* Next try scanning the entire LPT */
+ if (err == -ENOSPC)
+ err = find_dirty_idx_leb(c);
+
+ /* Finally take any index LEBs awaiting trivial GC */
+ if (err == -ENOSPC)
+ err = get_idx_gc_leb(c);
+
+ ubifs_release_lprops(c);
+ return err;
+}
--
1.5.4.1

2008-03-27 13:15:50

by Artem Bityutskiy

Subject: [RFC PATCH 03/26] UBIFS: add flash scanning

This is a small sub-system that performs eraseblock scanning. It is
needed, for example, during journal replay, recovery, and garbage
collection.
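
To give reviewers a feel for what scanning means here, below is a minimal
user-space sketch of how one position in an eraseblock buffer can be
classified as empty space, padding bytes, a node header, or garbage. It is
an illustration only and not the code in this patch: all sketch_* names,
the SKETCH_* return codes and the constant values are assumptions made for
the example, and the real code additionally validates the node (CRC,
length) and handles padding nodes.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values only; the real constants live in the UBIFS headers. */
#define SKETCH_NODE_MAGIC   0x06101831u
#define SKETCH_PADDING_BYTE 0xCE
#define SKETCH_MAX_PAD      28

/* Hypothetical result codes: anything positive is a padding byte count. */
enum sketch_result {
	SKETCH_EMPTY_SPACE = 0,
	SKETCH_A_NODE = -1,
	SKETCH_GARBAGE = -2,
};

/* Read a little-endian 32-bit value without alignment assumptions. */
static uint32_t sketch_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Classify the bytes at the start of buf. Returns a positive number of
 * padding bytes, or one of the sketch_result codes.
 */
static int sketch_classify(const uint8_t *buf, int len)
{
	uint32_t magic;
	int max_pad = len < SKETCH_MAX_PAD ? len : SKETCH_MAX_PAD;
	int pad_len = 0;

	assert(len >= 4);
	magic = sketch_le32(buf);

	/* Erased flash reads back as all 0xFF bytes */
	if (magic == 0xFFFFFFFFu)
		return SKETCH_EMPTY_SPACE;
	/* A node header; a real scan would go on to validate the node */
	if (magic == SKETCH_NODE_MAGIC)
		return SKETCH_A_NODE;

	/* Not a node, so count leading padding bytes */
	while (pad_len < max_pad && buf[pad_len] == SKETCH_PADDING_BYTE)
		pad_len += 1;
	/* Padding must be non-empty and a multiple of 8 bytes */
	if (!pad_len || (pad_len & 7))
		return SKETCH_GARBAGE;
	return pad_len;
}
```

A caller would walk the buffer, skipping the returned number of bytes on a
positive result, stopping at empty space, and treating garbage as
corruption.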

Signed-off-by: Artem Bityutskiy <[email protected]>
Signed-off-by: Adrian Hunter <[email protected]>
---
fs/ubifs/scan.c | 368 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 368 insertions(+), 0 deletions(-)

diff --git a/fs/ubifs/scan.c b/fs/ubifs/scan.c
new file mode 100644
index 0000000..858aa94
--- /dev/null
+++ b/fs/ubifs/scan.c
@@ -0,0 +1,368 @@
+/*
+ * This file is part of UBIFS.
+ *
+ * Copyright (C) 2006-2008 Nokia Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 51
+ * Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Authors: Adrian Hunter
+ * Artem Bityutskiy (Битюцкий Артём)
+ */
+
+/*
+ * This file implements the scan which is a general-purpose function for
+ * determining what nodes are in an eraseblock. The scan is used to replay the
+ * journal, to do garbage collection, for the TNC in-the-gaps method, and by
+ * debugging functions.
+ */
+
+#include "ubifs.h"
+
+/**
+ * scan_padding_bytes - scan for padding bytes.
+ * @buf: buffer to scan
+ * @len: length of buffer
+ *
+ * This function returns the number of padding bytes on success and
+ * %SCANNED_GARBAGE on failure.
+ */
+static int scan_padding_bytes(void *buf, int len)
+{
+ int pad_len = 0, max_pad_len = min_t(int, UBIFS_PAD_NODE_SZ, len);
+ uint8_t *p = buf;
+
+ dbg_scan("not a node");
+
+ while (pad_len < max_pad_len && *p++ == UBIFS_PADDING_BYTE)
+ pad_len += 1;
+
+ if (!pad_len || (pad_len & 7))
+ return SCANNED_GARBAGE;
+
+ dbg_scan("%d padding bytes", pad_len);
+
+ return pad_len;
+}
+
+/**
+ * ubifs_scan_a_node - scan for a node or padding.
+ * @c: UBIFS file-system description object
+ * @buf: buffer to scan
+ * @len: length of buffer
+ * @lnum: logical eraseblock number
+ * @offs: offset within the logical eraseblock
+ * @quiet: print no messages
+ *
+ * This function returns a scanning code to indicate what was scanned.
+ */
+int ubifs_scan_a_node(const struct ubifs_info *c, void *buf, int len, int lnum,
+ int offs, int quiet)
+{
+ struct ubifs_ch *ch = buf;
+ uint32_t magic;
+
+ ubifs_assert(len >= 4);
+
+ magic = le32_to_cpu(ch->magic);
+
+ if (magic == 0xFFFFFFFF) {
+ dbg_scan("hit empty space");
+ return SCANNED_EMPTY_SPACE;
+ }
+
+ if (magic != UBIFS_NODE_MAGIC)
+ return scan_padding_bytes(buf, len);
+
+ if (len < UBIFS_CH_SZ)
+ return SCANNED_GARBAGE;
+
+ dbg_scan("scanning %s", dbg_ntype(ch->node_type));
+
+ if (ubifs_check_node(c, buf, lnum, offs, quiet))
+ return SCANNED_A_CORRUPT_NODE;
+
+ if (ch->node_type == UBIFS_PAD_NODE) {
+ struct ubifs_pad_node *pad = buf;
+ int pad_len = le32_to_cpu(pad->pad_len);
+ int node_len = le32_to_cpu(ch->len);
+
+ /* Validate the padding node */
+ if (pad_len < 0 ||
+ offs + node_len + pad_len > c->leb_size) {
+ if (!quiet) {
+ ubifs_err("bad pad node at LEB %d:%d",
+ lnum, offs);
+ dbg_dump_node(c, pad);
+ }
+ return SCANNED_A_BAD_PAD_NODE;
+ }
+
+ /* Make sure the node pads to an 8-byte boundary */
+ if ((node_len + pad_len) & 7) {
+ if (!quiet) {
+ dbg_err("bad padding length %d - %d",
+ offs, offs + node_len + pad_len);
+ }
+ return SCANNED_A_BAD_PAD_NODE;
+ }
+
+ dbg_scan("%d bytes padded, offset now %d",
+ pad_len, ALIGN(offs + node_len + pad_len, 8));
+
+ return node_len + pad_len;
+ }
+
+ return SCANNED_A_NODE;
+}
+
+/**
+ * ubifs_start_scan - create LEB scanning information at start of scan.
+ * @c: UBIFS file-system description object
+ * @lnum: logical eraseblock number
+ * @offs: offset to start at (usually zero)
+ * @sbuf: scan buffer (must be c->leb_size)
+ *
+ * Returns a pointer to the scanning information or an ERR_PTR() on failure.
+ */
+struct ubifs_scan_leb *ubifs_start_scan(const struct ubifs_info *c, int lnum,
+ int offs, void *sbuf)
+{
+ struct ubifs_scan_leb *sleb;
+ int err;
+
+ ubifs_assert(lnum >= 0 && lnum < c->leb_cnt);
+ ubifs_assert((offs & 7) == 0);
+ ubifs_assert(offs % c->min_io_size == 0);
+
+ dbg_scan("scan LEB %d:%d", lnum, offs);
+
+ sleb = kzalloc(sizeof(struct ubifs_scan_leb), GFP_NOFS);
+ if (!sleb)
+ return ERR_PTR(-ENOMEM);
+
+ sleb->lnum = lnum;
+ INIT_LIST_HEAD(&sleb->nodes);
+ sleb->buf = sbuf;
+
+ err = ubi_read(c->ubi, lnum, sbuf + offs, offs, c->leb_size - offs);
+ if (err && err != -EBADMSG) {
+ ubifs_err("cannot read %d bytes from LEB %d:%d,"
+ " error %d", c->leb_size - offs, lnum, offs, err);
+ kfree(sleb);
+ return ERR_PTR(err);
+ }
+
+ if (err == -EBADMSG)
+ sleb->ecc = 1;
+
+ return sleb;
+}
+
+/**
+ * ubifs_end_scan - update LEB scanning information at end of scan.
+ * @c: UBIFS file-system description object
+ * @sleb: scanning information
+ * @lnum: logical eraseblock number
+ * @offs: offset at which scanning stopped
+ *
+ * This function returns nothing; it records the end offset in @sleb->endpt.
+ */
+void ubifs_end_scan(const struct ubifs_info *c, struct ubifs_scan_leb *sleb,
+ int lnum, int offs)
+{
+ lnum = lnum;
+ dbg_scan("stop scanning LEB %d at offset %d", lnum, offs);
+ ubifs_assert(offs % c->min_io_size == 0);
+
+ sleb->endpt = ALIGN(offs, c->min_io_size);
+}
+
+/**
+ * ubifs_add_snod - add a scanned node to LEB scanning information.
+ * @c: UBIFS file-system description object
+ * @sleb: scanning information
+ * @buf: buffer containing node
+ * @offs: offset of node on flash
+ *
+ * This function returns %0 on success and a negative error code on failure.
+ */
+int ubifs_add_snod(const struct ubifs_info *c, struct ubifs_scan_leb *sleb,
+ void *buf, int offs)
+{
+ struct ubifs_ch *ch = buf;
+ struct ubifs_ino_node *ino = buf;
+ struct ubifs_scan_node *snod;
+
+ snod = kzalloc(sizeof(struct ubifs_scan_node), GFP_NOFS);
+ if (!snod)
+ return -ENOMEM;
+
+ snod->sqnum = le64_to_cpu(ch->sqnum);
+ snod->type = ch->node_type;
+ snod->offs = offs;
+ snod->len = le32_to_cpu(ch->len);
+ snod->node = buf;
+
+ switch (ch->node_type) {
+ case UBIFS_INO_NODE:
+ case UBIFS_DENT_NODE:
+ case UBIFS_XENT_NODE:
+ case UBIFS_DATA_NODE:
+ case UBIFS_TRUN_NODE:
+ /*
+ * The key is in the same place in all keyed
+ * nodes.
+ */
+ key_read(c, &ino->key, &snod->key);
+ break;
+ }
+ list_add_tail(&snod->list, &sleb->nodes);
+ sleb->nodes_cnt += 1;
+ return 0;
+}
+
+/**
+ * ubifs_scanned_corruption - print information after UBIFS scanned corruption.
+ * @c: UBIFS file-system description object
+ * @lnum: LEB number of corruption
+ * @offs: offset of corruption
+ * @buf: buffer containing corruption
+ */
+void ubifs_scanned_corruption(const struct ubifs_info *c, int lnum, int offs,
+ void *buf)
+{
+ int len;
+
+ ubifs_err("corrupted data at LEB %d:%d", lnum, offs);
+ if (dbg_failure_mode)
+ return;
+ len = c->leb_size - offs;
+ if (len > 4096)
+ len = 4096;
+ dbg_err("first %d bytes from LEB %d:%d", len, lnum, offs);
+ print_hex_dump(KERN_DEBUG, "", DUMP_PREFIX_OFFSET, 32, 4, buf, len, 1);
+}
+
+/**
+ * ubifs_scan - scan a logical eraseblock.
+ * @c: UBIFS file-system description object
+ * @lnum: logical eraseblock number
+ * @offs: offset to start at (usually zero)
+ * @sbuf: scan buffer (must be c->leb_size)
+ *
+ * This function scans LEB number @lnum and returns complete information about
+ * its contents. Returns an ERR_PTR()-encoded error code in case of failure.
+ */
+struct ubifs_scan_leb *ubifs_scan(const struct ubifs_info *c, int lnum,
+ int offs, void *sbuf)
+{
+ void *buf = sbuf + offs;
+ int err, len = c->leb_size - offs;
+ struct ubifs_scan_leb *sleb;
+
+ sleb = ubifs_start_scan(c, lnum, offs, sbuf);
+ if (IS_ERR(sleb))
+ return sleb;
+
+ while (len >= 8) {
+ struct ubifs_ch *ch = buf;
+ int node_len, ret;
+
+ dbg_scan("look at LEB %d:%d (%d bytes left)",
+ lnum, offs, len);
+
+ cond_resched();
+
+ ret = ubifs_scan_a_node(c, buf, len, lnum, offs, 0);
+
+ if (ret > 0) {
+ /* Padding bytes or a valid padding node */
+ offs += ret;
+ buf += ret;
+ len -= ret;
+ continue;
+ }
+
+ if (ret == SCANNED_EMPTY_SPACE)
+ /* Empty space is checked later */
+ break;
+
+ switch (ret) {
+ case SCANNED_GARBAGE:
+ dbg_err("garbage");
+ goto corrupted;
+ case SCANNED_A_NODE:
+ break;
+ case SCANNED_A_CORRUPT_NODE:
+ case SCANNED_A_BAD_PAD_NODE:
+ dbg_err("bad node");
+ goto corrupted;
+ default:
+ dbg_err("unknown");
+ goto corrupted;
+ }
+
+ err = ubifs_add_snod(c, sleb, buf, offs);
+ if (err)
+ goto error;
+
+ node_len = ALIGN(le32_to_cpu(ch->len), 8);
+ offs += node_len;
+ buf += node_len;
+ len -= node_len;
+ }
+
+ if (offs % c->min_io_size)
+ goto corrupted;
+
+ ubifs_end_scan(c, sleb, lnum, offs);
+
+ for (; len > 4; offs += 4, buf = buf + 4, len -= 4)
+ if (*(uint32_t *)buf != 0xffffffff)
+ break;
+ for (; len; offs++, buf++, len--)
+ if (*(uint8_t *)buf != 0xff) {
+ ubifs_err("corrupt empty space at LEB %d:%d",
+ lnum, offs);
+ goto corrupted;
+ }
+
+ return sleb;
+
+corrupted:
+ ubifs_scanned_corruption(c, lnum, offs, buf);
+ err = -EUCLEAN;
+error:
+ ubifs_err("LEB %d scanning failed", lnum);
+ ubifs_scan_destroy(sleb);
+ return ERR_PTR(err);
+}
+
+/**
+ * ubifs_scan_destroy - destroy LEB scanning information.
+ * @sleb: scanning information to free
+ */
+void ubifs_scan_destroy(struct ubifs_scan_leb *sleb)
+{
+ struct ubifs_scan_node *node;
+ struct list_head *head;
+
+ head = &sleb->nodes;
+ while (!list_empty(head)) {
+ node = list_entry(head->next, struct ubifs_scan_node, list);
+ list_del(&node->list);
+ kfree(node);
+ }
+ kfree(sleb);
+}
--
1.5.4.1
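
The empty-space check at the end of ubifs_scan() above expects everything after the last node to read as 0xff. A minimal user-space sketch of that idea follows. It is not the patch's code: first_non_ff() is a hypothetical helper, and it checks byte by byte instead of using the 32-bit fast path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return the offset of the first byte in buf[0..len) that is not 0xff,
 * or -1 if the whole region looks erased. Hypothetical helper, not part
 * of the patch.
 */
static long first_non_ff(const uint8_t *buf, size_t len)
{
	size_t i;

	assert(buf != NULL || len == 0);

	for (i = 0; i < len; i++)
		if (buf[i] != 0xff)
			return (long)i;
	return -1;
}
```

A caller would treat any non-negative result as "corrupt empty space" at that offset, which is the case ubifs_scan() reports with -EUCLEAN.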

2008-03-27 13:16:18

by Artem Bityutskiy

Subject: [RFC PATCH 04/26] UBIFS: add journal replay

The journal replay subsystem is responsible for replaying the
journal during mount if it was not committed before the last unmount.
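
The core idea of replay is to apply journal nodes strictly in sequence-number order, so that a later node overrides an earlier one with the same key. A minimal user-space sketch, assuming a toy index and integer keys: struct entry and replay() are hypothetical names, and the sketch sorts with qsort() where the patch builds an rb-tree.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for the patch's struct replay_entry. */
struct entry {
	unsigned long long sqnum; /* node sequence number */
	int key;                  /* simplified node key */
	int deletion;             /* non-zero if the node deletes the key */
};

static int cmp_sqnum(const void *a, const void *b)
{
	const struct entry *x = a, *y = b;

	return (x->sqnum > y->sqnum) - (x->sqnum < y->sqnum);
}

/*
 * Sort entries by sequence number, then apply them in that order to a
 * toy index: present[key] is 1 if the key exists after replay.
 * Returns 0 on success and -1 if two entries share a sequence number.
 */
static int replay(struct entry *e, size_t n, int *present, int nkeys)
{
	size_t i;

	qsort(e, n, sizeof(*e), cmp_sqnum);

	/* Reject duplicates before touching the index. */
	for (i = 1; i < n; i++)
		if (e[i].sqnum == e[i - 1].sqnum)
			return -1;

	for (i = 0; i < n; i++) {
		assert(e[i].key >= 0 && e[i].key < nkeys);
		present[e[i].key] = !e[i].deletion;
	}
	return 0;
}
```

The order in which the buds were scanned does not matter in this sketch; only the sequence numbers decide which node wins.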

Signed-off-by: Artem Bityutskiy <[email protected]>
Signed-off-by: Adrian Hunter <[email protected]>
---
fs/ubifs/replay.c | 1006 +++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 1006 insertions(+), 0 deletions(-)
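
When a truncation node is replayed, trun_remove_range() in this patch removes every data block that lies wholly beyond the new size. A minimal user-space sketch of that arithmetic, assuming the 4KiB UBIFS block size: struct blk_range and trun_range() are hypothetical names used only for illustration.

```c
#include <assert.h>

#define SKETCH_BLOCK_SIZE 4096ULL /* UBIFS block size (4KiB) */

/* Inclusive range of block numbers to drop; empty if min_blk > max_blk. */
struct blk_range {
	unsigned long long min_blk;
	unsigned long long max_blk;
};

static struct blk_range trun_range(unsigned long long old_size,
				   unsigned long long new_size)
{
	struct blk_range r;

	/* The patch rejects truncation nodes with old_size <= new_size. */
	assert(old_size > new_size);

	/* First block that starts at or after new_size. */
	r.min_blk = new_size / SKETCH_BLOCK_SIZE;
	if (new_size & (SKETCH_BLOCK_SIZE - 1))
		r.min_blk += 1;

	/* Last block that held data below old_size. */
	r.max_blk = old_size / SKETCH_BLOCK_SIZE;
	if ((old_size & (SKETCH_BLOCK_SIZE - 1)) == 0)
		r.max_blk -= 1;

	return r;
}
```

For example, truncating from 20000 to 5000 bytes keeps blocks 0 and 1 (block 1 only partly) and removes blocks 2 to 4.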

diff --git a/fs/ubifs/replay.c b/fs/ubifs/replay.c
new file mode 100644
index 0000000..f627053
--- /dev/null
+++ b/fs/ubifs/replay.c
@@ -0,0 +1,1006 @@
+/*
+ * This file is part of UBIFS.
+ *
+ * Copyright (C) 2006-2008 Nokia Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 51
+ * Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Authors: Adrian Hunter
+ * Artem Bityutskiy (Битюцкий Артём)
+ */
+
+/*
+ * This file contains journal replay code. It runs when the file-system is being
+ * mounted and requires no locking.
+ *
+ * The larger the journal, the longer it takes to scan it, and so the longer it
+ * takes to mount UBIFS. This is why the journal has a limited size, which may
+ * be changed depending on the system requirements. But a larger journal gives
+ * faster I/O speed because the index is written less frequently. So this is a
+ * trade-off. Also, the journal is indexed by the in-memory index (TNC), so the
+ * larger the journal, the more memory its index may consume.
+ */
+
+#include "ubifs.h"
+
+/*
+ * Replay flags.
+ *
+ * REPLAY_DELETION: node was deleted
+ * REPLAY_REF: node is a reference node
+ */
+enum {
+ REPLAY_DELETION = 1,
+ REPLAY_REF = 2,
+};
+
+/**
+ * struct replay_entry - replay tree entry.
+ * @lnum: logical eraseblock number of the node
+ * @offs: node offset
+ * @len: node length
+ * @sqnum: node sequence number
+ * @flags: replay flags
+ * @rb: links the replay tree
+ * @key: node key
+ * @nm: directory entry name
+ * @old_size: truncation old size
+ * @new_size: truncation new size
+ * @free: amount of free space in a bud
+ * @dirty: amount of dirty space in a bud from padding and deletion nodes
+ *
+ * UBIFS journal replay must compare node sequence numbers, which means it must
+ * build a tree of node information to insert into the TNC.
+ */
+struct replay_entry {
+ int lnum;
+ int offs;
+ int len;
+ unsigned long long sqnum;
+ int flags;
+ struct rb_node rb;
+ union ubifs_key key;
+ union {
+ struct qstr nm;
+ struct {
+ loff_t old_size;
+ loff_t new_size;
+ };
+ struct {
+ int free;
+ int dirty;
+ };
+ };
+};
+
+/**
+ * struct bud_entry - entry in the list of buds to replay.
+ * @list: next bud in the list
+ * @bud: bud description object
+ * @free: free bytes in the bud
+ * @sqnum: reference node sequence number
+ */
+struct bud_entry {
+ struct list_head list;
+ struct ubifs_bud *bud;
+ int free;
+ unsigned long long sqnum;
+};
+
+/**
+ * set_bud_lprops - set free and dirty space used by a bud.
+ * @c: UBIFS file-system description object
+ * @r: replay entry of bud
+ */
+static int set_bud_lprops(struct ubifs_info *c, struct replay_entry *r)
+{
+ const struct ubifs_lprops *lp;
+ int err = 0, dirty;
+
+ ubifs_get_lprops(c);
+
+ lp = ubifs_lpt_lookup_dirty(c, r->lnum);
+ if (IS_ERR(lp)) {
+ err = PTR_ERR(lp);
+ goto out;
+ }
+
+ dirty = lp->dirty;
+ if (r->offs == 0 && (lp->free != c->leb_size || lp->dirty != 0)) {
+ dbg_mnt("bud LEB %d was GC'd (%d free, %d dirty)", r->lnum,
+ lp->free, lp->dirty);
+ dbg_gc("bud LEB %d was GC'd (%d free, %d dirty)", r->lnum,
+ lp->free, lp->dirty);
+ dirty -= c->leb_size - lp->free;
+ if (dirty != 0)
+ dbg_msg("LEB %d lp: %d free %d dirty "
+ "replay: %d free %d dirty", r->lnum, lp->free,
+ lp->dirty, r->free, r->dirty);
+ }
+ lp = ubifs_change_lp(c, lp, r->free, dirty + r->dirty,
+ lp->flags | LPROPS_TAKEN, 0);
+ if (IS_ERR(lp)) {
+ err = PTR_ERR(lp);
+ goto out;
+ }
+out:
+ ubifs_release_lprops(c);
+ return err;
+}
+
+/**
+ * trun_remove_range - apply a replay entry for a truncation to the TNC.
+ * @c: UBIFS file-system description object
+ * @r: replay entry of truncation
+ */
+static int trun_remove_range(struct ubifs_info *c, struct replay_entry *r)
+{
+ unsigned min_blk, max_blk;
+ union ubifs_key min_key, max_key;
+ ino_t ino;
+
+ min_blk = r->new_size / UBIFS_BLOCK_SIZE;
+ if (r->new_size & (UBIFS_BLOCK_SIZE - 1))
+ min_blk += 1;
+
+ max_blk = r->old_size / UBIFS_BLOCK_SIZE;
+ if ((r->old_size & (UBIFS_BLOCK_SIZE - 1)) == 0)
+ max_blk -= 1;
+
+ ino = key_ino(c, &r->key);
+
+ data_key_init(c, &min_key, ino, min_blk);
+ data_key_init(c, &max_key, ino, max_blk);
+
+ return ubifs_tnc_remove_range(c, &min_key, &max_key);
+}
+
+/**
+ * apply_replay_entry - apply a replay entry to the TNC.
+ * @c: UBIFS file-system description object
+ * @r: replay entry to apply
+ *
+ * Apply a replay entry to the TNC.
+ */
+static int apply_replay_entry(struct ubifs_info *c, struct replay_entry *r)
+{
+ int err, deletion = ((r->flags & REPLAY_DELETION) != 0);
+
+ dbg_mnt_key(c, &r->key, "LEB %d:%d len %d flgs %d sqnum %llu", r->lnum,
+ r->offs, r->len, r->flags, r->sqnum);
+ if (r->flags & REPLAY_REF)
+ err = set_bud_lprops(c, r);
+ else if (is_hash_key(c, &r->key)) {
+ if (deletion)
+ err = ubifs_tnc_remove_nm(c, &r->key, &r->nm);
+ else
+ err = ubifs_tnc_add_nm(c, &r->key, r->lnum, r->offs,
+ r->len, &r->nm);
+ } else {
+ if (deletion)
+ switch (key_type(c, &r->key)) {
+ case UBIFS_INO_KEY:
+ {
+ ino_t inum = key_ino(c, &r->key);
+
+ err = ubifs_tnc_remove_ino(c, inum);
+ break;
+ }
+ case UBIFS_TRUN_KEY:
+ err = trun_remove_range(c, r);
+ break;
+ default:
+ err = ubifs_tnc_remove(c, &r->key);
+ break;
+ }
+ else
+ err = ubifs_tnc_add(c, &r->key, r->lnum, r->offs,
+ r->len);
+ if (err)
+ return err;
+
+ if (c->need_recovery)
+ err = ubifs_recover_size_accum(c, &r->key, deletion,
+ r->new_size);
+ }
+
+ return err;
+}
+
+/**
+ * destroy_replay_tree - destroy the replay tree.
+ * @c: UBIFS file-system description object
+ *
+ * Destroy the replay tree.
+ */
+static void destroy_replay_tree(struct ubifs_info *c)
+{
+ struct rb_node *this = c->replay_tree.rb_node;
+ struct replay_entry *r;
+
+ while (this) {
+ if (this->rb_left) {
+ this = this->rb_left;
+ continue;
+ } else if (this->rb_right) {
+ this = this->rb_right;
+ continue;
+ }
+ r = rb_entry(this, struct replay_entry, rb);
+ this = rb_parent(this);
+ if (this) {
+ if (this->rb_left == &r->rb)
+ this->rb_left = NULL;
+ else
+ this->rb_right = NULL;
+ }
+ if (is_hash_key(c, &r->key))
+ kfree(r->nm.name);
+ kfree(r);
+ }
+ c->replay_tree = RB_ROOT;
+}
+
+/**
+ * apply_replay_tree - apply the replay tree to the TNC.
+ * @c: UBIFS file-system description object
+ *
+ * Apply the replay tree.
+ * Returns zero in case of success and a negative error code in case of
+ * failure.
+ */
+static int apply_replay_tree(struct ubifs_info *c)
+{
+ struct rb_node *this = rb_first(&c->replay_tree);
+
+ while (this) {
+ struct replay_entry *r;
+ int err;
+
+ cond_resched();
+
+ r = rb_entry(this, struct replay_entry, rb);
+ err = apply_replay_entry(c, r);
+ if (err)
+ return err;
+ this = rb_next(this);
+ }
+ return 0;
+}
+
+/**
+ * insert_node - insert a node to the replay tree.
+ * @c: UBIFS file-system description object
+ * @lnum: node logical eraseblock number
+ * @offs: node offset
+ * @len: node length
+ * @key: node key
+ * @sqnum: sequence number
+ * @deletion: non-zero if this is a deletion
+ * @used: number of bytes in use in a LEB
+ * @old_size: truncation old size
+ * @new_size: truncation new size
+ *
+ * This function inserts a scanned non-direntry node to the replay tree.
+ * Returns zero in case of success and a negative error code in case of
+ * failure.
+ */
+static int insert_node(struct ubifs_info *c, int lnum, int offs, int len,
+ union ubifs_key *key, unsigned long long sqnum,
+ int deletion, int *used, loff_t old_size,
+ loff_t new_size)
+{
+ struct rb_node **p = &c->replay_tree.rb_node, *parent = NULL;
+ struct replay_entry *r;
+
+ if (key_ino(c, key) >= c->highest_inum)
+ c->highest_inum = key_ino(c, key);
+
+ dbg_mnt_key(c, key, "add LEB %d:%d, key ", lnum, offs);
+ while (*p) {
+ parent = *p;
+ r = rb_entry(parent, struct replay_entry, rb);
+ if (sqnum < r->sqnum) {
+ p = &(*p)->rb_left;
+ continue;
+ } else if (sqnum > r->sqnum) {
+ p = &(*p)->rb_right;
+ continue;
+ }
+ ubifs_err("duplicate sqnum in replay");
+ return -EINVAL;
+ }
+
+ r = kzalloc(sizeof(struct replay_entry), GFP_KERNEL);
+ if (!r)
+ return -ENOMEM;
+
+ if (!deletion)
+ *used += ALIGN(len, 8);
+ r->lnum = lnum;
+ r->offs = offs;
+ r->len = len;
+ r->sqnum = sqnum;
+ r->flags = (deletion ? REPLAY_DELETION : 0);
+ r->old_size = old_size;
+ r->new_size = new_size;
+ key_copy(c, key, &r->key);
+
+ rb_link_node(&r->rb, parent, p);
+ rb_insert_color(&r->rb, &c->replay_tree);
+ return 0;
+}
+
+/**
+ * insert_dent - insert a directory entry node into the replay tree.
+ * @c: UBIFS file-system description object
+ * @lnum: node logical eraseblock number
+ * @offs: node offset
+ * @len: node length
+ * @key: node key
+ * @name: directory entry name
+ * @nlen: directory entry name length
+ * @sqnum: sequence number
+ * @deletion: non-zero if this is a deletion
+ * @used: number of bytes in use in a LEB
+ *
+ * This function inserts a scanned directory entry node to the replay tree.
+ * Returns zero in case of success and a negative error code in case of
+ * failure.
+ *
+ * This function is also used for extended attribute entries because they are
+ * implemented as directory entry nodes.
+ */
+static int insert_dent(struct ubifs_info *c, int lnum, int offs, int len,
+ union ubifs_key *key, const char *name, int nlen,
+ unsigned long long sqnum, int deletion, int *used)
+{
+ struct rb_node **p = &c->replay_tree.rb_node, *parent = NULL;
+ struct replay_entry *r;
+ char *nbuf;
+
+ if (key_ino(c, key) >= c->highest_inum)
+ c->highest_inum = key_ino(c, key);
+
+ dbg_mnt_key(c, key, "add LEB %d:%d, key ", lnum, offs);
+ while (*p) {
+ parent = *p;
+ r = rb_entry(parent, struct replay_entry, rb);
+ if (sqnum < r->sqnum) {
+ p = &(*p)->rb_left;
+ continue;
+ }
+ if (sqnum > r->sqnum) {
+ p = &(*p)->rb_right;
+ continue;
+ }
+ ubifs_err("duplicate sqnum in replay");
+ return -EINVAL;
+ }
+
+ r = kzalloc(sizeof(struct replay_entry), GFP_KERNEL);
+ if (!r)
+ return -ENOMEM;
+ nbuf = kmalloc(nlen + 1, GFP_KERNEL);
+ if (!nbuf) {
+ kfree(r);
+ return -ENOMEM;
+ }
+
+ if (!deletion)
+ *used += ALIGN(len, 8);
+ r->lnum = lnum;
+ r->offs = offs;
+ r->len = len;
+ r->sqnum = sqnum;
+ r->nm.len = nlen;
+ memcpy(nbuf, name, nlen);
+ nbuf[nlen] = '\0';
+ r->nm.name = nbuf;
+ r->flags = (deletion ? REPLAY_DELETION : 0);
+ key_copy(c, key, &r->key);
+
+ ubifs_assert(!*p);
+ rb_link_node(&r->rb, parent, p);
+ rb_insert_color(&r->rb, &c->replay_tree);
+ return 0;
+}
+
+/**
+ * replay_bud - replay a bud logical eraseblock.
+ * @c: UBIFS file-system description object
+ * @lnum: bud logical eraseblock number to replay
+ * @offs: bud start offset
+ * @jhead: journal head to which this bud belongs
+ * @free: amount of free space in the bud is returned here
+ * @dirty: amount of dirty space from padding and deletion nodes is returned
+ * here
+ *
+ * This function returns zero in case of success and a negative error code in
+ * case of failure.
+ */
+static int replay_bud(struct ubifs_info *c, int lnum, int offs, int jhead,
+ int *free, int *dirty)
+{
+ int err = 0, used = 0;
+ struct ubifs_scan_leb *sleb;
+ struct ubifs_scan_node *snod;
+ struct ubifs_bud *bud;
+
+ dbg_mnt("replay bud LEB %d, head %d", lnum, jhead);
+ if (c->need_recovery)
+ sleb = ubifs_recover_leb(c, lnum, offs, c->sbuf, jhead != GCHD);
+ else
+ sleb = ubifs_scan(c, lnum, offs, c->sbuf);
+ if (IS_ERR(sleb))
+ return PTR_ERR(sleb);
+
+ /*
+ * The bud does not have to start from offset zero - the beginning of
+ * the 'lnum' LEB may contain previously committed data. One of the
+ * things we have to do in replay is to correctly update lprops with
+ * newer information about this LEB.
+ *
+ * At this point lprops thinks that this LEB has 'c->leb_size - offs'
+ * bytes of free space because it only contains information about
+ * committed data.
+ *
+ * But we know that the real amount of free space is 'c->leb_size -
+ * sleb->endpt', and the space in the 'lnum' LEB between 'offs' and
+ * 'sleb->endpt' is used by bud data. We have to correctly calculate
+ * how much of this data is dirty and update lprops with this
+ * information.
+ *
+ * The dirt in that LEB region is comprised of padding nodes, deletion
+ * nodes, truncation nodes and nodes which are obsoleted by subsequent
+ * nodes in this LEB. So instead of calculating clean space, we
+ * calculate used space ('used' variable).
+ */
+
+ list_for_each_entry(snod, &sleb->nodes, list) {
+ int deletion = 0;
+
+ cond_resched();
+
+ if (snod->sqnum >= SQNUM_WATERMARK) {
+ ubifs_err("file system's life ended");
+ goto out_dump;
+ }
+
+ if (snod->sqnum > c->max_sqnum)
+ c->max_sqnum = snod->sqnum;
+
+ switch (snod->type) {
+ case UBIFS_INO_NODE:
+ {
+ struct ubifs_ino_node *ino = snod->node;
+ loff_t new_size = le64_to_cpu(ino->size);
+
+ if (le32_to_cpu(ino->nlink) == 0)
+ deletion = 1;
+ err = insert_node(c, lnum, snod->offs, snod->len,
+ &snod->key, snod->sqnum, deletion,
+ &used, 0, new_size);
+ break;
+ }
+ case UBIFS_DATA_NODE:
+ {
+ struct ubifs_data_node *dn = snod->node;
+ loff_t new_size = le32_to_cpu(dn->size) +
+ key_block(c, &snod->key) *
+ UBIFS_BLOCK_SIZE;
+
+ err = insert_node(c, lnum, snod->offs, snod->len,
+ &snod->key, snod->sqnum, deletion,
+ &used, 0, new_size);
+ break;
+ }
+ case UBIFS_DENT_NODE:
+ case UBIFS_XENT_NODE:
+ {
+ struct ubifs_dent_node *dent = snod->node;
+
+ err = ubifs_validate_entry(c, dent);
+ if (err)
+ goto out_dump;
+
+ err = insert_dent(c, lnum, snod->offs, snod->len,
+ &snod->key, dent->name,
+ le16_to_cpu(dent->nlen), snod->sqnum,
+ !le64_to_cpu(dent->inum), &used);
+ break;
+ }
+ case UBIFS_TRUN_NODE:
+ {
+ struct ubifs_trun_node *trun = snod->node;
+ loff_t old_size = le64_to_cpu(trun->old_size);
+ loff_t new_size = le64_to_cpu(trun->new_size);
+
+ /* Validate truncation node */
+ if (old_size < 0 || old_size > c->max_inode_sz ||
+ new_size < 0 || new_size > c->max_inode_sz ||
+ old_size <= new_size) {
+ ubifs_err("bad truncation node");
+ goto out_dump;
+ }
+
+ err = insert_node(c, lnum, snod->offs, snod->len,
+ &snod->key, snod->sqnum, 1, &used,
+ old_size, new_size);
+ break;
+ }
+ default:
+ ubifs_err("unexpected node type %d in bud LEB %d:%d",
+ snod->type, lnum, snod->offs);
+ err = -EINVAL;
+ goto out_dump;
+ }
+ if (err)
+ goto out;
+ }
+
+ bud = ubifs_search_bud(c, lnum);
+ if (!bud)
+ BUG();
+
+ ubifs_assert(bud->lnum == lnum);
+ ubifs_assert(bud->start == offs);
+ ubifs_assert(bud->jhead == jhead);
+ ubifs_assert(sleb->endpt - offs >= used);
+ ubifs_assert(sleb->endpt % c->min_io_size == 0);
+
+ if (sleb->endpt + c->min_io_size <= c->leb_size &&
+ !(c->vfs_sb->s_flags & MS_RDONLY))
+ err = ubifs_wbuf_seek_nolock(&c->jheads[jhead].wbuf, lnum,
+ sleb->endpt, UBI_SHORTTERM);
+
+ *dirty = sleb->endpt - offs - used;
+ *free = c->leb_size - sleb->endpt;
+
+out:
+ ubifs_scan_destroy(sleb);
+ return err;
+
+out_dump:
+ ubifs_err("bad node is at LEB %d:%d", lnum, snod->offs);
+ dbg_dump_node(c, snod->node);
+ ubifs_scan_destroy(sleb);
+ return -EINVAL;
+}
+
+/**
+ * insert_ref_node - insert a ref node to the replay tree.
+ * @c: UBIFS file-system description object
+ * @lnum: node logical eraseblock number
+ * @offs: node offset
+ * @sqnum: sequence number
+ * @free: amount of free space in bud
+ * @dirty: amount of dirty space from padding and deletion nodes
+ */
+static int insert_ref_node(struct ubifs_info *c, int lnum, int offs,
+ unsigned long long sqnum, int free, int dirty)
+{
+ struct rb_node **p = &c->replay_tree.rb_node, *parent = NULL;
+ struct replay_entry *r;
+ union ubifs_key key;
+ int cmp;
+
+ dbg_mnt("add ref LEB %d:%d", lnum, offs);
+ highest_ino_key(c, &key, -1);
+ while (*p) {
+ parent = *p;
+ r = rb_entry(parent, struct replay_entry, rb);
+ cmp = keys_cmp(c, &key, &r->key);
+ if (sqnum < r->sqnum) {
+ p = &(*p)->rb_left;
+ continue;
+ } else if (sqnum > r->sqnum) {
+ p = &(*p)->rb_right;
+ continue;
+ }
+ ubifs_err("duplicate sqnum in replay");
+ return -EINVAL;
+ }
+
+ r = kzalloc(sizeof(struct replay_entry), GFP_KERNEL);
+ if (!r)
+ return -ENOMEM;
+
+ r->lnum = lnum;
+ r->offs = offs;
+ r->sqnum = sqnum;
+ r->flags = REPLAY_REF;
+ r->free = free;
+ r->dirty = dirty;
+ key_copy(c, &key, &r->key);
+
+ rb_link_node(&r->rb, parent, p);
+ rb_insert_color(&r->rb, &c->replay_tree);
+ return 0;
+}
+
+/**
+ * replay_buds - replay all buds.
+ * @c: UBIFS file-system description object
+ *
+ * This function returns zero in case of success and a negative error code in
+ * case of failure.
+ */
+static int replay_buds(struct ubifs_info *c)
+{
+ struct bud_entry *b;
+ int err, uninitialized_var(free), uninitialized_var(dirty);
+
+ list_for_each_entry(b, &c->replay_buds, list) {
+ err = replay_bud(c, b->bud->lnum, b->bud->start, b->bud->jhead,
+ &free, &dirty);
+ if (err)
+ return err;
+ err = insert_ref_node(c, b->bud->lnum, b->bud->start, b->sqnum,
+ free, dirty);
+ if (err)
+ return err;
+ }
+
+ return 0;
+}
+
+/**
+ * destroy_bud_list - destroy the list of buds to replay.
+ * @c: UBIFS file-system description object
+ */
+static void destroy_bud_list(struct ubifs_info *c)
+{
+ struct bud_entry *b;
+
+ while (!list_empty(&c->replay_buds)) {
+ b = list_entry(c->replay_buds.next, struct bud_entry, list);
+ list_del(&b->list);
+ kfree(b);
+ }
+}
+
+/**
+ * add_replay_bud - add a bud to the list of buds to replay.
+ * @c: UBIFS file-system description object
+ * @lnum: bud logical eraseblock number to replay
+ * @offs: bud start offset
+ * @jhead: journal head to which this bud belongs
+ * @sqnum: reference node sequence number
+ *
+ * This function returns zero in case of success and a negative error code in
+ * case of failure.
+ */
+static int add_replay_bud(struct ubifs_info *c, int lnum, int offs, int jhead,
+ unsigned long long sqnum)
+{
+ struct ubifs_bud *bud;
+ struct bud_entry *b;
+
+ dbg_mnt("add replay bud LEB %d:%d, head %d", lnum, offs, jhead);
+
+ bud = kmalloc(sizeof(struct ubifs_bud), GFP_KERNEL);
+ if (!bud)
+ return -ENOMEM;
+
+ b = kmalloc(sizeof(struct bud_entry), GFP_KERNEL);
+ if (!b) {
+ kfree(bud);
+ return -ENOMEM;
+ }
+
+ bud->lnum = lnum;
+ bud->start = offs;
+ bud->jhead = jhead;
+ ubifs_add_bud(c, bud);
+
+ b->bud = bud;
+ b->sqnum = sqnum;
+ list_add_tail(&b->list, &c->replay_buds);
+
+ return 0;
+}
+
+/**
+ * validate_ref - validate a reference node.
+ * @c: UBIFS file-system description object
+ * @ref: the reference node to validate
+ *
+ * This function checks the journal head number, the LEB number and the offset
+ * stored in @ref, and then looks for an existing bud for that LEB. It returns
+ * %1 if a bud reference already exists for the LEB and %0 if the reference
+ * node is new. If validation fails, or if the LEB is already referred to by a
+ * different bud, %-EINVAL is returned.
+ */
+static int validate_ref(struct ubifs_info *c, const struct ubifs_ref_node *ref)
+{
+ struct ubifs_bud *bud;
+ int lnum = le32_to_cpu(ref->lnum);
+ unsigned int offs = le32_to_cpu(ref->offs);
+ unsigned int jhead = le32_to_cpu(ref->jhead);
+
+ /*
+ * ref->offs may point to the end of the LEB when the journal head points
+ * to the end of the LEB and we write a reference node for it during commit.
+ * This is why we reject only 'offs > c->leb_size', not 'offs == c->leb_size'.
+ */
+ if (jhead >= c->jhead_cnt || lnum >= c->leb_cnt ||
+ lnum < c->main_first || offs > c->leb_size ||
+ offs & (c->min_io_size - 1))
+ return -EINVAL;
+
+ /* Make sure we have not already looked at this bud */
+ bud = ubifs_search_bud(c, lnum);
+ if (bud) {
+ if (bud->jhead == jhead && bud->start <= offs)
+ return 1;
+ ubifs_err("bud at LEB %d:%d was already referred", lnum, offs);
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+/**
+ * replay_log_leb - replay a log logical eraseblock.
+ * @c: UBIFS file-system description object
+ * @lnum: log logical eraseblock to replay
+ * @offs: offset to start replaying from
+ * @sbuf: scan buffer
+ *
+ * This function replays a log LEB and returns zero in case of success, %1 if
+ * this is the last LEB in the log, and a negative error code in case of
+ * failure.
+ */
+static int replay_log_leb(struct ubifs_info *c, int lnum, int offs, void *sbuf)
+{
+ int err;
+ struct ubifs_scan_leb *sleb;
+ struct ubifs_scan_node *snod;
+ const struct ubifs_cs_node *node;
+
+ dbg_mnt("replay log LEB %d:%d", lnum, offs);
+ sleb = ubifs_scan(c, lnum, offs, sbuf);
+ if (IS_ERR(sleb)) {
+ if (c->need_recovery)
+ sleb = ubifs_recover_log_leb(c, lnum, offs, sbuf);
+ if (IS_ERR(sleb))
+ return PTR_ERR(sleb);
+ }
+
+ if (sleb->nodes_cnt == 0) {
+ err = 1;
+ goto out;
+ }
+
+ node = sleb->buf;
+
+ snod = list_entry(sleb->nodes.next, struct ubifs_scan_node, list);
+ if (c->cs_sqnum == 0) {
+ /*
+ * This is the first log LEB we are looking at, make sure that
+ * the first node is a commit start node. Also record its
+ * sequence number so that UBIFS can determine where the log
+ * ends, because all nodes which were written after it have
+ * higher sequence numbers.
+ */
+ if (snod->type != UBIFS_CS_NODE) {
+ dbg_err("first log node at LEB %d:%d is not CS node",
+ lnum, offs);
+ goto out_dump;
+ }
+ if (le64_to_cpu(node->cmt_no) != c->cmt_no) {
+ dbg_err("first CS node at LEB %d:%d has wrong "
+ "commit number %llu expected %llu",
+ lnum, offs, le64_to_cpu(node->cmt_no),
+ c->cmt_no);
+ goto out_dump;
+ }
+
+ c->cs_sqnum = le64_to_cpu(node->ch.sqnum);
+ dbg_mnt("commit start sqnum %llu", c->cs_sqnum);
+ }
+
+ if (snod->sqnum < c->cs_sqnum) {
+ /*
+ * This means that we reached end of log and now
+ * look to the older log data, which was already
+ * committed but the eraseblock was not erased (UBIFS
+ * only unmaps it). So this basically means we have to
+ * exit with "end of log" code.
+ */
+ err = 1;
+ goto out;
+ }
+
+ /* Make sure the first node sits at offset zero of the LEB */
+ if (snod->offs != 0) {
+ dbg_err("first node is not at zero offset");
+ goto out_dump;
+ }
+
+ list_for_each_entry(snod, &sleb->nodes, list) {
+
+ cond_resched();
+
+ if (snod->sqnum >= SQNUM_WATERMARK) {
+ ubifs_err("file system's life ended");
+ goto out_dump;
+ }
+
+ if (snod->sqnum < c->cs_sqnum) {
+ dbg_err("bad sqnum %llu, commit sqnum %llu",
+ snod->sqnum, c->cs_sqnum);
+ goto out_dump;
+ }
+
+ if (snod->sqnum > c->max_sqnum)
+ c->max_sqnum = snod->sqnum;
+
+ switch (snod->type) {
+ case UBIFS_REF_NODE: {
+ const struct ubifs_ref_node *ref = snod->node;
+
+ err = validate_ref(c, ref);
+ if (err == 1)
+ break; /* Already have this bud */
+ if (err)
+ goto out_dump;
+
+ err = add_replay_bud(c, le32_to_cpu(ref->lnum),
+ le32_to_cpu(ref->offs),
+ le32_to_cpu(ref->jhead),
+ snod->sqnum);
+ if (err)
+ goto out;
+
+ break;
+ }
+ case UBIFS_CS_NODE:
+ /* Make sure it sits at the beginning of LEB */
+ if (snod->offs != 0) {
+ ubifs_err("unexpected node in log");
+ goto out_dump;
+ }
+ break;
+ default:
+ ubifs_err("unexpected node in log");
+ goto out_dump;
+ }
+ }
+
+ if (sleb->endpt || c->lhead_offs >= c->leb_size) {
+ c->lhead_lnum = lnum;
+ c->lhead_offs = sleb->endpt;
+ }
+
+ err = !sleb->endpt;
+out:
+ ubifs_scan_destroy(sleb);
+ return err;
+
+out_dump:
+ ubifs_err("log error detected while replaying the log at LEB %d:%d",
+ lnum, offs + snod->offs);
+ dbg_dump_node(c, snod->node);
+ ubifs_scan_destroy(sleb);
+ return -EINVAL;
+}
+
+/**
+ * take_ihead - update the status of the index head in lprops to 'taken'.
+ * @c: UBIFS file-system description object
+ *
+ * This function returns the amount of free space in the index head LEB or a
+ * negative error code.
+ */
+static int take_ihead(struct ubifs_info *c)
+{
+ const struct ubifs_lprops *lp;
+ int err, free;
+
+ ubifs_get_lprops(c);
+
+ lp = ubifs_lpt_lookup_dirty(c, c->ihead_lnum);
+ if (IS_ERR(lp)) {
+ err = PTR_ERR(lp);
+ goto out;
+ }
+
+ free = lp->free;
+
+ lp = ubifs_change_lp(c, lp, -1, -1, lp->flags | LPROPS_TAKEN, 0);
+ if (IS_ERR(lp)) {
+ err = PTR_ERR(lp);
+ goto out;
+ }
+
+ err = free;
+out:
+ ubifs_release_lprops(c);
+ return err;
+}
+
+/**
+ * ubifs_replay_journal - replay journal.
+ * @c: UBIFS file-system description object
+ *
+ * This function scans the journal, replays it and cleans it up. It makes sure
+ * all in-memory data structures related to the uncommitted journal are built
+ * (dirty TNC tree, tree of buds, modified lprops, etc).
+ */
+int ubifs_replay_journal(struct ubifs_info *c)
+{
+ int err, i, lnum, offs, free;
+ void *sbuf = NULL;
+
+ /* Update the status of the index head in lprops to 'taken' */
+ free = take_ihead(c);
+ if (free < 0)
+ return free; /* Error code */
+
+ if (c->ihead_offs != c->leb_size - free) {
+ ubifs_err("bad index head LEB %d:%d", c->ihead_lnum,
+ c->ihead_offs);
+ return -EINVAL;
+ }
+
+ sbuf = vmalloc(c->leb_size);
+ if (!sbuf)
+ return -ENOMEM;
+
+ dbg_mnt("start replaying the journal");
+
+ c->replaying = 1;
+
+ lnum = c->ltail_lnum = c->lhead_lnum;
+ offs = c->lhead_offs;
+
+ for (i = 0; i < c->log_lebs; i++, lnum++) {
+ if (lnum >= UBIFS_LOG_LNUM + c->log_lebs) {
+ /*
+ * The log is logically circular, we reached the last
+ * LEB, switch to the first one.
+ */
+ lnum = UBIFS_LOG_LNUM;
+ offs = 0;
+ }
+ err = replay_log_leb(c, lnum, offs, sbuf);
+ if (err == 1)
+ /* We hit the end of the log */
+ break;
+ if (err)
+ goto out;
+ offs = 0;
+ }
+
+ err = replay_buds(c);
+ if (err)
+ goto out;
+
+ err = apply_replay_tree(c);
+ if (err)
+ goto out;
+
+ ubifs_assert(c->bud_bytes <= c->max_bud_bytes || c->need_recovery);
+ dbg_mnt("finished, log head LEB %d:%d, max_sqnum %llu, "
+ "highest_inum %lu", c->lhead_lnum, c->lhead_offs, c->max_sqnum,
+ c->highest_inum);
+out:
+ destroy_replay_tree(c);
+ destroy_bud_list(c);
+ vfree(sbuf);
+ c->replaying = 0;
+ return err;
+}
--
1.5.4.1
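
The space accounting that replay_bud() above hands back to lprops comes down to two subtractions. A minimal user-space sketch, assuming the lengths of the live (non-deletion) nodes are already known: struct bud_space and bud_accounting() are hypothetical names, not part of the patch.

```c
#include <assert.h>

#define ALIGN8(x) (((x) + 7) & ~7) /* nodes are padded to 8 bytes */

struct bud_space {
	int free;  /* bytes after the last written min. I/O unit */
	int dirty; /* bytes in [offs, endpt) not used by live nodes */
};

/*
 * leb_size:  size of the logical eraseblock
 * offs:      offset at which the bud starts
 * endpt:     offset at which scanning ended
 * live_lens: lengths of the nodes counted as used (no deletions)
 */
static struct bud_space bud_accounting(int leb_size, int offs, int endpt,
				       const int *live_lens, int n)
{
	struct bud_space s;
	int used = 0, i;

	for (i = 0; i < n; i++)
		used += ALIGN8(live_lens[i]);

	assert(offs <= endpt && endpt <= leb_size);
	assert(endpt - offs >= used);

	s.dirty = endpt - offs - used;
	s.free = leb_size - endpt;
	return s;
}
```

Padding, deletion and truncation nodes never add to 'used', so they end up counted as dirty space.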

2008-03-27 13:16:37

by Artem Bityutskiy

Subject: [RFC PATCH 22/26] UBIFS: add extended attribute support

Extended attributes are implemented as separate inodes. This makes
it very easy to implement them and to re-use nearly all the existing
code. This might not be the fastest implementation, though. ACL support
is not implemented.

Signed-off-by: Artem Bityutskiy <[email protected]>
Signed-off-by: Adrian Hunter <[email protected]>
---
fs/ubifs/xattr.c | 587 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 587 insertions(+), 0 deletions(-)
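
The header comment of xattr.c in this patch says UBIFS keeps the combined size of an inode's attribute names within XATTR_LIST_MAX. A minimal user-space sketch of such a check follows. It rests on two assumptions: each name is counted together with its terminating NUL (the form in which listxattr() returns names), and the limit is 65536, the value of XATTR_LIST_MAX in Linux. xattr_name_fits() is a hypothetical helper; the exact accounting in the patch may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_XATTR_LIST_MAX 65536 /* assumed value of XATTR_LIST_MAX */

/*
 * names_total: bytes already used by the inode's attribute names, each
 * counted with its terminating NUL.
 * Returns 1 if one more attribute called 'name' still fits, 0 otherwise.
 */
static int xattr_name_fits(size_t names_total, const char *name)
{
	size_t need;

	assert(name != NULL);
	need = strlen(name) + 1;

	if (names_total > SKETCH_XATTR_LIST_MAX)
		return 0;
	return need <= SKETCH_XATTR_LIST_MAX - names_total;
}
```

Written as a subtraction, the comparison cannot overflow even for very large values of names_total.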

diff --git a/fs/ubifs/xattr.c b/fs/ubifs/xattr.c
new file mode 100644
index 0000000..85b1088
--- /dev/null
+++ b/fs/ubifs/xattr.c
@@ -0,0 +1,587 @@
+/*
+ * This file is part of UBIFS.
+ *
+ * Copyright (C) 2006-2008 Nokia Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 51
+ * Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Authors: Artem Bityutskiy (Битюцкий Артём)
+ * Adrian Hunter
+ */
+
+/*
+ * This file implements UBIFS extended attributes support.
+ *
+ * Extended attributes are implemented as regular inodes with attached data,
+ * which limits extended attribute size to UBIFS block size (4KiB). Names of
+ * extended attributes are described by extended attribute entries (xentries),
+ * which are almost identical to directory entries, but have different key type.
+ *
+ * In other words, the situation with extended attributes is very similar to
+ * directories. Indeed, any inode (but of course not xattr inodes) may have a
+ * number of associated xentries, just like directory inodes have associated
+ * directory entries. Extended attribute entries store the name of the extended
+ * attribute, the host inode number, and the extended attribute inode number.
+ * Similarly, direntries store the name, the parent and the target inode
+ * numbers. Thus, most of the common UBIFS mechanisms may be re-used for
+ * extended attributes.
+ *
+ * The number of extended attributes is not limited, but there is a Linux
+ * limitation on the maximum possible size of the list of all extended
+ * attributes associated with an inode (%XATTR_LIST_MAX), so UBIFS makes sure
+ * the sum of all extended attribute name lengths of the inode does not exceed
+ * that limit.
+ *
+ * Extended attributes are synchronous, which means they are written to the
+ * flash media synchronously and there is no write-back for extended attribute
+ * inodes. The extended attribute values are not stored in compressed form on
+ * the media.
+ *
+ * Since extended attributes are represented by regular inodes, they are cached
+ * in the VFS inode cache. The xentries are cached in the LNC cache (see
+ * tnc.c).
+ *
+ * ACL support is not implemented.
+ */
+
+#include <linux/xattr.h>
+#include <linux/posix_acl_xattr.h>
+#include "ubifs.h"
+
+/* How many bytes an extended attribute adds to the host inode */
+#define CALC_XATTR_BYTES(data_len) ALIGN(UBIFS_INO_NODE_SZ + (data_len) + 1, 8)
+
+/*
+ * Extended attribute type constants.
+ *
+ * USER_XATTR: user extended attribute ("user.*")
+ * TRUSTED_XATTR: trusted extended attribute ("trusted.*")
+ * SECURITY_XATTR: security extended attribute ("security.*")
+ */
+enum {
+ USER_XATTR,
+ TRUSTED_XATTR,
+ SECURITY_XATTR,
+};
+
+static struct inode_operations none_inode_operations;
+static struct address_space_operations none_address_operations;
+static struct file_operations none_file_operations;
+
+/**
+ * create_xattr - create an extended attribute.
+ * @c: UBIFS file-system description object
+ * @host: host inode
+ * @nm: extended attribute name
+ * @value: extended attribute value
+ * @size: size of extended attribute value
+ *
+ * This is a helper function which creates an extended attribute of name @nm
+ * and value @value for inode @host. The host inode is also updated on flash
+ * because the ctime and extended attribute accounting data changes. This
+ * function returns zero in case of success and a negative error code in case
+ * of failure.
+ */
+static int create_xattr(struct ubifs_info *c, struct inode *host,
+ const struct qstr *nm, const void *value, int size)
+{
+ struct ubifs_inode *ui, *host_ui = ubifs_inode(host);
+ struct ubifs_budget_req req = { .new_ino = 1, .new_dent = 1,
+ .new_ino_d = size };
+ struct inode *inode;
+ int err;
+
+ /*
+ * Linux limits the maximum size of the extended attribute names list
+ * to %XATTR_LIST_MAX. This means we should not allow creating more
+ * extended attributes if the name list becomes larger. This limitation
+ * is artificial for UBIFS, though.
+ */
+ if (host_ui->xattr_names + host_ui->xattr_cnt +
+ nm->len + 1 > XATTR_LIST_MAX)
+ return -ENOSPC;
+
+ err = ubifs_budget_inode_op(c, host, &req);
+ if (err)
+ return err;
+
+ inode = ubifs_new_inode(c, host, S_IFREG | S_IRWXUGO);
+ if (IS_ERR(inode)) {
+ err = PTR_ERR(inode);
+ goto out_budg;
+ }
+
+ /* Re-define all operations to be "nothing" */
+ inode->i_mapping->a_ops = &none_address_operations;
+ inode->i_op = &none_inode_operations;
+ inode->i_fop = &none_file_operations;
+
+ inode->i_flags |= S_SYNC | S_NOATIME | S_NOCMTIME | S_NOQUOTA;
+ ui = ubifs_inode(inode);
+ ui->xattr = 1;
+ ui->data = kmalloc(size, GFP_KERNEL);
+ if (!ui->data) {
+ err = -ENOMEM;
+ goto out_inode;
+ }
+
+ memcpy(ui->data, value, size);
+ host->i_ctime = CURRENT_TIME_SEC;
+ host_ui->xattr_cnt += 1;
+ spin_lock(&host->i_lock);
+ host_ui->xattr_size += CALC_DENT_SIZE(nm->len);
+ host_ui->xattr_size += CALC_XATTR_BYTES(size);
+ spin_unlock(&host->i_lock);
+ host_ui->xattr_names += nm->len;
+
+ /*
+ * We do not use i_size_write() because nobody can race with us as we
+ * are holding host @host->i_mutex - every xattr operation for this
+ * inode is serialized by it.
+ */
+ inode->i_size = size;
+ ui->data_len = size;
+
+ /*
+ * Note, it is important that 'ubifs_jrn_update()' writes the @host
+ * inode last, so when it gets synchronized and the write-buffer is
+ * flushed, the extended attribute is flushed as well.
+ */
+ err = ubifs_jrn_update(c, host, nm, inode, 0, IS_DIRSYNC(host), 1);
+ if (err)
+ goto out_cancel;
+
+ ubifs_release_ino_clean(c, host, &req);
+ insert_inode_hash(inode);
+ iput(inode);
+ return 0;
+
+out_cancel:
+ host_ui->xattr_cnt -= 1;
+ spin_lock(&host->i_lock);
+ host_ui->xattr_size -= CALC_DENT_SIZE(nm->len);
+ host_ui->xattr_size -= CALC_XATTR_BYTES(size);
+ spin_unlock(&host->i_lock);
+out_inode:
+ make_bad_inode(inode);
+ iput(inode);
+out_budg:
+ ubifs_cancel_ino_op(c, host, &req);
+ return err;
+}
+
+/**
+ * change_xattr - change an extended attribute.
+ * @c: UBIFS file-system description object
+ * @host: host inode
+ * @inode: extended attribute inode
+ * @value: extended attribute value
+ * @size: size of extended attribute value
+ *
+ * This helper function changes the value of extended attribute @inode with new
+ * data from @value. Returns zero in case of success and a negative error code
+ * in case of failure.
+ */
+static int change_xattr(struct ubifs_info *c, struct inode *host,
+ struct inode *inode, const void *value, int size)
+{
+ struct ubifs_inode *host_ui = ubifs_inode(host);
+ struct ubifs_inode *ui = ubifs_inode(inode);
+ struct ubifs_budget_req req = { .dirtied_ino = 1,
+ .dirtied_ino_d = ui->data_len };
+ int err;
+
+ ubifs_assert(ui->data_len == inode->i_size);
+
+ err = ubifs_budget_inode_op(c, host, &req);
+ if (err)
+ return err;
+
+ host->i_ctime = CURRENT_TIME_SEC;
+ spin_lock(&host->i_lock);
+ host_ui->xattr_size -= CALC_XATTR_BYTES(ui->data_len);
+ host_ui->xattr_size += CALC_XATTR_BYTES(size);
+ spin_unlock(&host->i_lock);
+
+ kfree(ui->data);
+ ui->data = kmalloc(size, GFP_KERNEL);
+ if (!ui->data) {
+ err = -ENOMEM;
+ goto out_budg;
+ }
+
+ memcpy(ui->data, value, size);
+ inode->i_size = size;
+ ui->data_len = size;
+
+ /*
+ * It is important to write the host inode after the xattr inode
+ * because if the host inode gets synchronized, then the extended
+ * attribute inode gets synchronized, because it goes before the host
+ * inode in the write-buffer.
+ */
+ err = ubifs_jrn_write_2_inodes(c, inode, host, IS_DIRSYNC(host));
+ if (err)
+ goto out_cancel;
+
+ ubifs_release_ino_clean(c, host, &req);
+ return 0;
+
+out_cancel:
+ spin_lock(&host->i_lock);
+ host_ui->xattr_size -= CALC_XATTR_BYTES(size);
+ host_ui->xattr_size += CALC_XATTR_BYTES(ui->data_len);
+ spin_unlock(&host->i_lock);
+ make_bad_inode(inode);
+out_budg:
+ ubifs_cancel_ino_op(c, host, &req);
+ return err;
+}
+
+/**
+ * check_namespace - check extended attribute name-space.
+ * @nm: extended attribute name
+ *
+ * This function makes sure the extended attribute name belongs to one of the
+ * supported extended attribute name-spaces. Returns name-space index in case
+ * of success and a negative error code in case of failure.
+ */
+static int check_namespace(const struct qstr *nm)
+{
+ int type;
+
+ if (nm->len > UBIFS_MAX_NLEN)
+ return -ENAMETOOLONG;
+
+ if (!strncmp(nm->name, XATTR_TRUSTED_PREFIX,
+ XATTR_TRUSTED_PREFIX_LEN)) {
+ if (nm->name[XATTR_TRUSTED_PREFIX_LEN] == '\0')
+ return -EINVAL;
+ type = TRUSTED_XATTR;
+ } else if (!strncmp(nm->name, XATTR_USER_PREFIX,
+ XATTR_USER_PREFIX_LEN)) {
+ if (nm->name[XATTR_USER_PREFIX_LEN] == '\0')
+ return -EINVAL;
+ type = USER_XATTR;
+ } else if (!strncmp(nm->name, XATTR_SECURITY_PREFIX,
+ XATTR_SECURITY_PREFIX_LEN)) {
+ if (nm->name[XATTR_SECURITY_PREFIX_LEN] == '\0')
+ return -EINVAL;
+ type = SECURITY_XATTR;
+ } else
+ return -EOPNOTSUPP;
+
+ return type;
+}
+
+int ubifs_setxattr(struct dentry *dentry, const char *name,
+ const void *value, size_t size, int flags)
+{
+ struct inode *inode, *host = dentry->d_inode;
+ struct ubifs_info *c = host->i_sb->s_fs_info;
+ struct qstr nm = { .name = name, .len = strlen(name) };
+ struct ubifs_dent_node *xent;
+ union ubifs_key key;
+ int err, type;
+
+ dbg_gen("xattr '%s', host ino %lu ('%.*s'), size %zd", name,
+ host->i_ino, dentry->d_name.len, dentry->d_name.name, size);
+ ubifs_assert(mutex_is_locked(&host->i_mutex));
+ ubifs_assert(ubifs_inode(host)->xattr_cnt >= 0);
+ ubifs_assert(ubifs_inode(host)->xattr_size >= 0);
+ ubifs_assert(ubifs_inode(host)->xattr_msize >= 0);
+ ubifs_assert(ubifs_inode(host)->xattr_names >= 0);
+
+ if (size > UBIFS_MAX_INO_DATA)
+ return -ERANGE;
+
+ type = check_namespace(&nm);
+ if (type < 0)
+ return type;
+
+ xent = kmalloc(UBIFS_MAX_XENT_NODE_SZ, GFP_NOFS);
+ if (!xent)
+ return -ENOMEM;
+
+ /*
+ * The extended attribute entries are stored in LNC, so multiple
+ * look-ups do not involve reading the flash.
+ */
+ xent_key_init(c, &key, host->i_ino, &nm);
+ err = ubifs_tnc_lookup_nm(c, &key, xent, &nm);
+ if (err) {
+ if (err != -ENOENT)
+ goto out_free;
+
+ if (flags & XATTR_REPLACE)
+ /* We are asked not to create the xattr */
+ err = -ENODATA;
+ else
+ err = create_xattr(c, host, &nm, value, size);
+ goto out_free;
+ }
+
+ if (flags & XATTR_CREATE) {
+ /* We are asked not to replace the xattr */
+ err = -EEXIST;
+ goto out_free;
+ }
+
+ inode = ubifs_iget(c->vfs_sb, le64_to_cpu(xent->inum));
+ if (IS_ERR(inode)) {
+ ubifs_err("dead extended attribute node entry");
+ ubifs_ro_mode(c);
+ err = PTR_ERR(inode);
+ goto out_free;
+ }
+
+ err = change_xattr(c, host, inode, value, size);
+ iput(inode);
+
+out_free:
+ kfree(xent);
+ return err;
+}
+
+ssize_t ubifs_getxattr(struct dentry *dentry, const char *name, void *buf,
+ size_t size)
+{
+ struct inode *inode, *host = dentry->d_inode;
+ struct ubifs_info *c = host->i_sb->s_fs_info;
+ struct qstr nm = { .name = name, .len = strlen(name) };
+ struct ubifs_inode *ui;
+ struct ubifs_dent_node *xent;
+ union ubifs_key key;
+ int err;
+
+ dbg_gen("xattr '%s', ino %lu ('%.*s'), buf size %zd", name,
+ host->i_ino, dentry->d_name.len, dentry->d_name.name, size);
+ ubifs_assert(ubifs_inode(host)->xattr_cnt >= 0);
+ ubifs_assert(ubifs_inode(host)->xattr_size >= 0);
+ ubifs_assert(ubifs_inode(host)->xattr_msize >= 0);
+ ubifs_assert(ubifs_inode(host)->xattr_names >= 0);
+
+ err = check_namespace(&nm);
+ if (err < 0)
+ return err;
+
+ xent = kmalloc(UBIFS_MAX_XENT_NODE_SZ, GFP_NOFS);
+ if (!xent)
+ return -ENOMEM;
+
+ mutex_lock(&host->i_mutex);
+ xent_key_init(c, &key, host->i_ino, &nm);
+ err = ubifs_tnc_lookup_nm(c, &key, xent, &nm);
+ if (err) {
+ if (err == -ENOENT)
+ err = -ENODATA;
+ goto out_unlock;
+ }
+
+ inode = ubifs_iget(c->vfs_sb, le64_to_cpu(xent->inum));
+ if (IS_ERR(inode)) {
+ ubifs_err("dead extended attribute node entry");
+ ubifs_ro_mode(c);
+ err = PTR_ERR(inode);
+ goto out_unlock;
+ }
+
+ ui = ubifs_inode(inode);
+ ubifs_assert(inode->i_size == ui->data_len);
+ ubifs_assert(ubifs_inode(host)->xattr_size > ui->data_len);
+
+ if (buf) {
+ /* If @buf is %NULL we are supposed to return the length */
+ if (ui->data_len > size) {
+ dbg_err("buffer size %zd, xattr len %d",
+ size, ui->data_len);
+ err = -ERANGE;
+ goto out_iput;
+ }
+
+ memcpy(buf, ui->data, ui->data_len);
+ }
+ err = ui->data_len;
+
+out_iput:
+ iput(inode);
+out_unlock:
+ mutex_unlock(&host->i_mutex);
+ kfree(xent);
+ return err;
+}
+
+ssize_t ubifs_listxattr(struct dentry *dentry, char *buffer, size_t size)
+{
+ struct inode *host = dentry->d_inode;
+ struct ubifs_info *c = host->i_sb->s_fs_info;
+ struct ubifs_inode *host_ui = ubifs_inode(host);
+ union ubifs_key key;
+ struct ubifs_dent_node *xent, *pxent = NULL;
+ int err, len, written = 0;
+ struct qstr nm = { .name = NULL };
+
+ dbg_gen("ino %lu ('%.*s'), buffer size %zd", host->i_ino,
+ dentry->d_name.len, dentry->d_name.name, size);
+ ubifs_assert(host_ui->xattr_cnt >= 0);
+ ubifs_assert(host_ui->xattr_size >= 0);
+ ubifs_assert(host_ui->xattr_msize >= 0);
+ ubifs_assert(host_ui->xattr_names >= 0);
+
+ len = host_ui->xattr_names + host_ui->xattr_cnt;
+ if (!buffer)
+ /*
+ * We should return the minimum buffer size which will fit a
+ * null-terminated list of all the extended attribute names.
+ */
+ return len;
+
+ if (len > size)
+ return -ERANGE;
+
+ lowest_xent_key(c, &key, host->i_ino);
+
+ mutex_lock(&host->i_mutex);
+ while (1) {
+ int type;
+
+ xent = ubifs_tnc_next_ent(c, &key, &nm);
+ if (unlikely(IS_ERR(xent))) {
+ err = PTR_ERR(xent);
+ break;
+ }
+
+ nm.name = xent->name;
+ nm.len = le16_to_cpu(xent->nlen);
+
+ type = check_namespace(&nm);
+ if (unlikely(type < 0)) {
+ err = type;
+ break;
+ }
+
+ /* Show trusted namespace only for "power" users */
+ if (type != TRUSTED_XATTR || capable(CAP_SYS_ADMIN)) {
+ memcpy(buffer + written, nm.name, nm.len + 1);
+ written += nm.len + 1;
+ }
+
+ kfree(pxent);
+ pxent = xent;
+ key_read(c, &xent->key, &key);
+ }
+ mutex_unlock(&host->i_mutex);
+
+ kfree(pxent);
+ if (err != -ENOENT) {
+ ubifs_err("cannot find next direntry, error %d", err);
+ return err;
+ }
+
+ ubifs_assert(written <= size);
+ return written;
+}
+
+static int remove_xattr(struct ubifs_info *c, struct inode *host,
+ struct inode *inode, const struct qstr *nm)
+{
+ struct ubifs_inode *host_ui = ubifs_inode(host);
+ struct ubifs_inode *ui = ubifs_inode(inode);
+ struct ubifs_budget_req req = { .dirtied_ino = 1, .mod_dent = 1 };
+ int err;
+
+ ubifs_assert(ui->data_len == inode->i_size);
+
+ err = ubifs_budget_inode_op(c, host, &req);
+ if (err)
+ return err;
+
+ host->i_ctime = CURRENT_TIME_SEC;
+ host_ui->xattr_cnt -= 1;
+ spin_lock(&host->i_lock);
+ host_ui->xattr_size -= CALC_XATTR_BYTES(ui->data_len);
+ spin_unlock(&host->i_lock);
+ host_ui->xattr_names -= nm->len;
+
+ err = ubifs_jrn_delete_xattr(c, host, inode, nm, IS_DIRSYNC(host));
+ if (err)
+ goto out_cancel;
+
+ ubifs_release_ino_clean(c, host, &req);
+ return 0;
+
+out_cancel:
+ ubifs_cancel_ino_op(c, host, &req);
+ host_ui->xattr_cnt += 1;
+ spin_lock(&host->i_lock);
+ host_ui->xattr_size += CALC_XATTR_BYTES(ui->data_len);
+ spin_unlock(&host->i_lock);
+ make_bad_inode(inode);
+ return err;
+}
+
+int ubifs_removexattr(struct dentry *dentry, const char *name)
+{
+ struct inode *inode, *host = dentry->d_inode;
+ struct ubifs_info *c = host->i_sb->s_fs_info;
+ struct qstr nm = { .name = name, .len = strlen(name) };
+ struct ubifs_dent_node *xent;
+ union ubifs_key key;
+ int err;
+
+ dbg_gen("xattr '%s', ino %lu ('%.*s')", name,
+ host->i_ino, dentry->d_name.len, dentry->d_name.name);
+ ubifs_assert(mutex_is_locked(&host->i_mutex));
+ ubifs_assert(ubifs_inode(host)->xattr_cnt >= 0);
+ ubifs_assert(ubifs_inode(host)->xattr_size >= 0);
+ ubifs_assert(ubifs_inode(host)->xattr_msize >= 0);
+ ubifs_assert(ubifs_inode(host)->xattr_names >= 0);
+
+ err = check_namespace(&nm);
+ if (err < 0)
+ return err;
+
+ xent = kmalloc(UBIFS_MAX_XENT_NODE_SZ, GFP_NOFS);
+ if (!xent)
+ return -ENOMEM;
+
+ xent_key_init(c, &key, host->i_ino, &nm);
+ err = ubifs_tnc_lookup_nm(c, &key, xent, &nm);
+ if (err) {
+ if (err == -ENOENT)
+ err = -ENODATA;
+ goto out_free;
+ }
+
+ inode = ubifs_iget(c->vfs_sb, le64_to_cpu(xent->inum));
+ if (IS_ERR(inode)) {
+ ubifs_err("dead extended attribute node entry");
+ ubifs_ro_mode(c);
+ err = PTR_ERR(inode);
+ goto out_free;
+ }
+
+ ubifs_assert(inode->i_nlink == 1);
+ inode->i_nlink = 0;
+ err = remove_xattr(c, host, inode, &nm);
+ if (err)
+ inode->i_nlink = 1;
+
+ /* If @i_nlink is 0, 'iput()' will delete the inode */
+ iput(inode);
+
+out_free:
+ kfree(xent);
+ return err;
+}
--
1.5.4.1
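
The name checks in the patch above can be modelled in user space. The following is only a minimal sketch, not the kernel code: it mirrors the prefix test of check_namespace() and the list-size test at the top of create_xattr(). The demo_* names and DEMO_* constants are hypothetical and exist only for this illustration. The prefix strings and the 65536-byte limit are the values of XATTR_*_PREFIX and XATTR_LIST_MAX in the Linux headers. The UBIFS_MAX_NLEN length check is left out because its value is not part of this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for XATTR_LIST_MAX from <linux/limits.h>. */
#define DEMO_XATTR_LIST_MAX 65536

/* Same order as the anonymous enum in xattr.c. */
enum { DEMO_USER_XATTR, DEMO_TRUSTED_XATTR, DEMO_SECURITY_XATTR };

/*
 * Return the name-space index of @name, -EINVAL if nothing follows the
 * prefix, or -EOPNOTSUPP if the prefix is not one of the supported three.
 */
static int demo_check_namespace(const char *name)
{
	static const struct {
		const char *prefix;
		int type;
	} table[] = {
		{ "trusted.",  DEMO_TRUSTED_XATTR },
		{ "user.",     DEMO_USER_XATTR },
		{ "security.", DEMO_SECURITY_XATTR },
	};
	size_t i;

	assert(name != NULL);

	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
		size_t plen = strlen(table[i].prefix);

		if (strncmp(name, table[i].prefix, plen) != 0)
			continue;
		/* A bare prefix such as "user." names no attribute. */
		return name[plen] == '\0' ? -EINVAL : table[i].type;
	}
	return -EOPNOTSUPP;
}

/*
 * Mirror of the limit test in create_xattr(): the name list needs one byte
 * per name character plus one terminating zero byte per attribute.
 */
static int demo_list_would_overflow(int names_len, int count, int new_len)
{
	return names_len + count + new_len + 1 > DEMO_XATTR_LIST_MAX;
}
```

For instance, demo_check_namespace("user.comment") yields DEMO_USER_XATTR, while a "system.*" name is rejected with -EOPNOTSUPP, which matches the note in the file header that ACL support is not implemented.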

2008-03-27 13:17:00

by Artem Bityutskiy

Subject: [RFC PATCH 26/26] UBIFS: include FS to compilation

Add UBIFS to the fs Makefile and Kconfig.

Signed-off-by: Artem Bityutskiy <[email protected]>
Signed-off-by: Adrian Hunter <[email protected]>
---
fs/Kconfig | 3 +
fs/Makefile | 1 +
fs/ubifs/Kconfig | 47 ++++++++++++++
fs/ubifs/Kconfig.debug | 159 ++++++++++++++++++++++++++++++++++++++++++++++++
fs/ubifs/Makefile | 9 +++
5 files changed, 219 insertions(+), 0 deletions(-)

diff --git a/fs/Kconfig b/fs/Kconfig
index d731282..70edf5c 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -1347,6 +1347,9 @@ config JFFS2_CMODE_FAVOURLZO

endchoice

+# UBIFS File system configuration
+source "fs/ubifs/Kconfig"
+
config CRAMFS
tristate "Compressed ROM file system support (cramfs)"
depends on BLOCK
diff --git a/fs/Makefile b/fs/Makefile
index 1e7a11b..fcae06a 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -100,6 +100,7 @@ obj-$(CONFIG_NTFS_FS) += ntfs/
obj-$(CONFIG_UFS_FS) += ufs/
obj-$(CONFIG_EFS_FS) += efs/
obj-$(CONFIG_JFFS2_FS) += jffs2/
+obj-$(CONFIG_UBIFS_FS) += ubifs/
obj-$(CONFIG_AFFS_FS) += affs/
obj-$(CONFIG_ROMFS_FS) += romfs/
obj-$(CONFIG_QNX4FS_FS) += qnx4/
diff --git a/fs/ubifs/Kconfig b/fs/ubifs/Kconfig
new file mode 100644
index 0000000..21a6fae
--- /dev/null
+++ b/fs/ubifs/Kconfig
@@ -0,0 +1,47 @@
+config UBIFS_FS
+ tristate "UBIFS file system support"
+ select CRC16
+ select CRC32
+ depends on MTD_UBI
+ help
+ UBIFS is a file system for flash devices which works on top of UBI.
+
+config UBIFS_FS_XATTR
+ bool "Extended attributes support"
+ depends on UBIFS_FS
+ default n
+ help
+ This option enables support of extended attributes.
+
+config UBIFS_FS_ADVANCED_COMPR
+ bool "Advanced compression options"
+ depends on UBIFS_FS
+ default n
+ help
+ This option allows you to explicitly choose which compressors, if any,
+ are enabled in UBIFS. Disabling a compressor makes it impossible to
+ read existing file systems that use it.
+
+ If unsure, say 'N'.
+
+config UBIFS_FS_LZO
+ bool "LZO compression support" if UBIFS_FS_ADVANCED_COMPR
+ select CRYPTO
+ select CRYPTO_LZO
+ depends on UBIFS_FS
+ default y
+ help
+ The LZO compressor is generally faster than zlib but compresses worse.
+ Say 'Y' if unsure.
+
+config UBIFS_FS_ZLIB
+ bool "ZLIB compression support" if UBIFS_FS_ADVANCED_COMPR
+ select CRYPTO
+ select CRYPTO_DEFLATE
+ depends on UBIFS_FS
+ default y
+ help
+ Zlib compresses better than LZO but it is slower. Say 'Y' if unsure.
+
+# Debugging-related stuff
+source "fs/ubifs/Kconfig.debug"
diff --git a/fs/ubifs/Kconfig.debug b/fs/ubifs/Kconfig.debug
new file mode 100644
index 0000000..4bfccef
--- /dev/null
+++ b/fs/ubifs/Kconfig.debug
@@ -0,0 +1,159 @@
+# UBIFS debugging configuration options, part of fs/ubifs/Kconfig
+
+config UBIFS_FS_DEBUG
+ bool "Enable debugging"
+ default n
+ depends on UBIFS_FS
+ select DEBUG_FS
+ select KALLSYMS_ALL
+ help
+ This option enables UBIFS debugging.
+
+menu "Debugging messages"
+ depends on UBIFS_FS_DEBUG
+
+config UBIFS_FS_DEBUG_MSG_GEN
+ bool "General messages"
+ default n
+ help
+ This option enables general debugging messages.
+
+config UBIFS_FS_DEBUG_MSG_JRN
+ bool "Journal messages"
+ default n
+ help
+ This option enables detailed journal debugging messages.
+
+config UBIFS_FS_DEBUG_MSG_CMT
+ bool "Commit messages"
+ default n
+ help
+ This option enables detailed journal commit debugging messages.
+
+config UBIFS_FS_DEBUG_MSG_BUDG
+ bool "Budgeting messages"
+ default n
+ help
+ This option enables detailed budgeting debugging messages.
+
+config UBIFS_FS_DEBUG_MSG_LOG
+ bool "Log messages"
+ default n
+ help
+ This option enables detailed journal log debugging messages.
+
+config UBIFS_FS_DEBUG_MSG_TNC
+ bool "Tree Node Cache (TNC) messages"
+ default n
+ help
+ This option enables detailed TNC debugging messages.
+
+config UBIFS_FS_DEBUG_MSG_LP
+ bool "LEB properties (lprops) messages"
+ default n
+ help
+ This option enables detailed lprops debugging messages.
+
+config UBIFS_FS_DEBUG_MSG_FIND
+ bool "LEB search messages"
+ default n
+ help
+ This option enables detailed LEB search debugging messages.
+
+config UBIFS_FS_DEBUG_MSG_MNT
+ bool "Mount messages"
+ default n
+ help
+ This option enables detailed mount debugging messages, including
+ recovery messages.
+
+config UBIFS_FS_DEBUG_MSG_IO
+ bool "Input/output messages"
+ default n
+ help
+ This option enables detailed I/O debugging messages.
+
+config UBIFS_FS_DEBUG_MSG_GC
+ bool "Garbage collection messages"
+ default n
+ help
+ This option enables detailed garbage collection debugging messages.
+
+config UBIFS_FS_DEBUG_MSG_SCAN
+ bool "Scan messages"
+ default n
+ help
+ This option enables detailed scan debugging messages.
+
+endmenu
+
+menu "Extra self-checks"
+ depends on UBIFS_FS_DEBUG
+
+config UBIFS_FS_DEBUG_CHK_MEMPRESS
+ bool "Create memory pressure"
+ default n
+ depends on UBIFS_FS_DEBUG
+ help
+ This option causes kernel memory pressure in order to make the TNC
+ shrinker run.
+
+config UBIFS_FS_DEBUG_CHK_LPROPS
+ bool "Check LEB properties (lprops)"
+ default n
+ depends on UBIFS_FS_DEBUG
+ help
+ This option enables a function which runs during journal commit and
+ checks that the dirty and free space is correct for every LEB.
+
+config UBIFS_FS_DEBUG_CHK_TNC
+ bool "Check Tree Node Cache (TNC)"
+ default n
+ depends on UBIFS_FS_DEBUG
+ help
+ This option enables a function which runs after every
+ TNC insert / delete and checks that the TNC nodes are correct.
+
+config UBIFS_FS_DEBUG_CHK_ORPH
+ bool "Check orphan area"
+ default n
+ depends on UBIFS_FS_DEBUG
+ help
+ This option enables a function which runs during journal commit and
+ checks that the orphan area is correct.
+
+config UBIFS_FS_DEBUG_CHK_IDX_SZ
+ bool "Check indexing tree size"
+ default n
+ depends on UBIFS_FS_DEBUG
+ help
+ This option enables checking of the znode size accounting variables.
+
+config UBIFS_FS_DEBUG_CHK_OLD_IDX
+ bool "Check old indexing tree"
+ default n
+ depends on UBIFS_FS_DEBUG
+ help
+ This option enables checking of the old indexing tree which must be
+ intact to allow recovery in the event of an unclean unmount.
+
+config UBIFS_FS_DEBUG_CHK_OTHER
+ bool "Other checks"
+ default n
+ depends on UBIFS_FS_DEBUG
+ help
+ This option enables various checks which are light-weight and do not
+ affect file-system performance too much.
+
+endmenu
+
+config UBIFS_FS_DEBUG_TEST_RCVRY
+ bool "Simulate random device removal (recovery testing)"
+ default n
+ depends on UBIFS_FS_DEBUG
+ help
+ This option provides the ability to test recovery from unclean
+ unmounts. It causes UBIFS to simulate device removal. At some
+ random point UBIFS will switch to "failure mode" after which all I/O
+ operations will fail. UBIFS can then be unmounted and mounted again
+ at which point "failure mode" is switched off and recovery ensues.
diff --git a/fs/ubifs/Makefile b/fs/ubifs/Makefile
new file mode 100644
index 0000000..6b84f0f
--- /dev/null
+++ b/fs/ubifs/Makefile
@@ -0,0 +1,9 @@
+obj-$(CONFIG_UBIFS_FS) += ubifs.o
+
+ubifs-y += shrinker.o journal.o build.o file.o dir.o super.o sb.o io.o
+ubifs-y += tnc.o master.o scan.o replay.o log.o commit.o gc.o orphan.o
+ubifs-y += budget.o find.o tnc_commit.o compress.o lpt.o lprops.o
+ubifs-y += recovery.o ioctl.o lpt_commit.o
+
+ubifs-$(CONFIG_UBIFS_FS_DEBUG) += debug.o
+ubifs-$(CONFIG_UBIFS_FS_XATTR) += xattr.o
--
1.5.4.1

2008-03-27 13:17:29

by Artem Bityutskiy

Subject: [RFC PATCH 09/26] UBIFS: add key helpers

This file implements various helper functions to work with UBIFS keys.
The keys are part of the UBIFS index, which is a B-tree. For example,
a directory entry key consists of the parent inode number and the
directory entry hash.
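
To make the layout concrete, here is a minimal user-space sketch of a two-word key as the header comment in key.h describes it: 32 bits of inode number, then 3 bits of node type, then 29 bits of hash or block number. It is an illustration under stated assumptions, not the UBIFS implementation. The demo_* names and the DEMO_*_KEY type codes are hypothetical, since the real UBIFS_*_KEY values are defined elsewhere in the series. The word-by-word ordering in demo_key_cmp() is likewise an assumption, suggested by the lowest_*_key() and highest_ino_key() helpers. Note that the helpers in this patch mask with 0x01FFFFFF, which keeps 25 bits rather than 29; the sketch follows the 29-bit layout from the comment.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_KEY_TYPE_BITS  3
#define DEMO_KEY_VALUE_BITS 29
#define DEMO_KEY_VALUE_MASK ((UINT32_C(1) << DEMO_KEY_VALUE_BITS) - 1)

/* Hypothetical type codes, chosen only for this illustration. */
enum { DEMO_INO_KEY, DEMO_DATA_KEY, DEMO_DENT_KEY, DEMO_XENT_KEY };

struct demo_key {
	uint32_t w[2];	/* w[0]: inode number, w[1]: type and hash/block */
};

/* Pack @inum, a 3-bit @type and a 29-bit @value (hash or block number). */
static struct demo_key demo_key_make(uint32_t inum, unsigned int type,
				     uint32_t value)
{
	struct demo_key k;

	assert(type < (1u << DEMO_KEY_TYPE_BITS));
	k.w[0] = inum;
	k.w[1] = (value & DEMO_KEY_VALUE_MASK) |
		 ((uint32_t)type << DEMO_KEY_VALUE_BITS);
	return k;
}

static uint32_t demo_key_ino(const struct demo_key *k)
{
	return k->w[0];
}

static unsigned int demo_key_type(const struct demo_key *k)
{
	return k->w[1] >> DEMO_KEY_VALUE_BITS;
}

static uint32_t demo_key_value(const struct demo_key *k)
{
	return k->w[1] & DEMO_KEY_VALUE_MASK;
}

/* Assumed ordering: by inode number first, then by the second word. */
static int demo_key_cmp(const struct demo_key *a, const struct demo_key *b)
{
	if (a->w[0] != b->w[0])
		return a->w[0] < b->w[0] ? -1 : 1;
	if (a->w[1] != b->w[1])
		return a->w[1] < b->w[1] ? -1 : 1;
	return 0;
}
```

Under that ordering, all keys of one inode are adjacent in the index and grouped by node type, so a key with a zero hash field, as built by lowest_xent_key(), is a natural starting point for walking every extended attribute entry of a host inode.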

Signed-off-by: Artem Bityutskiy <[email protected]>
Signed-off-by: Adrian Hunter <[email protected]>
---
fs/ubifs/key.h | 507 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 507 insertions(+), 0 deletions(-)

diff --git a/fs/ubifs/key.h b/fs/ubifs/key.h
new file mode 100644
index 0000000..679cb80
--- /dev/null
+++ b/fs/ubifs/key.h
@@ -0,0 +1,507 @@
+/*
+ * This file is part of UBIFS.
+ *
+ * Copyright (C) 2006-2008 Nokia Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 51
+ * Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Authors: Artem Bityutskiy (Битюцкий Артём)
+ * Adrian Hunter
+ */
+
+/*
+ * This header contains various key-related definitions and helper functions.
+ * UBIFS allows several key schemes, so we access key fields only via these
+ * helpers. At the moment only one key scheme is supported.
+ *
+ * Simple key scheme
+ * ~~~~~~~~~~~~~~~~~
+ *
+ * Keys are 64 bits long. The first 32 bits are the inode number (parent inode
+ * number for a direntry key). The next 3 bits are the node type. The last 29
+ * bits are the 4KiB block number for a data node, or the name hash for a
+ * direntry node. We use the "r5" hash borrowed from reiserfs.
+ */
+
+#ifndef __UBIFS_KEY_H__
+#define __UBIFS_KEY_H__
+
+/**
+ * key_r5_hash - R5 hash function (borrowed from reiserfs).
+ * @s: direntry name
+ * @len: name length
+ */
+static inline uint32_t key_r5_hash(const char *s, int len)
+{
+ uint32_t a = 0;
+ const signed char *str = (const signed char *)s;
+
+ while (*str) {
+ a += *str << 4;
+ a += *str >> 4;
+ a *= 11;
+ str++;
+ }
+
+ /*
+ * We use hash values as offset in directories, so offsets 0 and 1 are
+ * reserved for "." and "..". Offset 2 is also reserved for readdir()
+ * purposes.
+ */
+ if (unlikely(a <= 2))
+ a += 3;
+ return a;
+}
+
+/**
+ * key_test_hash - testing hash function.
+ * @str: direntry name
+ * @len: name length
+ */
+static inline uint32_t key_test_hash(const char *str, int len)
+{
+ uint32_t a = 0;
+
+ len = min_t(uint32_t, len, 4);
+ memcpy(&a, str, len);
+ if (unlikely(a <= 2))
+ a += 3;
+ return a;
+}
+
+/**
+ * ino_key_init - initialize inode key.
+ * @c: UBIFS file-system description object
+ * @key: key to initialize
+ * @inum: inode number
+ */
+static inline void ino_key_init(const struct ubifs_info *c,
+ union ubifs_key *key, ino_t inum)
+{
+ key->u32[0] = inum;
+ key->u32[1] = UBIFS_INO_KEY << 29;
+}
+
+/**
+ * ino_key_init_flash - initialize on-flash inode key.
+ * @c: UBIFS file-system description object
+ * @k: key to initialize
+ * @inum: inode number
+ */
+static inline void ino_key_init_flash(const struct ubifs_info *c, void *k,
+ ino_t inum)
+{
+ union ubifs_key *key = k;
+
+ key->j32[0] = cpu_to_le32(inum);
+ key->j32[1] = cpu_to_le32(UBIFS_INO_KEY << 29);
+ memset(k + 8, 0, UBIFS_MAX_KEY_LEN - 8);
+}
+
+/**
+ * lowest_ino_key - get the lowest possible inode key.
+ * @c: UBIFS file-system description object
+ * @key: key to initialize
+ * @inum: inode number
+ */
+static inline void lowest_ino_key(const struct ubifs_info *c,
+ union ubifs_key *key, ino_t inum)
+{
+ key->u32[0] = inum;
+ key->u32[1] = 0;
+}
+
+/**
+ * highest_ino_key - get the highest possible inode key.
+ * @c: UBIFS file-system description object
+ * @key: key to initialize
+ * @inum: inode number
+ */
+static inline void highest_ino_key(const struct ubifs_info *c,
+ union ubifs_key *key, ino_t inum)
+{
+ key->u32[0] = inum;
+ key->u32[1] = 0xffffffff;
+}
+
+/**
+ * dent_key_init - initialize directory entry key.
+ * @c: UBIFS file-system description object
+ * @key: key to initialize
+ * @inum: parent inode number
+ * @nm: direntry name and length
+ */
+static inline void dent_key_init(const struct ubifs_info *c,
+ union ubifs_key *key, ino_t inum,
+ const struct qstr *nm)
+{
+ uint32_t hash = c->key_hash(nm->name, nm->len);
+
+ key->u32[0] = inum;
+ key->u32[1] = (hash & 0x01FFFFFF) | (UBIFS_DENT_KEY << 29);
+}
+
+/**
+ * dent_key_init_hash - initialize directory entry key without re-calculating
+ * hash function.
+ * @c: UBIFS file-system description object
+ * @key: key to initialize
+ * @inum: parent inode number
+ * @hash: direntry name hash
+ */
+static inline void dent_key_init_hash(const struct ubifs_info *c,
+ union ubifs_key *key, ino_t inum,
+ uint32_t hash)
+{
+ key->u32[0] = inum;
+ key->u32[1] = (hash & 0x01FFFFFF) | (UBIFS_DENT_KEY << 29);
+}
+
+/**
+ * dent_key_init_flash - initialize on-flash directory entry key.
+ * @c: UBIFS file-system description object
+ * @k: key to initialize
+ * @inum: parent inode number
+ * @nm: direntry name and length
+ */
+static inline void dent_key_init_flash(const struct ubifs_info *c, void *k,
+ ino_t inum, const struct qstr *nm)
+{
+ union ubifs_key *key = k;
+ uint32_t hash = c->key_hash(nm->name, nm->len);
+
+ key->j32[0] = cpu_to_le32(inum);
+ key->j32[1] = cpu_to_le32((hash & 0x01FFFFFF) | (UBIFS_DENT_KEY << 29));
+ memset(k + 8, 0, UBIFS_MAX_KEY_LEN - 8);
+}
+
+/**
+ * lowest_dent_key - get the lowest possible directory entry key.
+ * @c: UBIFS file-system description object
+ * @key: where to store the lowest key
+ * @inum: parent inode number
+ */
+static inline void lowest_dent_key(const struct ubifs_info *c,
+ union ubifs_key *key, ino_t inum)
+{
+ key->u32[0] = inum;
+ key->u32[1] = UBIFS_DENT_KEY << 29;
+}
+
+/**
+ * xent_key_init - initialize extended attribute entry key.
+ * @c: UBIFS file-system description object
+ * @key: key to initialize
+ * @inum: host inode number
+ * @nm: extended attribute entry name and length
+ */
+static inline void xent_key_init(const struct ubifs_info *c,
+ union ubifs_key *key, ino_t inum,
+ const struct qstr *nm)
+{
+ uint32_t hash = c->key_hash(nm->name, nm->len);
+
+ key->u32[0] = inum;
+ key->u32[1] = (hash & 0x01FFFFFF) | (UBIFS_XENT_KEY << 29);
+}
+
+/**
+ * xent_key_init_hash - initialize extended attribute entry key without
+ * re-calculating hash function.
+ * @c: UBIFS file-system description object
+ * @key: key to initialize
+ * @inum: host inode number
+ * @hash: extended attribute entry name hash
+ */
+static inline void xent_key_init_hash(const struct ubifs_info *c,
+ union ubifs_key *key, ino_t inum,
+ uint32_t hash)
+{
+ key->u32[0] = inum;
+ key->u32[1] = (hash & 0x01FFFFFF) | (UBIFS_XENT_KEY << 29);
+}
+
+/**
+ * xent_key_init_flash - initialize on-flash extended attribute entry key.
+ * @c: UBIFS file-system description object
+ * @k: key to initialize
+ * @inum: host inode number
+ * @nm: extended attribute entry name and length
+ */
+static inline void xent_key_init_flash(const struct ubifs_info *c, void *k,
+ ino_t inum, const struct qstr *nm)
+{
+ union ubifs_key *key = k;
+ uint32_t hash = c->key_hash(nm->name, nm->len);
+
+ key->j32[0] = cpu_to_le32(inum);
+ key->j32[1] = cpu_to_le32((hash & 0x01FFFFFF) | (UBIFS_XENT_KEY << 29));
+ memset(k + 8, 0, UBIFS_MAX_KEY_LEN - 8);
+}
+
+/**
+ * lowest_xent_key - get the lowest possible extended attribute entry key.
+ * @c: UBIFS file-system description object
+ * @key: where to store the lowest key
+ * @inum: host inode number
+ */
+static inline void lowest_xent_key(const struct ubifs_info *c,
+ union ubifs_key *key, ino_t inum)
+{
+ key->u32[0] = inum;
+ key->u32[1] = UBIFS_XENT_KEY << 29;
+}
+
+/**
+ * data_key_init - initialize data key.
+ * @c: UBIFS file-system description object
+ * @key: key to initialize
+ * @inum: inode number
+ * @block: block number
+ */
+static inline void data_key_init(const struct ubifs_info *c,
+ union ubifs_key *key, ino_t inum,
+ unsigned int block)
+{
+ key->u32[0] = inum;
+ key->u32[1] = (block & 0x01FFFFFF) | (UBIFS_DATA_KEY << 29);
+}
+
+/**
+ * data_key_init_flash - initialize on-flash data key.
+ * @c: UBIFS file-system description object
+ * @k: key to initialize
+ * @inum: inode number
+ * @block: block number
+ */
+static inline void data_key_init_flash(const struct ubifs_info *c, void *k,
+ ino_t inum, unsigned int block)
+{
+ union ubifs_key *key = k;
+
+ key->j32[0] = cpu_to_le32(inum);
+ key->j32[1] = cpu_to_le32((block & 0x01FFFFFF) |
+ (UBIFS_DATA_KEY << 29));
+ memset(k + 8, 0, UBIFS_MAX_KEY_LEN - 8);
+}
+
+/**
+ * trun_key_init - initialize truncation node key.
+ * @c: UBIFS file-system description object
+ * @key: key to initialize
+ * @inum: inode number
+ */
+static inline void trun_key_init(const struct ubifs_info *c,
+ union ubifs_key *key, ino_t inum)
+{
+ key->u32[0] = inum;
+ key->u32[1] = UBIFS_TRUN_KEY << 29;
+}
+
+/**
+ * trun_key_init_flash - initialize on-flash truncation node key.
+ * @c: UBIFS file-system description object
+ * @k: key to initialize
+ * @inum: inode number
+ */
+static inline void trun_key_init_flash(const struct ubifs_info *c, void *k,
+ ino_t inum)
+{
+ union ubifs_key *key = k;
+
+ key->j32[0] = cpu_to_le32(inum);
+ key->j32[1] = cpu_to_le32(UBIFS_TRUN_KEY << 29);
+ memset(k + 8, 0, UBIFS_MAX_KEY_LEN - 8);
+}
+
+/**
+ * key_type - get key type.
+ * @c: UBIFS file-system description object
+ * @key: key to get type of
+ */
+static inline int key_type(const struct ubifs_info *c,
+ const union ubifs_key *key)
+{
+ return key->u32[1] >> 29;
+}
+
+/**
+ * key_type_flash - get type of an on-flash formatted key.
+ * @c: UBIFS file-system description object
+ * @k: key to get type of
+ */
+static inline int key_type_flash(const struct ubifs_info *c, const void *k)
+{
+ const union ubifs_key *key = k;
+
+ return le32_to_cpu(key->j32[1]) >> 29;
+}
+
+/**
+ * key_ino - fetch inode number from key.
+ * @c: UBIFS file-system description object
+ * @k: key to fetch inode number from
+ */
+static inline ino_t key_ino(const struct ubifs_info *c, const void *k)
+{
+ const union ubifs_key *key = k;
+
+ return key->u32[0];
+}
+
+/**
+ * key_ino_flash - fetch inode number from an on-flash formatted key.
+ * @c: UBIFS file-system description object
+ * @k: key to fetch inode number from
+ */
+static inline ino_t key_ino_flash(const struct ubifs_info *c, const void *k)
+{
+ const union ubifs_key *key = k;
+
+ return le32_to_cpu(key->j32[0]);
+}
+
+/**
+ * key_hash - get directory entry hash.
+ * @c: UBIFS file-system description object
+ * @key: the key to get hash from
+ */
+static inline int key_hash(const struct ubifs_info *c,
+ const union ubifs_key *key)
+{
+ return key->u32[1] & 0x01FFFFFF;
+}
+
+/**
+ * key_hash_flash - get directory entry hash from an on-flash formatted key.
+ * @c: UBIFS file-system description object
+ * @k: the key to get hash from
+ */
+static inline int key_hash_flash(const struct ubifs_info *c, const void *k)
+{
+ const union ubifs_key *key = k;
+
+ return le32_to_cpu(key->j32[1]) & 0x01FFFFFF;
+}
+
+/**
+ * key_block - get data block number.
+ * @c: UBIFS file-system description object
+ * @key: the key to get the block number from
+ */
+static inline unsigned int key_block(const struct ubifs_info *c,
+ const union ubifs_key *key)
+{
+ return key->u32[1] & 0x01FFFFFF;
+}
+
+/**
+ * key_read - transform a key to in-memory format.
+ * @c: UBIFS file-system description object
+ * @from: the key to transform
+ * @to: the key to store the result
+ */
+static inline void key_read(const struct ubifs_info *c, const void *from,
+ union ubifs_key *to)
+{
+ const union ubifs_key *f = from;
+
+ to->u32[0] = le32_to_cpu(f->j32[0]);
+ to->u32[1] = le32_to_cpu(f->j32[1]);
+}
+
+/**
+ * key_write - transform a key from in-memory format.
+ * @c: UBIFS file-system description object
+ * @from: the key to transform
+ * @to: the key to store the result
+ */
+static inline void key_write(const struct ubifs_info *c,
+ const union ubifs_key *from, void *to)
+{
+ union ubifs_key *t = to;
+
+ t->j32[0] = cpu_to_le32(from->u32[0]);
+ t->j32[1] = cpu_to_le32(from->u32[1]);
+ memset(to + 8, 0, UBIFS_MAX_KEY_LEN - 8);
+}
+
+/**
+ * key_write_idx - transform a key from in-memory format for the index.
+ * @c: UBIFS file-system description object
+ * @from: the key to transform
+ * @to: the key to store the result
+ */
+static inline void key_write_idx(const struct ubifs_info *c,
+ const union ubifs_key *from, void *to)
+{
+ union ubifs_key *t = to;
+
+ t->j32[0] = cpu_to_le32(from->u32[0]);
+ t->j32[1] = cpu_to_le32(from->u32[1]);
+}
+
+/**
+ * key_copy - copy a key.
+ * @c: UBIFS file-system description object
+ * @from: the key to copy from
+ * @to: the key to copy to
+ */
+static inline void key_copy(const struct ubifs_info *c,
+ const union ubifs_key *from, union ubifs_key *to)
+{
+ to->u64[0] = from->u64[0];
+}
+
+/**
+ * keys_cmp - compare keys.
+ * @c: UBIFS file-system description object
+ * @key1: the first key to compare
+ * @key2: the second key to compare
+ *
+ * This function compares 2 keys and returns %-1 if @key1 is less than
+ * @key2, 0 if the keys are equivalent and %1 if @key1 is greater than @key2.
+ */
+static inline int keys_cmp(const struct ubifs_info *c,
+ const union ubifs_key *key1,
+ const union ubifs_key *key2)
+{
+ int i;
+
+ for (i = 0; i < 2; i++) {
+ if (key1->u32[i] < key2->u32[i])
+ return -1;
+ if (key1->u32[i] > key2->u32[i])
+ return 1;
+ }
+
+ return 0;
+}
+
+/**
+ * is_hash_key - is a key vulnerable to hash collisions.
+ * @c: UBIFS file-system description object
+ * @key: key
+ *
+ * This function returns %1 if @key is a hashed key or %0 otherwise.
+ */
+static inline int is_hash_key(const struct ubifs_info *c,
+ const union ubifs_key *key)
+{
+ int type = key_type(c, key);
+
+ return type == UBIFS_DENT_KEY || type == UBIFS_XENT_KEY;
+}
+
+#endif /* !__UBIFS_KEY_H__ */
--
1.5.4.1
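
As a reading aid for the key helpers in the header above, here is a minimal userspace sketch of the two-word key layout they appear to implement: the inode number in the first 32-bit word, and the key type in the top 3 bits of the second word above a masked hash or block number. All `sketch_*` and `SKETCH_*` names are hypothetical, and the type values are placeholders rather than the real UBIFS_*_KEY constants, which are defined elsewhere in the patch set.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder type values; NOT the real UBIFS_*_KEY constants. */
enum { SKETCH_TYPE_A = 1, SKETCH_TYPE_B = 2 };

#define SKETCH_TYPE_SHIFT 29          /* type occupies the top 3 bits, as in the patch */
#define SKETCH_LOW_MASK   0x01FFFFFFu /* same mask the patch applies to hash/block */

struct sketch_key {
	uint32_t w[2]; /* w[0]: inode number, w[1]: type | hash-or-block */
};

/* Pack an in-memory key the way data_key_init() and friends do. */
static inline void sketch_key_init(struct sketch_key *k, uint32_t inum,
				   unsigned int type, uint32_t low)
{
	assert(type < 8); /* the type must fit in 3 bits */
	k->w[0] = inum;
	k->w[1] = (low & SKETCH_LOW_MASK) | ((uint32_t)type << SKETCH_TYPE_SHIFT);
}

/* Counterpart of key_type(): recover the type from the second word. */
static inline unsigned int sketch_key_type(const struct sketch_key *k)
{
	return k->w[1] >> SKETCH_TYPE_SHIFT;
}

/* Counterpart of key_hash()/key_block(): recover the masked low bits. */
static inline uint32_t sketch_key_low(const struct sketch_key *k)
{
	return k->w[1] & SKETCH_LOW_MASK;
}

/* Same ordering as keys_cmp(): inode number first, then the second word. */
static inline int sketch_keys_cmp(const struct sketch_key *a,
				  const struct sketch_key *b)
{
	int i;

	for (i = 0; i < 2; i++) {
		if (a->w[i] < b->w[i])
			return -1;
		if (a->w[i] > b->w[i])
			return 1;
	}
	return 0;
}
```

With this ordering, all keys of one inode sort together, and within an inode they are grouped by type because the type sits in the most significant bits of the second word.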

2008-03-27 13:37:15

by Andi Kleen

Subject: Re: [RFC PATCH 20/26] UBIFS: add VFS operations

Artem Bityutskiy writes:

(haven't read the whole thing)

> + inode->i_gid = current->fsgid;
> + inode->i_mode = mode;
> + inode->i_mtime = inode->i_atime = inode->i_ctime = CURRENT_TIME_SEC;

Any specific reason you didn't implement sub second time stamp support?
There is really no good excuse to not do that on a new file system.

-Andi

2008-03-27 13:46:08

by Artem Bityutskiy

Subject: Re: [RFC PATCH 20/26] UBIFS: add VFS operations

Andi Kleen wrote:
>> + inode->i_gid = current->fsgid;
>> + inode->i_mode = mode;
>> + inode->i_mtime = inode->i_atime = inode->i_ctime = CURRENT_TIME_SEC;
>
> Any specific reason you didn't implement sub second time stamp support?
> There is really no good excuse to not do that on a new file system.

No reason, just thought this should be enough. Will be fixed, thank you.
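
For readers unfamiliar with the distinction, here is a minimal userspace sketch of what is lost with second-granular timestamps. It assumes that CURRENT_TIME_SEC yields a timestamp whose nanosecond field is zero; the `sketch_*` names are hypothetical stand-ins and not kernel API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the kernel's struct timespec. */
struct sketch_timespec {
	int64_t sec;  /* whole seconds */
	int32_t nsec; /* nanoseconds, 0..999999999 */
};

/* What a second-granular timestamp keeps: the nanosecond part is dropped. */
static inline struct sketch_timespec sketch_time_sec(struct sketch_timespec now)
{
	assert(now.nsec >= 0 && now.nsec < 1000000000);
	now.nsec = 0;
	return now;
}

/* Returns 1 if a is strictly newer than b, comparing seconds then nanoseconds. */
static inline int sketch_time_newer(struct sketch_timespec a,
				    struct sketch_timespec b)
{
	if (a.sec != b.sec)
		return a.sec > b.sec;
	return a.nsec > b.nsec;
}
```

Two modifications made within the same second compare as "newer" with full timestamps but become indistinguishable once truncated, which is what tools that compare mtimes would see.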

--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)

2008-03-27 16:20:22

by Josh Boyer

Subject: Re: [RFC PATCH] UBIFS - new flash file system

On Thu, 2008-03-27 at 16:55 +0200, Artem Bityutskiy wrote:
> Dear community,
>
> here is a new flash file system developed by Nokia engineers with
> help of the University of Szeged. The new file-system is called
> UBIFS, which stands for UBI file system. UBI is the wear-leveling/
> bad-block handling/volume management layer which is already in
> mainline (see drivers/mtd/ubi).

As a suggestion, take everything below this paragraph and above the
diffstat in your original email and throw it in
Documentation/filesystems/ubifs.txt

josh

2008-03-28 06:20:42

by Artem Bityutskiy

Subject: Re: [RFC PATCH] UBIFS - new flash file system

Josh Boyer wrote:
> As a suggestion, take everything below this paragraph and above the
> diffstat in your original email and throw it in
> Documentation/filesystems/ubifs.txt

Sure, thanks.

--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)

2008-03-28 06:51:04

by Artem Bityutskiy

Subject: Re: [RFC PATCH] UBIFS - new flash file system

There was a typo, let me fix it.

Artem Bityutskiy wrote:
> Note, UBIFS works on top of UBI, not on top of bare flash devices.
> It delegates crucial things like garbage-collection and bad
s/garbage-collection/wear-leveling/. Of course GC is done on the FS level :-)

--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)

2008-03-28 10:13:34

by Andrew Morton

Subject: Re: [RFC PATCH 05/26] UBIFS: add file-system build

On Thu, 27 Mar 2008 16:55:25 +0200 Artem Bityutskiy wrote:

> +static int init_constants_late(struct ubifs_info *c)
> +{
> + int tmp, err;
> + long long tmp64;
> +
> + c->main_bytes = c->main_lebs * c->leb_size;
> +
> + c->max_znode_sz = sizeof(struct ubifs_znode) +
> + c->fanout * sizeof(struct ubifs_zbranch);
> +
> + tmp = ubifs_idx_node_sz(c, 1);
> + c->ranges[UBIFS_IDX_NODE].min_len = tmp;
> + c->min_idx_node_sz = ALIGN(tmp, 8);
> +
> + tmp = ubifs_idx_node_sz(c, c->fanout);
> + c->ranges[UBIFS_IDX_NODE].max_len = tmp;
> + c->max_idx_node_sz = ALIGN(tmp, 8);
> +
> + /* Make sure LEB size is large enough to fit full commit */
> + tmp = UBIFS_CS_NODE_SZ + UBIFS_REF_NODE_SZ * c->jhead_cnt;
> + tmp = ALIGN(tmp, c->min_io_size);
> + if (tmp > c->leb_size) {
> + dbg_err("too small LEB size %d, at least %d needed",
> + c->leb_size, tmp);
> + return -EINVAL;
> + }
> +
> + /*
> + * Make sure that the log is large enough to fit reference nodes for
> + * all buds plus one reserved LEB.
> + */
> + tmp64 = c->max_bud_bytes;
> + tmp = do_div(tmp64, c->leb_size);

do_div() operates on u64, not signed long long. This will warn on several
architectures.
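
To illustrate: do_div() divides a 64-bit unsigned value in place and returns the 32-bit remainder. The sketch below is a userspace approximation of those semantics, not the kernel macro itself; `sketch_do_div` takes a pointer where the kernel macro takes an lvalue, and `sketch_lebs_for_bytes` is a hypothetical round-up helper that is not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical userspace stand-in for do_div(): divides *n by base in
 * place and returns the remainder. The dividend is an unsigned 64-bit
 * value, which is why a signed long long is the wrong type to pass.
 */
static inline uint32_t sketch_do_div(uint64_t *n, uint32_t base)
{
	uint32_t rem;

	assert(base != 0);
	rem = (uint32_t)(*n % base);
	*n /= base;
	return rem;
}

/* Illustrative only: number of LEBs needed to hold 'bytes', rounded up. */
static inline uint64_t sketch_lebs_for_bytes(uint64_t bytes, uint32_t leb_size)
{
	uint64_t quotient = bytes;
	uint32_t rem = sketch_do_div(&quotient, leb_size);

	return rem ? quotient + 1 : quotient;
}
```

In the quoted function this would mean declaring the temporary as an unsigned 64-bit type before handing it to do_div().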

2008-03-28 11:08:17

by Artem Bityutskiy

Subject: Re: [RFC PATCH 05/26] UBIFS: add file-system build

Andrew Morton wrote:
> On Thu, 27 Mar 2008 16:55:25 +0200 Artem Bityutskiy wrote:
>
>> + /*
>> + * Make sure that the log is large enough to fit reference nodes for
>> + * all buds plus one reserved LEB.
>> + */
>> + tmp64 = c->max_bud_bytes;
>> + tmp = do_div(tmp64, c->leb_size);
>
> do_div() operates on u64, not signed long long. This will warn on several
> architectures.

Will be fixed, thank you!

--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)

2008-03-31 12:30:14

by Jan Engelhardt

Subject: Re: [RFC PATCH] UBIFS - new flash file system


On Thursday 2008-03-27 15:55, Artem Bityutskiy wrote:
>
> here is a new flash file system developed by Nokia engineers with
> help of the University of Szeged. The new file-system is called
> UBIFS, which stands for UBI file system. UBI is the wear-leveling/
> bad-block handling/volume management layer which is already in
> mainline (see drivers/mtd/ubi).
>[...]

And how does it compare to logfs?

2008-03-31 12:52:51

by Adrian Hunter

Subject: Re: [RFC PATCH] UBIFS - new flash file system

Jan Engelhardt wrote:
>
> On Thursday 2008-03-27 15:55, Artem Bityutskiy wrote:
>>
>> here is a new flash file system developed by Nokia engineers with
>> help of the University of Szeged. The new file-system is called
>> UBIFS, which stands for UBI file system. UBI is the wear-leveling/
>> bad-block handling/volume management layer which is already in
>> mainline (see drivers/mtd/ubi).
>> [...]
>
> And how does it compare to logfs?

We don't know a lot about logfs, so you will really have to make
your own comparison. However our general impressions are as follows:

1. In our testing logfs file operations seem to be much slower,
see http://osl.sed.hu/wiki/ubifs/index.php/IOzone

2. logfs code base is much smaller i.e. UBIFS has 3-4 times as many
lines of code.

3. logfs does not seem to have bad-block handling.

4. logfs does not seem to have wear-leveling.

5. We are not certain how scalable logfs is.

We could be wrong about those things - don't flame us if we are.
Ask us about UBIFS, not logfs.

2008-03-31 13:21:14

by Jörn Engel

Subject: Re: [RFC PATCH] UBIFS - new flash file system

On Mon, 31 March 2008 15:47:05 +0300, Adrian Hunter wrote:
> >
> >And how does it compare to logfs?
>
> We don't know a lot about logfs, so you will really have to make
> your own comparison. However our general impressions are as follows:
>
> 1. In our testing logfs file operations seem to be much slower,
> see http://osl.sed.hu/wiki/ubifs/index.php/IOzone

Shiny numbers! Performance has improved significantly in the last six
months. Still worth a closer look.

> 3. logfs does not seem to have bad-block handling.

Bad blocks at mkfs time are handled, blocks turning bad later on aren't
yet.

> 4. logfs does not seem to have wear-leveling.

It does.

Jörn

--
Fools ignore complexity. Pragmatists suffer it.
Some can avoid it. Geniuses remove it.
-- Perlis's Programming Proverb #58, SIGPLAN Notices, Sept. 1982

2008-03-31 13:40:30

by Jörn Engel

Subject: Re: [RFC PATCH] UBIFS - new flash file system

On Mon, 31 March 2008 14:29:59 +0200, Jan Engelhardt wrote:
> On Thursday 2008-03-27 15:55, Artem Bityutskiy wrote:
> >
> >here is a new flash file system developed by Nokia engineers with
> >help of the University of Szeged. The new file-system is called
> >UBIFS, which stands for UBI file system. UBI is the wear-leveling/
> >bad-block handling/volume management layer which is already in
> >mainline (see drivers/mtd/ubi).
> >[...]
>
> And how does it compare to logfs?

Both share similar design goals. Biggest difference is that ubifs works
on top of ubi and depends on ubi support, while logfs works on plain mtd
(or block devices) and does everything itself.

Code size difference is huge. Ubi weighs some 11kloc, ubifs some 30,
logfs some 8.

Ubi scales linearly, as it does a large scan at init time. It is still
reasonably fast, as it reads just a few bytes worth of header per block.
Logfs mounts in O(1) but will currently become mindbogglingly slow when
the filesystem nears 100% full and writes are purely random. Not that
any other flash filesystem would perform well under these conditions -
it is the known worst case scenario.

Jörn

--
Victory in war is not repetitious.
-- Sun Tzu

2008-03-31 14:05:25

by Artem Bityutskiy

Subject: Re: [RFC PATCH] UBIFS - new flash file system

Jörn Engel wrote:
>> 3. logfs does not seem to have bad-block handling.
>
> Bad blocks at mkfs time are handled, blocks turning bad later on aren't
> yet.

I personally refuse to compare a finished FS which handles all the
crucial flash features to a non-finished FS. It just makes no sense.

LogFS was talked about back in 2005 at Linux Kongress [1], but is not
finished yet. Let's talk about it when it is production ready.

[1]. http://www.linux-kongress.org/2005/abstracts.html#4_4_2

--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)

2008-03-31 17:18:20

by Jörn Engel

Subject: Re: [RFC PATCH] UBIFS - new flash file system

On Mon, 31 March 2008 17:00:03 +0300, Artem Bityutskiy wrote:
>
> I personally refuse to compare a finished FS which handles all the
> crucial flash features to a non-finished FS. It just makes no sense.

No one is forcing you.

Jörn

--
All models are wrong. Some models are useful.
-- George Box

2008-03-31 20:49:24

by Pekka Enberg

Subject: Re: [RFC PATCH] UBIFS - new flash file system

Hi,

On Mon, 31 March 2008 17:00:03 +0300, Artem Bityutskiy wrote:
> > I personally refuse to compare a finished FS which handles all the
> > crucial flash features to a non-finished FS. It just makes no sense.

On Mon, Mar 31, 2008 at 8:17 PM, Jörn Engel wrote:
> No one is forcing you.

There are some of us that are interested to know why we want UBIFS in
the mainline rather than wait for LogFS or some other variant to
appear though.

Pekka

2008-03-31 21:01:15

by Pekka Enberg

Subject: Re: [RFC PATCH 25/26] UBIFS: add debugging stuff

Hi Artem,

On Thu, Mar 27, 2008 at 5:55 PM, Artem Bityutskiy wrote:
> The UBIFS code is large, and we have plenty of debugging stuff
> in there which helps to catch bugs. Some of the debugging stuff
> will be deleted later.

Yes please. The code is somewhat noisy on the debugging side.

On Thu, Mar 27, 2008 at 5:55 PM, Artem Bityutskiy wrote:
> +void *dbg_kmalloc(size_t size, gfp_t flags)
> +
> +void *dbg_kzalloc(size_t size, gfp_t flags)
> +
> +void dbg_kfree(const void *addr)
> +
> +void *dbg_vmalloc(size_t size)
> +
> +void dbg_vfree(void *addr)
> +
> +void dbg_leak_report(void)

Not acceptable for mainline kernel. SLAB already provides leak
detection and it should be straight-forward to port over to SLUB too.

> +/*
> + * struct eaten_memory - memory object eaten by UBIFS to cause memory pressure.
> + * @list: link in the list of eaten memory objects
> + * @pad: just pads to memory page size
> + */
> +struct eaten_memory {
> + struct list_head list;
> + uint8_t pad[PAGE_CACHE_SIZE - sizeof(struct list_head)];
> +};

If you need this, please make it a standalone module in mm/.

> +void dbg_eat_memory(void)
> +{
> + struct eaten_memory *em;
> +
> + em = kmalloc(sizeof(struct eaten_memory), GFP_NOFS);

It's probably better to use the page allocator for this.

> +#ifdef CONFIG_UBIFS_FS_DEBUG
> +#define UBIFS_DBG(op) op
> +#define ubifs_assert(expr) do { \
> +
> +/* Generic debugging message */
> +#define dbg_msg(fmt, ...) do { \
> +
> +/* Debugging message which prints UBIFS key */
> +#define dbg_key(c, key, fmt, ...) do { \
> +
> +#define dbg_err(fmt, ...) ubifs_err(fmt, ##__VA_ARGS__)
> +#define dbg_dump_stack() dump_stack()

Please kill these wrappers and use BUG_ON, WARN_ON, and printk() where
appropriate.

2008-03-31 21:22:11

by Jörn Engel

Subject: Re: [RFC PATCH] UBIFS - new flash file system

On Mon, 31 March 2008 23:49:16 +0300, Pekka Enberg wrote:
>
> There are some of us that are interested to know why we want UBIFS in
> the mainline rather than wait for LogFS or some other variant to
> appear though.

You don't have to wait long. I was thinking about sending a patch out
tomorrow.

And I don't believe it has to be a choice. There is little reason
against merging both - apart from any problems found in the review.

Also, competition is a good thing. There's nothing like a flurry of
patches following an unfavorable benchmark for one side or the other. ;)

Jörn

--
Fancy algorithms are slow when n is small, and n is usually small.
Fancy algorithms have big constants. Until you know that n is
frequently going to be big, don't get fancy.
-- Rob Pike

2008-04-01 02:12:11

by Arnd Bergmann

Subject: Re: [RFC PATCH 19/26] UBIFS: add Garbage Collector

On Thursday 27 March 2008, Artem Bityutskiy wrote:
> + * Note, if the file-system is close to being full, this function may return
> + * %-EAGAIN infinitely, so the caller has to limit the number of re-invocations
> + * of the function. E.g., this happens if the limits on the journal size are too
> + * tough and GC writes too much to the journal before an LEB is freed. This
> + * might also mean that the journal is too large, and the TNC becomes too big,
> + * so that the shrinker is constantly called, finds no clean znodes to free,
> + * and requests commit. Well, this may also happen if the journal is all right,
> + * but another kernel process consumes too much memory. Anyway, infinite
> + * %-EAGAIN may happen, but only in some extreme/misconfiguration cases.

This comment sounds a little bit scary, but that may only be because I don't
understand the worst-case scenario.

Why can't you guarantee that there is always enough space to successfully
run GC, e.g. by reserving some space that can never be used by file data?

More importantly, if you get into the situation that the GC doesn't make
forward progress any more, can you guarantee that it is always possible
for the user to delete files in order to make space again? Or can you
get an -ENOSPC on unlink in that case?

Arnd <><