The following patch set implements UBIVIS (checkpointing) support for
UBI.
Changes since v1:
- renamed it to UBIVIS (at least in Kconfig)
- UBIVIS parameters are now configurable via Kconfig
- several bugs have been fixed (design and implementation bugs)
- added lots of comments to make the review process easier
- made checkpatch.pl happy
Currently I'm testing UBIVIS on different workloads to find bugs
and good configuration parameters (mostly for the pool size).
So, expect v3 soon! :-)
Description:
Checkpointing is an optional feature which stores the physical to
logical eraseblock relations in a checkpointing superblock to reduce
the initialization time of UBI. The current init time of UBI is
proportional to the number of physical erase blocks on the FLASH
device. With checkpointing enabled the scan time is limited to a fixed
number of blocks.
Checkpointing does not affect any of the existing UBI robustness
features and in fact the checkpointing code falls back into scanning
mode when the checkpoint superblock(s) are corrupted.
The checkpoints consist of two elements:
1) A primary checkpoint block, which contains merely a pointer to the
erase block(s) which hold the real checkpointing data.
This primary block is guaranteed to be held within the first N
eraseblocks of the device. N is currently set to 64; this value can
be changed via Kconfig.
2) The secondary checkpoint blocks, which contain the real
checkpointing data (physical to logical eraseblock relations,
erase counts, sequence numbers ...)
In addition, the checkpointing data contains a list of blocks
which belong to the active working pool. The active working pool is
a fixed number of blocks for short-term, long-term and unknown
storage time, which can be modified before the next checkpoint set
is written to FLASH. These blocks need to be scanned in the
conventional UBI scan mode.
The reason for these pool blocks is to reduce the checkpoint
updates to the necessary minimum to avoid accelerated device
wearout in scenarios where data changes rapidly. The checkpoint
data is updated whenever a working pool runs out of blocks.
The number of pool blocks can be defined with a config option at
the moment, but this could also be done at runtime via sysfs. In
case of a change the checkpointing data would be reconstructed.
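To make the wear argument concrete, here is a minimal, self-contained sketch (plain C, not the patch's actual code) of the pool mechanism: a checkpoint write is only triggered when a pool runs empty, so the number of checkpoint updates is bounded by the number of allocations divided by the pool size.

```c
#include <assert.h>

#define POOL_MAX 128

/* Toy model of one working pool; the field names follow the patch's
 * struct ubi_cp_pool, the refill values below are made up. */
struct pool {
	int pebs[POOL_MAX];
	int used;	/* PEBs already handed out */
	int size;	/* PEBs currently in the pool */
	int max_size;	/* refill target */
};

static int cp_updates;	/* how often a checkpoint would be written */

static int pool_get_peb(struct pool *p)
{
	if (p->used == p->size) {
		/* pool exhausted: refill it and write a new checkpoint */
		for (p->size = 0; p->size < p->max_size; p->size++)
			p->pebs[p->size] = 1000 + p->size; /* fake free PEBs */
		p->used = 0;
		cp_updates++;
	}
	return p->pebs[p->used++];
}
```

With max_size = 4, handing out nine PEBs triggers only three checkpoint writes rather than nine, which is the whole point of batching allocations through a pool.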
So the checkpoint scan consists of the following steps:
1) Find the primary checkpoint block by scanning the start of the
device.
2) Read the real checkpoint data and construct the UBI device info
structures.
3) Scan the pool blocks.
All these operations scan a limited number of erase blocks which makes
the UBI init O(1) and independent of the device size.
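The O(1) bound in step 1 comes from only ever reading a fixed prefix of the device. A simplified sketch of that step (the helper and the marker value are illustrative, not the patch's API):

```c
#include <assert.h>

#define CP_MAX_START 64	/* the primary block must live in the first N PEBs */

/* Step 1 of the attach flow, modelled on an array of PEB markers:
 * scan at most CP_MAX_START blocks for the primary checkpoint block
 * and return its position, or -1 so the caller falls back to a full
 * scan. The marker value 1 is an arbitrary stand-in. */
static int find_primary_cp(const int *pebs, int peb_count)
{
	int limit = peb_count < CP_MAX_START ? peb_count : CP_MAX_START;

	for (int i = 0; i < limit; i++)
		if (pebs[i] == 1)
			return i;

	return -1;
}
```

Note that a checkpoint block located beyond CP_MAX_START is never found, by design; the cost of this step is independent of the device size.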
The checkpoint functionality is fully compatible with existing UBI
deployments. If no checkpoint blocks can be found then the device is
scanned and the checkpoint blocks are created from the scanned
information.
Aside from review and testing, it needs to be decided whether the number
of pool blocks should be deduced from the device size (number of
physical eraseblocks) or made configurable at compile or runtime.
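One possible middle ground (purely illustrative, not implemented by this series) would be to derive the pool size from the PEB count and clamp it to compile-time limits; the lower bound below is made up, the upper one matches UBI_CP_MAX_POOL_SIZE from the patches:

```c
#include <assert.h>

#define UBI_CP_MAX_POOL_SIZE 128	/* from the patch set */
#define CP_MIN_POOL_SIZE 8		/* hypothetical lower bound */

/* Hypothetical heuristic: use roughly 1% of the device, clamped. */
static int default_pool_size(int peb_count)
{
	int size = peb_count / 100;

	if (size < CP_MIN_POOL_SIZE)
		size = CP_MIN_POOL_SIZE;
	if (size > UBI_CP_MAX_POOL_SIZE)
		size = UBI_CP_MAX_POOL_SIZE;

	return size;
}
```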
Thanks to the folks at CELF who sponsored this work!
[PATCH 1/7] [RFC] UBI: Add checkpoint on-chip layout
[PATCH 2/7] [RFC] UBI: Add checkpoint struct to ubi_device
[PATCH 3/7] [RFC] UBI: Export next_sqnum()
[PATCH 4/7] [RFC] UBI: Export compare_lebs()
[PATCH 5/7] [RFC] UBI: Make wl subsystem checkpoint aware
[PATCH 6/7] [RFC] UBI: Implement checkpointing support
[PATCH 7/7] [RFC] UBI: wire up checkpointing
Thanks,
//richard
struct ubi_checkpoint describes the currently used checkpoint.
Upon checkpoint recreation all currently used PEBs will be returned
to the wl subsystem.
struct ubi_cp_pool describes a checkpoint pool.
All PEBs within this pool have to be rescanned after reading the checkpoint.
The pools are needed to handle the three types of PEBs which can be obtained
from the wl subsystem: long-term, short-term and unknown.
Signed-off-by: Richard Weinberger <[email protected]>
---
drivers/mtd/ubi/ubi.h | 43 ++++++++++++++++++++++++++++++++++++++++++-
1 files changed, 42 insertions(+), 1 deletions(-)
diff --git a/drivers/mtd/ubi/ubi.h b/drivers/mtd/ubi/ubi.h
index b162790..83fd308 100644
--- a/drivers/mtd/ubi/ubi.h
+++ b/drivers/mtd/ubi/ubi.h
@@ -196,6 +196,41 @@ struct ubi_rename_entry {
struct ubi_volume_desc;
+#ifdef CONFIG_MTD_UBI_CHECKPOINT
+/**
+ * struct ubi_checkpoint - in-memory checkpoint data structure.
+ * @peb: PEBs used by the current checkpoint
+ * @ec: the erase counter of each used PEB
+ * @size: size of the checkpoint in bytes
+ * @used_blocks: number of used PEBs
+ */
+struct ubi_checkpoint {
+ int peb[UBI_CP_MAX_BLOCKS];
+ unsigned int ec[UBI_CP_MAX_BLOCKS];
+ size_t size;
+ int used_blocks;
+};
+
+/**
+ * struct ubi_cp_pool - in-memory checkpoint pool
+ * @pebs: PEBs in this pool
+ * @used: number of used PEBs
+ * @size: total number of PEBs in this pool
+ * @max_size: maximal size of the pool
+ *
+ * A pool gets filled with up to max_size PEBs.
+ * If all PEBs within the pool are used, a new checkpoint
+ * is written and the pool is refilled with empty PEBs.
+ *
+ */
+struct ubi_cp_pool {
+ int pebs[UBI_CP_MAX_POOL_SIZE];
+ int used;
+ int size;
+ int max_size;
+};
+#endif
+
/**
* struct ubi_volume - UBI volume description data structure.
* @dev: device object to make use of the the Linux device model
@@ -424,7 +459,13 @@ struct ubi_device {
spinlock_t ltree_lock;
struct rb_root ltree;
struct mutex alc_mutex;
-
+#ifdef CONFIG_MTD_UBI_CHECKPOINT
+ struct ubi_checkpoint *cp;
+ struct ubi_checkpoint *old_cp;
+ struct ubi_cp_pool long_pool;
+ struct ubi_cp_pool short_pool;
+ struct ubi_cp_pool unk_pool;
+#endif
/* Wear-leveling sub-system's stuff */
struct rb_root used;
struct rb_root erroneous;
--
1.7.6.5
The checkpointing subsystem needs to read the next sequence number
directly.
Signed-off-by: Richard Weinberger <[email protected]>
---
drivers/mtd/ubi/eba.c | 18 +++++++++---------
drivers/mtd/ubi/ubi.h | 1 +
2 files changed, 10 insertions(+), 9 deletions(-)
diff --git a/drivers/mtd/ubi/eba.c b/drivers/mtd/ubi/eba.c
index 2455d62..6c99e43 100644
--- a/drivers/mtd/ubi/eba.c
+++ b/drivers/mtd/ubi/eba.c
@@ -57,7 +57,7 @@
* global sequence counter value. It also increases the global sequence
* counter.
*/
-static unsigned long long next_sqnum(struct ubi_device *ubi)
+unsigned long long ubi_next_sqnum(struct ubi_device *ubi)
{
unsigned long long sqnum;
@@ -522,7 +522,7 @@ retry:
goto out_put;
}
- vid_hdr->sqnum = cpu_to_be64(next_sqnum(ubi));
+ vid_hdr->sqnum = cpu_to_be64(ubi_next_sqnum(ubi));
err = ubi_io_write_vid_hdr(ubi, new_pnum, vid_hdr);
if (err)
goto write_error;
@@ -634,7 +634,7 @@ int ubi_eba_write_leb(struct ubi_device *ubi, struct ubi_volume *vol, int lnum,
}
vid_hdr->vol_type = UBI_VID_DYNAMIC;
- vid_hdr->sqnum = cpu_to_be64(next_sqnum(ubi));
+ vid_hdr->sqnum = cpu_to_be64(ubi_next_sqnum(ubi));
vid_hdr->vol_id = cpu_to_be32(vol_id);
vid_hdr->lnum = cpu_to_be32(lnum);
vid_hdr->compat = ubi_get_compat(ubi, vol_id);
@@ -695,7 +695,7 @@ write_error:
return err;
}
- vid_hdr->sqnum = cpu_to_be64(next_sqnum(ubi));
+ vid_hdr->sqnum = cpu_to_be64(ubi_next_sqnum(ubi));
ubi_msg("try another PEB");
goto retry;
}
@@ -750,7 +750,7 @@ int ubi_eba_write_leb_st(struct ubi_device *ubi, struct ubi_volume *vol,
return err;
}
- vid_hdr->sqnum = cpu_to_be64(next_sqnum(ubi));
+ vid_hdr->sqnum = cpu_to_be64(ubi_next_sqnum(ubi));
vid_hdr->vol_id = cpu_to_be32(vol_id);
vid_hdr->lnum = cpu_to_be32(lnum);
vid_hdr->compat = ubi_get_compat(ubi, vol_id);
@@ -815,7 +815,7 @@ write_error:
return err;
}
- vid_hdr->sqnum = cpu_to_be64(next_sqnum(ubi));
+ vid_hdr->sqnum = cpu_to_be64(ubi_next_sqnum(ubi));
ubi_msg("try another PEB");
goto retry;
}
@@ -868,7 +868,7 @@ int ubi_eba_atomic_leb_change(struct ubi_device *ubi, struct ubi_volume *vol,
if (err)
goto out_mutex;
- vid_hdr->sqnum = cpu_to_be64(next_sqnum(ubi));
+ vid_hdr->sqnum = cpu_to_be64(ubi_next_sqnum(ubi));
vid_hdr->vol_id = cpu_to_be32(vol_id);
vid_hdr->lnum = cpu_to_be32(lnum);
vid_hdr->compat = ubi_get_compat(ubi, vol_id);
@@ -936,7 +936,7 @@ write_error:
goto out_leb_unlock;
}
- vid_hdr->sqnum = cpu_to_be64(next_sqnum(ubi));
+ vid_hdr->sqnum = cpu_to_be64(ubi_next_sqnum(ubi));
ubi_msg("try another PEB");
goto retry;
}
@@ -1096,7 +1096,7 @@ int ubi_eba_copy_leb(struct ubi_device *ubi, int from, int to,
vid_hdr->data_size = cpu_to_be32(data_size);
vid_hdr->data_crc = cpu_to_be32(crc);
}
- vid_hdr->sqnum = cpu_to_be64(next_sqnum(ubi));
+ vid_hdr->sqnum = cpu_to_be64(ubi_next_sqnum(ubi));
err = ubi_io_write_vid_hdr(ubi, to, vid_hdr);
if (err) {
diff --git a/drivers/mtd/ubi/ubi.h b/drivers/mtd/ubi/ubi.h
index 83fd308..8d2efb9 100644
--- a/drivers/mtd/ubi/ubi.h
+++ b/drivers/mtd/ubi/ubi.h
@@ -575,6 +575,7 @@ int ubi_eba_atomic_leb_change(struct ubi_device *ubi, struct ubi_volume *vol,
int ubi_eba_copy_leb(struct ubi_device *ubi, int from, int to,
struct ubi_vid_hdr *vid_hdr);
int ubi_eba_init_scan(struct ubi_device *ubi, struct ubi_scan_info *si);
+unsigned long long ubi_next_sqnum(struct ubi_device *ubi);
/* wl.c */
int ubi_wl_get_peb(struct ubi_device *ubi, int dtype);
--
1.7.6.5
The checkpointing subsystem needs this function,
so rename it to ubi_compare_lebs() and export it.
Signed-off-by: Richard Weinberger <[email protected]>
---
drivers/mtd/ubi/scan.c | 8 ++++----
drivers/mtd/ubi/ubi.h | 4 ++++
2 files changed, 8 insertions(+), 4 deletions(-)
diff --git a/drivers/mtd/ubi/scan.c b/drivers/mtd/ubi/scan.c
index 12c43b4..5d4c1d3 100644
--- a/drivers/mtd/ubi/scan.c
+++ b/drivers/mtd/ubi/scan.c
@@ -295,7 +295,7 @@ static struct ubi_scan_volume *add_volume(struct ubi_scan_info *si, int vol_id,
}
/**
- * compare_lebs - find out which logical eraseblock is newer.
+ * ubi_compare_lebs - find out which logical eraseblock is newer.
* @ubi: UBI device description object
* @seb: first logical eraseblock to compare
* @pnum: physical eraseblock number of the second logical eraseblock to
@@ -314,7 +314,7 @@ static struct ubi_scan_volume *add_volume(struct ubi_scan_info *si, int vol_id,
* o bit 2 is cleared: the older LEB is not corrupted;
* o bit 2 is set: the older LEB is corrupted.
*/
-static int compare_lebs(struct ubi_device *ubi, const struct ubi_scan_leb *seb,
+int ubi_compare_lebs(struct ubi_device *ubi, const struct ubi_scan_leb *seb,
int pnum, const struct ubi_vid_hdr *vid_hdr)
{
void *buf;
@@ -503,7 +503,7 @@ int ubi_scan_add_used(struct ubi_device *ubi, struct ubi_scan_info *si,
* sequence numbers. We still can attach these images, unless
* there is a need to distinguish between old and new
* eraseblocks, in which case we'll refuse the image in
- * 'compare_lebs()'. In other words, we attach old clean
+ * 'ubi_compare_lebs()'. In other words, we attach old clean
* images, but refuse attaching old images with duplicated
* logical eraseblocks because there was an unclean reboot.
*/
@@ -519,7 +519,7 @@ int ubi_scan_add_used(struct ubi_device *ubi, struct ubi_scan_info *si,
* Now we have to drop the older one and preserve the newer
* one.
*/
- cmp_res = compare_lebs(ubi, seb, pnum, vid_hdr);
+ cmp_res = ubi_compare_lebs(ubi, seb, pnum, vid_hdr);
if (cmp_res < 0)
return cmp_res;
diff --git a/drivers/mtd/ubi/ubi.h b/drivers/mtd/ubi/ubi.h
index 8d2efb9..c3d8e75 100644
--- a/drivers/mtd/ubi/ubi.h
+++ b/drivers/mtd/ubi/ubi.h
@@ -621,6 +621,10 @@ void ubi_do_get_device_info(struct ubi_device *ubi, struct ubi_device_info *di);
void ubi_do_get_volume_info(struct ubi_device *ubi, struct ubi_volume *vol,
struct ubi_volume_info *vi);
+/* scan.c */
+int ubi_compare_lebs(struct ubi_device *ubi, const struct ubi_scan_leb *seb,
+ int pnum, const struct ubi_vid_hdr *vid_hdr);
+
/*
* ubi_rb_for_each_entry - walk an RB-tree.
* @rb: a pointer to type 'struct rb_node' to use as a loop counter
--
1.7.6.5
Specify the on-chip checkpoint layout.
The checkpoint consists of two major parts.
A super block (identified via UBI_CP_SB_VOLUME_ID) and
zero or more data blocks (identified via UBI_CP_DATA_VOLUME_ID).
Data blocks are only used if the whole checkpoint information does not fit
into the super block.
All three checkpointing pools have the same size for now; this may also change.
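The split between super block and data blocks boils down to a round-up division; a sketch with illustrative sizes (the real on-flash layout is defined in ubi-media.h by this patch):

```c
#include <assert.h>

/* How many secondary (data) PEBs a checkpoint of cp_size bytes needs,
 * given that the super block itself can carry sb_room bytes of
 * checkpoint data and each data block carries leb_size bytes.
 * All sizes here are examples, not on-flash constants. */
static int cp_data_blocks(long cp_size, long sb_room, long leb_size)
{
	if (cp_size <= sb_room)
		return 0;	/* the whole checkpoint fits into the SB */

	return (int)((cp_size - sb_room + leb_size - 1) / leb_size);
}
```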
Signed-off-by: Richard Weinberger <[email protected]>
---
drivers/mtd/ubi/ubi-media.h | 135 +++++++++++++++++++++++++++++++++++++++++++
1 files changed, 135 insertions(+), 0 deletions(-)
diff --git a/drivers/mtd/ubi/ubi-media.h b/drivers/mtd/ubi/ubi-media.h
index 6fb8ec2..7223b02 100644
--- a/drivers/mtd/ubi/ubi-media.h
+++ b/drivers/mtd/ubi/ubi-media.h
@@ -375,4 +375,139 @@ struct ubi_vtbl_record {
__be32 crc;
} __packed;
+#ifdef CONFIG_MTD_UBI_CHECKPOINT
+#define UBI_CP_SB_VOLUME_ID (UBI_LAYOUT_VOLUME_ID + 1)
+#define UBI_CP_DATA_VOLUME_ID (UBI_CP_SB_VOLUME_ID + 1)
+
+/* Checkpoint format version */
+#define UBI_CP_FMT_VERSION 1
+
+#define UBI_CP_MAX_START 64
+#define UBI_CP_MAX_BLOCKS 32
+#define UBI_CP_MAX_POOL_SIZE 128
+#define UBI_CP_SB_MAGIC 0x7B11D69F
+#define UBI_CP_HDR_MAGIC 0xD4B82EF7
+#define UBI_CP_VHDR_MAGIC 0xFA370ED1
+#define UBI_CP_LPOOL_MAGIC 0x67AF4D08
+#define UBI_CP_SPOOL_MAGIC 0x67AF4D09
+#define UBI_CP_UPOOL_MAGIC 0x67AF4D0A
+
+/**
+ * struct ubi_cp_sb - UBI checkpoint super block
+ * @magic: checkpoint super block magic number (%UBI_CP_SB_MAGIC)
+ * @version: format version of this checkpoint
+ * @data_crc: CRC over the checkpoint data
+ * @nblocks: number of PEBs used by this checkpoint
+ * @block_loc: an array containing the location of all PEBs of the checkpoint
+ * @block_ec: the erase counter of each used PEB
+ * @sqnum: highest sequence number value at the time the checkpoint was taken
+ *
+ * The super block points to the PEBs which hold the actual checkpoint data.
+ */
+struct ubi_cp_sb {
+ __be32 magic;
+ __u8 version;
+ __be32 data_crc;
+ __be32 nblocks;
+ __be32 block_loc[UBI_CP_MAX_BLOCKS];
+ __be32 block_ec[UBI_CP_MAX_BLOCKS];
+ __be64 sqnum;
+} __packed;
+
+/**
+ * struct ubi_cp_hdr - header of the checkpoint data set
+ * @magic: checkpoint header magic number (%UBI_CP_HDR_MAGIC)
+ * @nfree: number of free PEBs known by this checkpoint
+ * @nused: number of used PEBs known by this checkpoint
+ * @nvol: number of UBI volumes known by this checkpoint
+ */
+struct ubi_cp_hdr {
+ __be32 magic;
+ __be32 nfree;
+ __be32 nused;
+ __be32 nvol;
+} __packed;
+
+/* struct ubi_cp_hdr is followed by exactly three pool records:
+ * the long, the short and the unknown pool */
+
+/**
+ * struct ubi_cp_long_pool - Checkpoint pool with long term used PEBs
+ * @magic: long pool magic number (%UBI_CP_LPOOL_MAGIC)
+ * @size: current pool size
+ * @pebs: an array containing the location of all PEBs in this pool
+ */
+struct ubi_cp_long_pool {
+ __be32 magic;
+ __be32 size;
+ __be32 pebs[UBI_CP_MAX_POOL_SIZE];
+} __packed;
+
+/**
+ * struct ubi_cp_short_pool - Checkpoint pool with short term used PEBs
+ * @magic: short pool magic number (%UBI_CP_SPOOL_MAGIC)
+ * @size: current pool size
+ * @pebs: an array containing the location of all PEBs in this pool
+ */
+struct ubi_cp_short_pool {
+ __be32 magic;
+ __be32 size;
+ __be32 pebs[UBI_CP_MAX_POOL_SIZE];
+} __packed;
+
+/**
+ * struct ubi_cp_unk_pool - Checkpoint pool with all other PEBs
+ * @magic: unknown pool magic number (%UBI_CP_UPOOL_MAGIC)
+ * @size: current pool size
+ * @pebs: an array containing the location of all PEBs in this pool
+ */
+struct ubi_cp_unk_pool {
+ __be32 magic;
+ __be32 size;
+ __be32 pebs[UBI_CP_MAX_POOL_SIZE];
+} __packed;
+
+/* struct ubi_cp_unk_pool is followed by nfree+nused struct ubi_cp_ec records */
+
+/**
+ * struct ubi_cp_ec - stores the erase counter of a PEB
+ * @pnum: PEB number
+ * @ec: ec of this PEB
+ */
+struct ubi_cp_ec {
+ __be32 pnum;
+ __be32 ec;
+} __packed;
+
+/**
+ * struct ubi_cp_volhdr - checkpoint volume header
+ * It identifies the start of an EBA table.
+ * @magic: checkpoint volume header magic number (%UBI_CP_VHDR_MAGIC)
+ * @vol_id: volume id of the checkpointed volume
+ * @vol_type: type of the checkpointed volume
+ * @data_pad: data_pad value of the checkpointed volume
+ * @used_ebs: number of used LEBs within this volume
+ * @last_eb_bytes: number of bytes used in the last LEB
+ */
+struct ubi_cp_volhdr {
+ __be32 magic;
+ __be32 vol_id;
+ __u8 vol_type;
+ __be32 data_pad;
+ __be32 used_ebs;
+ __be32 last_eb_bytes;
+} __packed;
+
+/* struct ubi_cp_volhdr is followed by nused struct ubi_cp_eba records */
+
+/**
+ * struct ubi_cp_eba - denotes an association between a PEB and a LEB
+ * @lnum: LEB number
+ * @pnum: PEB number
+ */
+struct ubi_cp_eba {
+ __be32 lnum;
+ __be32 pnum;
+} __packed;
+#endif /* CONFIG_MTD_UBI_CHECKPOINT */
#endif /* !__UBI_MEDIA_H__ */
--
1.7.6.5
Integrates checkpointing into the wl subsystem.
Checkpointing deals with PEBs; it has to tell the wl subsystem
which PEBs are currently in use and must not be touched by the wl thread.
Signed-off-by: Richard Weinberger <[email protected]>
---
drivers/mtd/ubi/ubi.h | 5 +
drivers/mtd/ubi/wl.c | 214 ++++++++++++++++++++++++++++++++++++++++++++++++-
2 files changed, 217 insertions(+), 2 deletions(-)
diff --git a/drivers/mtd/ubi/ubi.h b/drivers/mtd/ubi/ubi.h
index c3d8e75..df267bb 100644
--- a/drivers/mtd/ubi/ubi.h
+++ b/drivers/mtd/ubi/ubi.h
@@ -585,6 +585,11 @@ int ubi_wl_scrub_peb(struct ubi_device *ubi, int pnum);
int ubi_wl_init_scan(struct ubi_device *ubi, struct ubi_scan_info *si);
void ubi_wl_close(struct ubi_device *ubi);
int ubi_thread(void *u);
+int ubi_wl_get_cp_peb(struct ubi_device *ubi, int max_pnum);
+int ubi_wl_put_cp_peb(struct ubi_device *ubi, int pnum, int torture);
+#define is_cp_block(__ubi__, __e__) __is_cp_block(__ubi__, __e__->pnum)
+int __is_cp_block(struct ubi_device *ubi, int pnum);
+void ubi_flush_prot_queue(struct ubi_device *ubi);
/* io.c */
int ubi_io_read(const struct ubi_device *ubi, void *buf, int pnum, int offset,
diff --git a/drivers/mtd/ubi/wl.c b/drivers/mtd/ubi/wl.c
index 7c1a9bf..b2e563e 100644
--- a/drivers/mtd/ubi/wl.c
+++ b/drivers/mtd/ubi/wl.c
@@ -175,6 +175,32 @@ static int paranoid_check_in_pq(const struct ubi_device *ubi,
#define paranoid_check_in_pq(ubi, e) 0
#endif
+#ifdef CONFIG_MTD_UBI_CHECKPOINT
+/**
+ * __is_cp_block - returns 1 if a PEB is currently used for checkpointing.
+ * @ubi: UBI device description object
+ * @pnum: the PEB to check
+ */
+int __is_cp_block(struct ubi_device *ubi, int pnum)
+{
+ int i;
+
+ if (!ubi->cp)
+ return 0;
+
+ for (i = 0; i < ubi->cp->used_blocks; i++)
+ if (ubi->cp->peb[i] == pnum)
+ return 1;
+
+ return 0;
+}
+#else
+int __is_cp_block(struct ubi_device *ubi, int pnum)
+{
+ return 0;
+}
+#endif
+
/**
* wl_tree_add - add a wear-leveling entry to a WL RB-tree.
* @e: the wear-leveling entry to add
@@ -380,15 +406,74 @@ static struct ubi_wl_entry *find_wl_entry(struct rb_root *root, int diff)
return e;
}
+#ifdef CONFIG_MTD_UBI_CHECKPOINT
+/**
+ * find_early_wl_entry - find wear-leveling entry with a low pnum.
+ * @root: the RB-tree where to look for
+ * @max_pnum: highest possible pnum
+ *
+ * This function looks for a wear-leveling entry whose eraseblock
+ * lies near the beginning of the device.
+ */
+static struct ubi_wl_entry *find_early_wl_entry(struct rb_root *root,
+ int max_pnum)
+{
+ struct rb_node *p;
+ struct ubi_wl_entry *e, *victim = NULL;
+
+ ubi_rb_for_each_entry(p, e, root, u.rb) {
+ if (e->pnum < max_pnum) {
+ victim = e;
+ max_pnum = e->pnum;
+ }
+ }
+
+ return victim;
+}
+
/**
- * ubi_wl_get_peb - get a physical eraseblock.
+ * ubi_wl_get_cp_peb - find a physical erase block with a given maximal number.
+ * @ubi: UBI device description object
+ * @max_pnum: the highest acceptable erase block number
+ *
+ * The function returns a physical erase block with a given maximal number
+ * and removes it from the wl subsystem.
+ * Must be called with wl_lock held!
+ */
+int ubi_wl_get_cp_peb(struct ubi_device *ubi, int max_pnum)
+{
+ int ret = -ENOSPC;
+ struct ubi_wl_entry *e;
+
+ if (!ubi->free.rb_node) {
+ ubi_err("no free eraseblocks");
+
+ goto out;
+ }
+
+ e = find_early_wl_entry(&ubi->free, max_pnum);
+ if (!e)
+ goto out;
+
+ ret = e->pnum;
+
+ /* remove it from the free list,
+ * the wl subsystem no longer knows this erase block */
+ rb_erase(&e->u.rb, &ubi->free);
+out:
+ return ret;
+}
+#endif
+
+/**
+ * __ubi_wl_get_peb - get a physical eraseblock.
* @ubi: UBI device description object
* @dtype: type of data which will be stored in this physical eraseblock
*
* This function returns a physical eraseblock in case of success and a
* negative error code in case of failure. Might sleep.
*/
-int ubi_wl_get_peb(struct ubi_device *ubi, int dtype)
+static int __ubi_wl_get_peb(struct ubi_device *ubi, int dtype)
{
int err;
struct ubi_wl_entry *e, *first, *last;
@@ -472,6 +557,50 @@ retry:
return e->pnum;
}
+#ifdef CONFIG_MTD_UBI_CHECKPOINT
+/* ubi_wl_get_peb - works exactly like __ubi_wl_get_peb but keeps track of
+ * all checkpointing pools.
+ */
+int ubi_wl_get_peb(struct ubi_device *ubi, int dtype)
+{
+ struct ubi_cp_pool *pool;
+
+ if (dtype == UBI_LONGTERM)
+ pool = &ubi->long_pool;
+ else if (dtype == UBI_SHORTTERM)
+ pool = &ubi->short_pool;
+ else if (dtype == UBI_UNKNOWN)
+ pool = &ubi->unk_pool;
+ else
+ BUG();
+
+ /* pool contains no free blocks, create a new one
+ * and write a checkpoint */
+ if (pool->used == pool->size) {
+ for (pool->size = 0; pool->size < pool->max_size;
+ pool->size++) {
+ pool->pebs[pool->size] = __ubi_wl_get_peb(ubi, dtype);
+ if (pool->pebs[pool->size] < 0)
+ break;
+ }
+
+ pool->used = 0;
+ ubi_update_checkpoint(ubi);
+ }
+
+ /* we did not get a single free PEB */
+ if (!pool->size)
+ return -1;
+
+ return pool->pebs[pool->used++];
+}
+#else
+int ubi_wl_get_peb(struct ubi_device *ubi, int dtype)
+{
+ return __ubi_wl_get_peb(ubi, dtype);
+}
+#endif
+
/**
* prot_queue_del - remove a physical eraseblock from the protection queue.
* @ubi: UBI device description object
@@ -602,6 +731,21 @@ repeat:
}
/**
+ * ubi_flush_prot_queue - flushes the protection queue
+ * @ubi: UBI device description object
+ *
+ * This function flushes the protection queue so that checkpointing
+ * becomes aware of the PEBs held in it.
+ */
+void ubi_flush_prot_queue(struct ubi_device *ubi)
+{
+ int i;
+
+ for (i = 0; i < UBI_PROT_QUEUE_LEN; i++)
+ serve_prot_queue(ubi);
+}
+
+/**
* schedule_ubi_work - schedule a work.
* @ubi: UBI device description object
* @wrk: the work to schedule
@@ -637,6 +781,9 @@ static int schedule_erase(struct ubi_device *ubi, struct ubi_wl_entry *e,
{
struct ubi_work *wl_wrk;
+ ubi_assert(e);
+ ubi_assert(!is_cp_block(ubi, e));
+
dbg_wl("schedule erasure of PEB %d, EC %d, torture %d",
e->pnum, e->ec, torture);
@@ -652,6 +799,57 @@ static int schedule_erase(struct ubi_device *ubi, struct ubi_wl_entry *e,
return 0;
}
+#ifdef CONFIG_MTD_UBI_CHECKPOINT
+/**
+ * ubi_wl_put_cp_peb - return a CP PEB to the wear-leveling sub-system
+ *
+ * see: ubi_wl_put_peb()
+ */
+int ubi_wl_put_cp_peb(struct ubi_device *ubi, int pnum, int torture)
+{
+ int i, err = 0;
+ struct ubi_wl_entry *e;
+
+ dbg_wl("PEB %d", pnum);
+ ubi_assert(pnum >= 0);
+ ubi_assert(pnum < ubi->peb_count);
+
+ spin_lock(&ubi->wl_lock);
+ e = ubi->lookuptbl[pnum];
+
+ /* This can happen if we recovered from a checkpoint for the very
+ * first time and are now writing a new one. In this case the wl
+ * subsystem has never seen any PEB used by the original checkpoint.
+ */
+ if (!e) {
+ ubi_assert(ubi->old_cp);
+ e = kmem_cache_alloc(ubi_wl_entry_slab, GFP_ATOMIC);
+ if (!e) {
+ spin_unlock(&ubi->wl_lock);
+ return -ENOMEM;
+ }
+
+ e->pnum = pnum;
+ e->ec = 0;
+ /* use the ec value from the checkpoint */
+ for (i = 0; i < UBI_CP_MAX_BLOCKS; i++) {
+ if (pnum == ubi->old_cp->peb[i]) {
+ e->ec = ubi->old_cp->ec[i];
+ break;
+ }
+ }
+ ubi_assert(e->ec);
+ ubi->lookuptbl[pnum] = e;
+ }
+
+ spin_unlock(&ubi->wl_lock);
+
+ err = schedule_erase(ubi, e, torture);
+
+ return err;
+}
+#endif
+
/**
* wear_leveling_worker - wear-leveling worker function.
* @ubi: UBI device description object
@@ -1029,6 +1227,8 @@ static int erase_worker(struct ubi_device *ubi, struct ubi_work *wl_wrk,
dbg_wl("erase PEB %d EC %d", pnum, e->ec);
+ ubi_assert(!is_cp_block(ubi, e));
+
err = sync_erase(ubi, e, wl_wrk->torture);
if (!err) {
/* Fine, we've erased it successfully */
@@ -1463,6 +1663,9 @@ int ubi_wl_init_scan(struct ubi_device *ubi, struct ubi_scan_info *si)
e->pnum = seb->pnum;
e->ec = seb->ec;
+
+ ubi_assert(!is_cp_block(ubi, e));
+
ubi->lookuptbl[e->pnum] = e;
if (schedule_erase(ubi, e, 0)) {
kmem_cache_free(ubi_wl_entry_slab, e);
@@ -1480,7 +1683,10 @@ int ubi_wl_init_scan(struct ubi_device *ubi, struct ubi_scan_info *si)
e->pnum = seb->pnum;
e->ec = seb->ec;
ubi_assert(e->ec >= 0);
+ ubi_assert(!is_cp_block(ubi, e));
+
wl_tree_add(e, &ubi->free);
+
ubi->lookuptbl[e->pnum] = e;
}
@@ -1495,6 +1701,10 @@ int ubi_wl_init_scan(struct ubi_device *ubi, struct ubi_scan_info *si)
e->pnum = seb->pnum;
e->ec = seb->ec;
ubi->lookuptbl[e->pnum] = e;
+
+ if (__is_cp_block(ubi, seb->pnum))
+ continue;
+
if (!seb->scrub) {
dbg_wl("add PEB %d EC %d to the used tree",
e->pnum, e->ec);
--
1.7.6.5
Implements UBI checkpointing support.
It reduces the attaching time from O(N) to O(1).
Checkpoints are written on demand and upon changes of the volume layout.
If the recovery from a checkpoint fails we fall back to scanning mode.
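The fallback logic condenses to a single decision: trust the checkpoint only if its identification and checksum verify, otherwise scan. A toy model of that decision (the magic value matches UBI_CP_SB_MAGIC from this series; the checksum here is a fake stand-in for the real crc32):

```c
#include <assert.h>
#include <stddef.h>

#define CP_MAGIC 0x7B11D69Fu	/* UBI_CP_SB_MAGIC from this series */

struct cp_blob {
	unsigned int magic;
	unsigned int crc;
};

/* Stand-in checksum; the real code computes crc32 over the data. */
static unsigned int toy_crc(const struct cp_blob *cp)
{
	return cp->magic ^ 0xFFFFFFFFu;
}

enum attach_mode { BY_CHECKPOINT, BY_SCAN };

/* A missing or corrupted checkpoint never fails the attach; it only
 * demotes it to the conventional full scan. */
static enum attach_mode attach(const struct cp_blob *cp)
{
	if (!cp || cp->magic != CP_MAGIC || cp->crc != toy_crc(cp))
		return BY_SCAN;

	return BY_CHECKPOINT;
}
```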
Signed-off-by: Richard Weinberger <[email protected]>
---
drivers/mtd/ubi/Kconfig | 8 +
drivers/mtd/ubi/Makefile | 1 +
drivers/mtd/ubi/checkpoint.c | 1128 ++++++++++++++++++++++++++++++++++++++++++
drivers/mtd/ubi/scan.c | 10 +-
drivers/mtd/ubi/ubi.h | 10 +-
5 files changed, 1155 insertions(+), 2 deletions(-)
create mode 100644 drivers/mtd/ubi/checkpoint.c
diff --git a/drivers/mtd/ubi/Kconfig b/drivers/mtd/ubi/Kconfig
index 4dcc752..3ba9978 100644
--- a/drivers/mtd/ubi/Kconfig
+++ b/drivers/mtd/ubi/Kconfig
@@ -51,6 +51,14 @@ config MTD_UBI_GLUEBI
volume. This is handy to make MTD-oriented software (like JFFS2)
work on top of UBI. Do not enable this unless you use legacy
software.
+config MTD_UBI_CHECKPOINT
+ bool "UBIVIS (EXPERIMENTAL)"
+ depends on EXPERIMENTAL
+ default n
+ help
+ This option enables UBIVIS (AKA checkpointing).
+ It allows attaching UBI devices without scanning the whole MTD
+ device. Instead it extracts all needed information from a checkpoint.
config MTD_UBI_DEBUG
bool "UBI debugging"
diff --git a/drivers/mtd/ubi/Makefile b/drivers/mtd/ubi/Makefile
index c9302a5..845312a 100644
--- a/drivers/mtd/ubi/Makefile
+++ b/drivers/mtd/ubi/Makefile
@@ -3,5 +3,6 @@ obj-$(CONFIG_MTD_UBI) += ubi.o
ubi-y += vtbl.o vmt.o upd.o build.o cdev.o kapi.o eba.o io.o wl.o scan.o
ubi-y += misc.o
+ubi-$(CONFIG_MTD_UBI_CHECKPOINT) += checkpoint.o
ubi-$(CONFIG_MTD_UBI_DEBUG) += debug.o
obj-$(CONFIG_MTD_UBI_GLUEBI) += gluebi.o
diff --git a/drivers/mtd/ubi/checkpoint.c b/drivers/mtd/ubi/checkpoint.c
new file mode 100644
index 0000000..f43441c
--- /dev/null
+++ b/drivers/mtd/ubi/checkpoint.c
@@ -0,0 +1,1128 @@
+/*
+ * Copyright (c) 2012 Linutronix GmbH
+ * Author: Richard Weinberger <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
+ * the GNU General Public License for more details.
+ *
+ */
+
+#include <linux/crc32.h>
+#include "ubi.h"
+
+/**
+ * new_cp_vhdr - allocate a new volume header for checkpoint usage.
+ * @ubi: UBI device description object
+ * @vol_id: the VID of the new header
+ */
+static struct ubi_vid_hdr *new_cp_vhdr(struct ubi_device *ubi, int vol_id)
+{
+ struct ubi_vid_hdr *new;
+
+ new = ubi_zalloc_vid_hdr(ubi, GFP_KERNEL);
+ if (!new)
+ goto out;
+
+ new->vol_type = UBI_VID_DYNAMIC;
+ new->vol_id = cpu_to_be32(vol_id);
+
+ /* the checkpoint has to be deleted on older kernels */
+ new->compat = UBI_COMPAT_DELETE;
+
+out:
+ return new;
+}
+
+/**
+ * add_seb - create and add a scan erase block to a given list.
+ * @si: UBI scan info object
+ * @list: the target list
+ * @pnum: PEB number of the new scan erase block
+ * @ec: erase counter of the new SEB
+ */
+static int add_seb(struct ubi_scan_info *si, struct list_head *list,
+ int pnum, int ec)
+{
+ struct ubi_scan_leb *seb;
+
+ seb = kmem_cache_alloc(si->scan_leb_slab, GFP_KERNEL);
+ if (!seb)
+ return -ENOMEM;
+
+ seb->pnum = pnum;
+ seb->ec = ec;
+ seb->lnum = -1;
+ seb->scrub = seb->copy_flag = seb->sqnum = 0;
+
+ si->ec_sum += seb->ec;
+ si->ec_count++;
+
+ if (si->max_ec < seb->ec)
+ si->max_ec = seb->ec;
+
+ if (si->min_ec > seb->ec)
+ si->min_ec = seb->ec;
+
+ list_add_tail(&seb->u.list, list);
+
+ return 0;
+}
+
+/**
+ * add_vol - create and add a new scan volume to ubi_scan_info.
+ * @si: ubi_scan_info object
+ * @vol_id: VID of the new volume
+ * @used_ebs: number of used EBS
+ * @data_pad: data padding value of the new volume
+ * @vol_type: volume type
+ * @last_eb_bytes: number of bytes in the last LEB
+ */
+static struct ubi_scan_volume *add_vol(struct ubi_scan_info *si, int vol_id,
+ int used_ebs, int data_pad, u8 vol_type,
+ int last_eb_bytes)
+{
+ struct ubi_scan_volume *sv;
+ struct rb_node **p = &si->volumes.rb_node, *parent = NULL;
+
+ while (*p) {
+ parent = *p;
+ sv = rb_entry(parent, struct ubi_scan_volume, rb);
+
+ if (vol_id > sv->vol_id)
+ p = &(*p)->rb_left;
+ else if (vol_id < sv->vol_id)
+ p = &(*p)->rb_right;
+ }
+
+ sv = kmalloc(sizeof(struct ubi_scan_volume), GFP_KERNEL);
+ if (!sv)
+ goto out;
+
+ sv->highest_lnum = sv->leb_count = 0;
+ sv->vol_id = vol_id;
+ sv->used_ebs = used_ebs;
+ sv->data_pad = data_pad;
+ sv->last_data_size = last_eb_bytes;
+ sv->compat = 0;
+ sv->vol_type = vol_type;
+ sv->root = RB_ROOT;
+
+ rb_link_node(&sv->rb, parent, p);
+ rb_insert_color(&sv->rb, &si->volumes);
+
+out:
+ return sv;
+}
+
+/**
+ * assign_seb_to_sv - assigns a SEB to a given scan_volume and removes it
+ * from its original list.
+ * @si: ubi_scan_info object
+ * @seb: the to be assigned SEB
+ * @sv: target scan volume
+ */
+static void assign_seb_to_sv(struct ubi_scan_info *si,
+ struct ubi_scan_leb *seb,
+ struct ubi_scan_volume *sv)
+{
+ struct ubi_scan_leb *tmp_seb;
+ struct rb_node **p = &si->volumes.rb_node, *parent = NULL;
+
+ p = &sv->root.rb_node;
+ while (*p) {
+ parent = *p;
+
+ tmp_seb = rb_entry(parent, struct ubi_scan_leb, u.rb);
+ if (seb->lnum != tmp_seb->lnum) {
+ if (seb->lnum < tmp_seb->lnum)
+ p = &(*p)->rb_left;
+ else
+ p = &(*p)->rb_right;
+
+ continue;
+ } else
+ break;
+ }
+
+ list_del(&seb->u.list);
+ sv->leb_count++;
+
+ rb_link_node(&seb->u.rb, parent, p);
+ rb_insert_color(&seb->u.rb, &sv->root);
+}
+
+/**
+ * update_vol - inserts or updates a LEB which was found in a pool.
+ * @ubi: the UBI device object
+ * @si: scan info object
+ * @sv: the scan volume where this LEB belongs to
+ * @new_vh: the volume header derived from new_seb
+ * @new_seb: the SEB to be examined
+ */
+static int update_vol(struct ubi_device *ubi, struct ubi_scan_info *si,
+ struct ubi_scan_volume *sv, struct ubi_vid_hdr *new_vh,
+ struct ubi_scan_leb *new_seb)
+{
+ struct rb_node **p = &sv->root.rb_node, *parent = NULL;
+ struct ubi_scan_leb *seb, *victim;
+ int cmp_res;
+
+ while (*p) {
+ parent = *p;
+ seb = rb_entry(parent, struct ubi_scan_leb, u.rb);
+
+ if (be32_to_cpu(new_vh->lnum) != seb->lnum) {
+ if (be32_to_cpu(new_vh->lnum) < seb->lnum)
+ p = &(*p)->rb_left;
+ else
+ p = &(*p)->rb_right;
+
+ continue;
+ }
+
+ /* A nasty corner case:
+ *
+ * As we have three checkpoint pools (short, long and
+ * unknown term) it can happen that a PEB is checkpointed
+ * (in the EBA table of the checkpoint) and sits in one of the
+ * three pools. E.g. PEB P gets requested from the WL subsystem for
+ * short term usage, P goes into the short term checkpoint pool
+ * and UBI assigns a LEB L to P. Therefore P is also known in
+ * the EBA table.
+ * If the long term or unknown pool is full a new checkpoint
+ * is written.
+ * --> P is in the short term pool and the EBA.
+ * While reading the checkpoint we see P twice.
+ *
+ * If we had only one pool this must not happen.
+ */
+ if (seb->pnum == new_seb->pnum) {
+ kmem_cache_free(si->scan_leb_slab, new_seb);
+
+ return 0;
+ }
+
+ cmp_res = ubi_compare_lebs(ubi, seb, new_seb->pnum, new_vh);
+ if (cmp_res < 0)
+ return cmp_res;
+
+ /* new_seb is newer */
+ if (cmp_res & 1) {
+ victim = kmem_cache_alloc(si->scan_leb_slab,
+ GFP_KERNEL);
+ if (!victim)
+ return -ENOMEM;
+
+ victim->ec = seb->ec;
+ victim->pnum = seb->pnum;
+ list_add_tail(&victim->u.list, &si->erase);
+
+ seb->ec = new_seb->ec;
+ seb->pnum = new_seb->pnum;
+ seb->copy_flag = new_vh->copy_flag;
+ kmem_cache_free(si->scan_leb_slab, new_seb);
+
+ /* new_seb is older */
+ } else {
+ ubi_msg("Vol %i: LEB %i's PEB %i is old, dropping it",
+ sv->vol_id, seb->lnum, new_seb->pnum);
+ list_add_tail(&new_seb->u.list, &si->erase);
+ }
+
+ return 0;
+ }
+
+ /* This LEB is new, let's add it to the volume */
+ dbg_bld("Vol %i (type = %i): SEB %i is new, adding it!", sv->vol_id,
+ sv->vol_type, new_seb->lnum);
+
+ if (sv->vol_type == UBI_STATIC_VOLUME)
+ sv->used_ebs++;
+
+ sv->leb_count++;
+
+ rb_link_node(&new_seb->u.rb, parent, p);
+ rb_insert_color(&new_seb->u.rb, &sv->root);
+
+ return 0;
+}
+
+/**
+ * process_pool_seb - handle a non-empty PEB found in a pool
+ * @ubi: UBI device object
+ * @si: scan info object
+ * @new_vh: the volume header derived from new_seb
+ * @new_seb: the SEB to be examined
+ */
+static int process_pool_seb(struct ubi_device *ubi, struct ubi_scan_info *si,
+ struct ubi_vid_hdr *new_vh,
+ struct ubi_scan_leb *new_seb)
+{
+ struct ubi_scan_volume *sv, *tmp_sv = NULL;
+ struct rb_node **p = &si->volumes.rb_node, *parent = NULL;
+ int found = 0;
+
+ if (be32_to_cpu(new_vh->vol_id) == UBI_CP_SB_VOLUME_ID ||
+ be32_to_cpu(new_vh->vol_id) == UBI_CP_DATA_VOLUME_ID) {
+ kmem_cache_free(si->scan_leb_slab, new_seb);
+
+ return 0;
+ }
+
+ /* Find the volume this SEB belongs to */
+ while (*p) {
+ parent = *p;
+ tmp_sv = rb_entry(parent, struct ubi_scan_volume, rb);
+
+ if (be32_to_cpu(new_vh->vol_id) > tmp_sv->vol_id)
+ p = &(*p)->rb_left;
+ else if (be32_to_cpu(new_vh->vol_id) < tmp_sv->vol_id)
+ p = &(*p)->rb_right;
+ else {
+ found = 1;
+ break;
+ }
+ }
+
+ if (found)
+ sv = tmp_sv;
+ else {
+ ubi_err("Orphaned volume in checkpoint pool!");
+
+ return -EINVAL;
+ }
+
+ ubi_assert(be32_to_cpu(new_vh->vol_id) == sv->vol_id);
+
+ return update_vol(ubi, si, sv, new_vh, new_seb);
+}
+
+/**
+ * scan_pool - scans a pool for changed (no longer empty) PEBs
+ * @ubi: UBI device object
+ * @si: scan info object
+ * @pebs: an array of all PEB numbers in the pool to be scanned
+ * @pool_size: size of the pool (number of entries in @pebs)
+ * @max_sqnum2: pointer to the maximal sequence number
+ */
+static int scan_pool(struct ubi_device *ubi, struct ubi_scan_info *si,
+ __be32 *pebs, int pool_size, unsigned long long *max_sqnum2)
+{
+ struct ubi_vid_hdr *vh;
+ struct ubi_scan_leb *new_seb;
+ int i;
+ int pnum;
+ int err;
+
+ vh = ubi_zalloc_vid_hdr(ubi, GFP_KERNEL);
+ if (!vh)
+ return -ENOMEM;
+
+ /*
+ * Now scan all PEBs in the pool to find changes which have been made
+ * after the creation of the checkpoint
+ */
+ for (i = 0; i < pool_size; i++) {
+ pnum = be32_to_cpu(pebs[i]);
+ err = ubi_io_read_vid_hdr(ubi, pnum, vh, 0);
+
+ if (err == UBI_IO_FF)
+ continue;
+ else if (err == 0) {
+ dbg_bld("PEB %i is no longer free, scanning it!", pnum);
+
+ new_seb = kmem_cache_alloc(si->scan_leb_slab,
+ GFP_KERNEL);
+ if (!new_seb) {
+ ubi_free_vid_hdr(ubi, vh);
+
+ return -ENOMEM;
+ }
+
+ new_seb->ec = -1;
+ new_seb->pnum = pnum;
+ new_seb->lnum = be32_to_cpu(vh->lnum);
+ new_seb->sqnum = be64_to_cpu(vh->sqnum);
+ new_seb->copy_flag = vh->copy_flag;
+ new_seb->scrub = 0;
+
+ err = process_pool_seb(ubi, si, vh, new_seb);
+ if (err) {
+ ubi_free_vid_hdr(ubi, vh);
+ return err;
+ }
+
+ if (*max_sqnum2 < new_seb->sqnum)
+ *max_sqnum2 = new_seb->sqnum;
+ } else {
+ /* We are paranoid and fall back to scanning mode */
+ ubi_err("Checkpoint pool contains damaged PEBs!");
+ ubi_free_vid_hdr(ubi, vh);
+ return err;
+ }
+
+ }
+ ubi_free_vid_hdr(ubi, vh);
+
+ return 0;
+}
+
+/**
+ * ubi_scan_checkpoint - creates ubi_scan_info from a checkpoint.
+ * @ubi: UBI device object
+ * @cp_raw: the checkpoint itself as a byte array
+ * @cp_size: size of the checkpoint in bytes
+ */
+struct ubi_scan_info *ubi_scan_checkpoint(struct ubi_device *ubi,
+ char *cp_raw,
+ size_t cp_size)
+{
+ struct list_head used;
+ struct ubi_scan_volume *sv;
+ struct ubi_scan_leb *seb, *tmp_seb, *_tmp_seb;
+ struct ubi_scan_info *si;
+ int i, j;
+
+ size_t cp_pos = 0;
+ struct ubi_cp_sb *cpsb;
+ struct ubi_cp_hdr *cphdr;
+ struct ubi_cp_long_pool *cplpl;
+ struct ubi_cp_short_pool *cpspl;
+ struct ubi_cp_unk_pool *cpupl;
+ struct ubi_cp_ec *cpec;
+ struct ubi_cp_volhdr *cpvhdr;
+ struct ubi_cp_eba *cp_eba;
+
+ unsigned long long max_sqnum2 = 0;
+
+ si = kzalloc(sizeof(struct ubi_scan_info), GFP_KERNEL);
+ if (!si)
+ return ERR_PTR(-ENOMEM);
+
+ INIT_LIST_HEAD(&used);
+ INIT_LIST_HEAD(&si->corr);
+ INIT_LIST_HEAD(&si->free);
+ INIT_LIST_HEAD(&si->erase);
+ INIT_LIST_HEAD(&si->alien);
+ si->volumes = RB_ROOT;
+ si->min_ec = UBI_MAX_ERASECOUNTER;
+
+ si->scan_leb_slab = kmem_cache_create("ubi_scan_leb_slab",
+ sizeof(struct ubi_scan_leb),
+ 0, 0, NULL);
+ if (!si->scan_leb_slab)
+ goto fail;
+
+ cpsb = (struct ubi_cp_sb *)(cp_raw);
+ si->max_sqnum = cpsb->sqnum;
+ cp_pos += sizeof(struct ubi_cp_sb);
+ if (cp_pos >= cp_size)
+ goto fail;
+
+ cphdr = (struct ubi_cp_hdr *)(cp_raw + cp_pos);
+ cp_pos += sizeof(*cphdr);
+
+ if (cphdr->magic != UBI_CP_HDR_MAGIC)
+ goto fail;
+
+ cplpl = (struct ubi_cp_long_pool *)(cp_raw + cp_pos);
+ cp_pos += sizeof(*cplpl);
+ if (cplpl->magic != UBI_CP_LPOOL_MAGIC)
+ goto fail;
+
+ cpspl = (struct ubi_cp_short_pool *)(cp_raw + cp_pos);
+ cp_pos += sizeof(*cpspl);
+ if (cpspl->magic != UBI_CP_SPOOL_MAGIC)
+ goto fail;
+
+ cpupl = (struct ubi_cp_unk_pool *)(cp_raw + cp_pos);
+ cp_pos += sizeof(*cpupl);
+ if (cpupl->magic != UBI_CP_UPOOL_MAGIC)
+ goto fail;
+
+ /* read EC values from free list */
+ for (i = 0; i < be32_to_cpu(cphdr->nfree); i++) {
+ cpec = (struct ubi_cp_ec *)(cp_raw + cp_pos);
+ cp_pos += sizeof(*cpec);
+ if (cp_pos >= cp_size)
+ goto fail;
+
+ add_seb(si, &si->free, be32_to_cpu(cpec->pnum),
+ be32_to_cpu(cpec->ec));
+ }
+
+ /* read EC values from used list */
+ for (i = 0; i < be32_to_cpu(cphdr->nused); i++) {
+ cpec = (struct ubi_cp_ec *)(cp_raw + cp_pos);
+ cp_pos += sizeof(*cpec);
+ if (cp_pos >= cp_size)
+ goto fail;
+
+ add_seb(si, &used, be32_to_cpu(cpec->pnum),
+ be32_to_cpu(cpec->ec));
+ }
+
+ if (si->ec_count)
+ si->mean_ec = div_u64(si->ec_sum, si->ec_count);
+
+ /* Iterate over all volumes and read their EBA table */
+ for (i = 0; i < be32_to_cpu(cphdr->nvol); i++) {
+ cpvhdr = (struct ubi_cp_volhdr *)(cp_raw + cp_pos);
+ cp_pos += sizeof(*cpvhdr);
+
+ if (cpvhdr->magic != UBI_CP_VHDR_MAGIC)
+ goto fail;
+
+ sv = add_vol(si, be32_to_cpu(cpvhdr->vol_id),
+ be32_to_cpu(cpvhdr->used_ebs),
+ be32_to_cpu(cpvhdr->data_pad),
+ cpvhdr->vol_type, be32_to_cpu(cpvhdr->last_eb_bytes));
+
+ if (!sv)
+ goto fail;
+
+ si->vols_found++;
+ if (si->highest_vol_id < be32_to_cpu(cpvhdr->vol_id))
+ si->highest_vol_id = be32_to_cpu(cpvhdr->vol_id);
+
+ for (j = 0; j < be32_to_cpu(cpvhdr->used_ebs); j++) {
+ cp_eba = (struct ubi_cp_eba *)(cp_raw + cp_pos);
+ cp_pos += sizeof(*cp_eba);
+ if (cp_pos >= cp_size)
+ goto fail;
+
+ if ((int)be32_to_cpu(cp_eba->pnum) < 0)
+ continue;
+
+ seb = NULL;
+ list_for_each_entry(tmp_seb, &used, u.list) {
+ if (tmp_seb->pnum == be32_to_cpu(cp_eba->pnum))
+ seb = tmp_seb;
+ }
+
+ /* Not good, an EBA entry points to a PEB which is
+ * not in our used list */
+ if (!seb)
+ goto fail;
+
+ seb->lnum = be32_to_cpu(cp_eba->lnum);
+ assign_seb_to_sv(si, seb, sv);
+
+ dbg_bld("Inserting pnum %i (leb %i) to vol %i",
+ seb->pnum, seb->lnum, sv->vol_id);
+ }
+ }
+
+ /*
+ * The remaining PEBs in the used list are not referenced by
+ * any volume. They lived in the checkpoint pool but were never
+ * used, so move them to the free list.
+ */
+ list_for_each_entry_safe(tmp_seb, _tmp_seb, &used, u.list) {
+ list_del(&tmp_seb->u.list);
+ list_add_tail(&tmp_seb->u.list, &si->free);
+ }
+
+ if (scan_pool(ubi, si, cplpl->pebs, be32_to_cpu(cplpl->size),
+ &max_sqnum2) < 0)
+ goto fail;
+ if (scan_pool(ubi, si, cpspl->pebs, be32_to_cpu(cpspl->size),
+ &max_sqnum2) < 0)
+ goto fail;
+ if (scan_pool(ubi, si, cpupl->pebs, be32_to_cpu(cpupl->size),
+ &max_sqnum2) < 0)
+ goto fail;
+
+ if (max_sqnum2 > si->max_sqnum)
+ si->max_sqnum = max_sqnum2;
+
+ return si;
+
+fail:
+ ubi_scan_destroy_si(si);
+ return NULL;
+}
+
+/**
+ * ubi_read_checkpoint - read the checkpoint
+ * @ubi: UBI device object
+ * @cb_sb_pnum: PEB number of the checkpoint super block
+ */
+struct ubi_scan_info *ubi_read_checkpoint(struct ubi_device *ubi,
+ int cb_sb_pnum)
+{
+ struct ubi_cp_sb *cpsb;
+ struct ubi_vid_hdr *vh;
+ int ret, i, nblocks;
+ char *cp_raw;
+ size_t cp_size;
+ __be32 data_crc;
+ unsigned long long sqnum = 0;
+ struct ubi_scan_info *si = NULL;
+
+ cpsb = kmalloc(sizeof(*cpsb), GFP_KERNEL);
+ if (!cpsb) {
+ si = ERR_PTR(-ENOMEM);
+
+ goto out;
+ }
+
+ ret = ubi_io_read(ubi, cpsb, cb_sb_pnum, ubi->leb_start, sizeof(*cpsb));
+ if (ret) {
+ ubi_err("Unable to read checkpoint super block");
+ si = ERR_PTR(ret);
+ kfree(cpsb);
+
+ goto out;
+ }
+
+ if (cpsb->magic != UBI_CP_SB_MAGIC) {
+ ubi_err("Super block magic does not match");
+ si = ERR_PTR(-EINVAL);
+ kfree(cpsb);
+
+ goto out;
+ }
+
+ if (cpsb->version != UBI_CP_FMT_VERSION) {
+ ubi_err("Unknown checkpoint format version!");
+ si = ERR_PTR(-EINVAL);
+ kfree(cpsb);
+
+ goto out;
+ }
+
+ nblocks = be32_to_cpu(cpsb->nblocks);
+
+ if (nblocks > UBI_CP_MAX_BLOCKS || nblocks < 1) {
+ ubi_err("Number of checkpoint blocks is invalid");
+ si = ERR_PTR(-EINVAL);
+ kfree(cpsb);
+
+ goto out;
+ }
+
+ cp_size = ubi->leb_size * nblocks;
+ /* cp_raw will contain the whole checkpoint */
+ cp_raw = vzalloc(cp_size);
+ if (!cp_raw) {
+ si = ERR_PTR(-ENOMEM);
+ kfree(cpsb);
+
+ goto out;
+ }
+
+ vh = ubi_zalloc_vid_hdr(ubi, GFP_KERNEL);
+ if (!vh) {
+ si = ERR_PTR(-ENOMEM);
+ kfree(cpsb);
+
+ goto free_raw;
+ }
+
+ for (i = 0; i < nblocks; i++) {
+ ret = ubi_io_read_vid_hdr(ubi, be32_to_cpu(cpsb->block_loc[i]),
+ vh, 0);
+ if (ret) {
+ ubi_err("Unable to read checkpoint block# %i (PEB: %i)",
+ i, be32_to_cpu(cpsb->block_loc[i]));
+ si = ERR_PTR(ret);
+
+ goto free_vhdr;
+ }
+
+ if (i == 0) {
+ if (be32_to_cpu(vh->vol_id) != UBI_CP_SB_VOLUME_ID) {
+ si = ERR_PTR(-EINVAL);
+
+ goto free_vhdr;
+ }
+ } else {
+ if (be32_to_cpu(vh->vol_id) != UBI_CP_DATA_VOLUME_ID) {
+ si = ERR_PTR(-EINVAL);
+
+ goto free_vhdr;
+ }
+ }
+
+ if (sqnum < be64_to_cpu(vh->sqnum))
+ sqnum = be64_to_cpu(vh->sqnum);
+
+ ret = ubi_io_read(ubi, cp_raw + (ubi->leb_size * i),
+ be32_to_cpu(cpsb->block_loc[i]),
+ ubi->leb_start, ubi->leb_size);
+
+ if (ret) {
+ ubi_err("Unable to read checkpoint block# %i (PEB: %i)",
+ i, be32_to_cpu(cpsb->block_loc[i]));
+ si = ERR_PTR(ret);
+
+ goto free_vhdr;
+ }
+ }
+
+ kfree(cpsb);
+
+ cpsb = (struct ubi_cp_sb *)cp_raw;
+ data_crc = crc32_be(UBI_CRC32_INIT, cp_raw + sizeof(*cpsb),
+ cp_size - sizeof(*cpsb));
+ if (data_crc != cpsb->data_crc) {
+ ubi_err("Checkpoint data CRC is invalid");
+ si = ERR_PTR(-EINVAL);
+
+ goto free_vhdr;
+ }
+
+ cpsb->sqnum = sqnum;
+
+ si = ubi_scan_checkpoint(ubi, cp_raw, cp_size);
+ if (!si) {
+ si = ERR_PTR(-EINVAL);
+
+ goto free_vhdr;
+ }
+
+ /* Store the checkpoint position into the ubi_device struct */
+ ubi->cp = kmalloc(sizeof(struct ubi_checkpoint), GFP_KERNEL);
+ if (!ubi->cp) {
+ ubi_scan_destroy_si(si);
+ si = ERR_PTR(-ENOMEM);
+
+ goto free_vhdr;
+ }
+
+ ubi->cp->size = cp_size;
+ ubi->cp->used_blocks = nblocks;
+
+ for (i = 0; i < UBI_CP_MAX_BLOCKS; i++) {
+ if (i < nblocks) {
+ ubi->cp->peb[i] = be32_to_cpu(cpsb->block_loc[i]);
+ ubi->cp->ec[i] = be32_to_cpu(cpsb->block_ec[i]);
+ } else {
+ ubi->cp->peb[i] = -1;
+ ubi->cp->ec[i] = 0;
+ }
+ }
+
+free_vhdr:
+ ubi_free_vid_hdr(ubi, vh);
+free_raw:
+ vfree(cp_raw);
+out:
+ return si;
+}
+
+/**
+ * ubi_find_checkpoint - searches the first UBI_CP_MAX_START PEBs for the
+ * checkpoint super block.
+ * @ubi: UBI device object
+ */
+int ubi_find_checkpoint(struct ubi_device *ubi)
+{
+ int i, ret;
+ int cp_sb = -ENOENT;
+ struct ubi_vid_hdr *vhdr;
+
+ vhdr = ubi_zalloc_vid_hdr(ubi, GFP_KERNEL);
+ if (!vhdr)
+ return -ENOMEM;
+
+ for (i = 0; i < UBI_CP_MAX_START; i++) {
+ ret = ubi_io_read_vid_hdr(ubi, i, vhdr, 0);
+ /* ignore read errors */
+ if (ret)
+ continue;
+
+ if (be32_to_cpu(vhdr->vol_id) == UBI_CP_SB_VOLUME_ID) {
+ cp_sb = i;
+ break;
+ }
+ }
+
+ ubi_free_vid_hdr(ubi, vhdr);
+ return cp_sb;
+}
+
+/**
+ * ubi_write_checkpoint - writes a checkpoint
+ * @ubi: UBI device object
+ * @new_cp: the checkpoint to be written
+ */
+static int ubi_write_checkpoint(struct ubi_device *ubi,
+ struct ubi_checkpoint *new_cp)
+{
+ int ret;
+ size_t cp_pos = 0;
+ char *cp_raw;
+ int i, j;
+
+ struct ubi_cp_sb *cpsb;
+ struct ubi_cp_hdr *cph;
+ struct ubi_cp_long_pool *cplpl;
+ struct ubi_cp_short_pool *cpspl;
+ struct ubi_cp_unk_pool *cpupl;
+ struct ubi_cp_ec *cec;
+ struct ubi_cp_volhdr *cvh;
+ struct ubi_cp_eba *ceba;
+
+ struct rb_node *node;
+ struct ubi_wl_entry *wl_e;
+ struct ubi_volume *vol;
+
+ struct ubi_vid_hdr *svhdr, *dvhdr;
+
+ int nfree, nused, nvol;
+
+ cp_raw = vzalloc(new_cp->size);
+ if (!cp_raw) {
+ ret = -ENOMEM;
+
+ goto out;
+ }
+
+ svhdr = new_cp_vhdr(ubi, UBI_CP_SB_VOLUME_ID);
+ if (!svhdr) {
+ ret = -ENOMEM;
+
+ goto out_vfree;
+ }
+
+ dvhdr = new_cp_vhdr(ubi, UBI_CP_DATA_VOLUME_ID);
+ if (!dvhdr) {
+ ret = -ENOMEM;
+
+ goto out_kfree;
+ }
+
+ ubi_flush_prot_queue(ubi);
+
+ spin_lock(&ubi->volumes_lock);
+ spin_lock(&ubi->wl_lock);
+
+ cpsb = (struct ubi_cp_sb *)cp_raw;
+ cp_pos += sizeof(*cpsb);
+ ubi_assert(cp_pos <= new_cp->size);
+
+ cph = (struct ubi_cp_hdr *)(cp_raw + cp_pos);
+ cp_pos += sizeof(*cph);
+ ubi_assert(cp_pos <= new_cp->size);
+
+ cpsb->magic = UBI_CP_SB_MAGIC;
+ cpsb->version = UBI_CP_FMT_VERSION;
+ cpsb->nblocks = cpu_to_be32(new_cp->used_blocks);
+ /* the max sqnum will be filled in while *reading* the checkpoint */
+ cpsb->sqnum = 0;
+
+ cph->magic = UBI_CP_HDR_MAGIC;
+ nfree = 0;
+ nused = 0;
+ nvol = 0;
+
+ cplpl = (struct ubi_cp_long_pool *)(cp_raw + cp_pos);
+ cp_pos += sizeof(*cplpl);
+ cplpl->magic = UBI_CP_LPOOL_MAGIC;
+ cplpl->size = cpu_to_be32(ubi->long_pool.size);
+
+ cpspl = (struct ubi_cp_short_pool *)(cp_raw + cp_pos);
+ cp_pos += sizeof(*cpspl);
+ cpspl->magic = UBI_CP_SPOOL_MAGIC;
+ cpspl->size = cpu_to_be32(ubi->short_pool.size);
+
+ cpupl = (struct ubi_cp_unk_pool *)(cp_raw + cp_pos);
+ cp_pos += sizeof(*cpupl);
+ cpupl->magic = UBI_CP_UPOOL_MAGIC;
+ cpupl->size = cpu_to_be32(ubi->unk_pool.size);
+
+ for (i = 0; i < ubi->long_pool.size; i++)
+ cplpl->pebs[i] = cpu_to_be32(ubi->long_pool.pebs[i]);
+
+ for (i = 0; i < ubi->short_pool.size; i++)
+ cpspl->pebs[i] = cpu_to_be32(ubi->short_pool.pebs[i]);
+
+ for (i = 0; i < ubi->unk_pool.size; i++)
+ cpupl->pebs[i] = cpu_to_be32(ubi->unk_pool.pebs[i]);
+
+ for (node = rb_first(&ubi->free); node; node = rb_next(node)) {
+ wl_e = rb_entry(node, struct ubi_wl_entry, u.rb);
+ cec = (struct ubi_cp_ec *)(cp_raw + cp_pos);
+
+ cec->pnum = cpu_to_be32(wl_e->pnum);
+ cec->ec = cpu_to_be32(wl_e->ec);
+
+ nfree++;
+ cp_pos += sizeof(*cec);
+ ubi_assert(cp_pos <= new_cp->size);
+ }
+ cph->nfree = cpu_to_be32(nfree);
+
+ for (node = rb_first(&ubi->used); node; node = rb_next(node)) {
+ wl_e = rb_entry(node, struct ubi_wl_entry, u.rb);
+ cec = (struct ubi_cp_ec *)(cp_raw + cp_pos);
+
+ cec->pnum = cpu_to_be32(wl_e->pnum);
+ cec->ec = cpu_to_be32(wl_e->ec);
+
+ nused++;
+ cp_pos += sizeof(*cec);
+ ubi_assert(cp_pos <= new_cp->size);
+ }
+ cph->nused = cpu_to_be32(nused);
+
+ for (i = 0; i < UBI_MAX_VOLUMES + UBI_INT_VOL_COUNT; i++) {
+ vol = ubi->volumes[i];
+
+ if (!vol)
+ continue;
+
+ nvol++;
+
+ cvh = (struct ubi_cp_volhdr *)(cp_raw + cp_pos);
+ cp_pos += sizeof(*cvh);
+ ubi_assert(cp_pos <= new_cp->size);
+
+ cvh->magic = UBI_CP_VHDR_MAGIC;
+ cvh->vol_id = cpu_to_be32(vol->vol_id);
+ cvh->vol_type = vol->vol_type;
+ cvh->used_ebs = cpu_to_be32(vol->used_ebs);
+ cvh->data_pad = cpu_to_be32(vol->data_pad);
+ cvh->last_eb_bytes = cpu_to_be32(vol->last_eb_bytes);
+
+ ubi_assert(vol->vol_type == UBI_DYNAMIC_VOLUME ||
+ vol->vol_type == UBI_STATIC_VOLUME);
+
+ for (j = 0; j < vol->used_ebs; j++) {
+ ceba = (struct ubi_cp_eba *)(cp_raw + cp_pos);
+
+ ceba->lnum = cpu_to_be32(j);
+ ceba->pnum = cpu_to_be32(vol->eba_tbl[j]);
+
+ cp_pos += sizeof(*ceba);
+ ubi_assert(cp_pos <= new_cp->size);
+ }
+ }
+ cph->nvol = cpu_to_be32(nvol);
+
+ svhdr->sqnum = cpu_to_be64(ubi_next_sqnum(ubi));
+ svhdr->lnum = 0;
+
+ spin_unlock(&ubi->wl_lock);
+ spin_unlock(&ubi->volumes_lock);
+
+ dbg_bld("Writing checkpoint SB to PEB %i", new_cp->peb[0]);
+ ret = ubi_io_write_vid_hdr(ubi, new_cp->peb[0], svhdr);
+ if (ret) {
+ ubi_err("Unable to write vid_hdr to checkpoint SB!");
+
+ goto out_kfree;
+ }
+
+ for (i = 0; i < UBI_CP_MAX_BLOCKS; i++) {
+ cpsb->block_loc[i] = cpu_to_be32(new_cp->peb[i]);
+ cpsb->block_ec[i] = cpu_to_be32(new_cp->ec[i]);
+ }
+
+ cpsb->data_crc = 0;
+ cpsb->data_crc = crc32_be(UBI_CRC32_INIT, cp_raw + sizeof(*cpsb),
+ new_cp->size - sizeof(*cpsb));
+
+ for (i = 1; i < new_cp->used_blocks; i++) {
+ dvhdr->sqnum = cpu_to_be64(ubi_next_sqnum(ubi));
+ dvhdr->lnum = cpu_to_be32(i);
+ dbg_bld("Writing checkpoint data to PEB %i sqnum %llu",
+ new_cp->peb[i], be64_to_cpu(dvhdr->sqnum));
+ ret = ubi_io_write_vid_hdr(ubi, new_cp->peb[i], dvhdr);
+ if (ret) {
+ ubi_err("Unable to write vid_hdr to PEB %i!",
+ new_cp->peb[i]);
+
+ goto out_kfree;
+ }
+ }
+
+ for (i = 0; i < new_cp->used_blocks; i++) {
+ ret = ubi_io_write(ubi, cp_raw + (i * ubi->leb_size),
+ new_cp->peb[i], ubi->leb_start, ubi->leb_size);
+ if (ret) {
+ ubi_err("Unable to write checkpoint to PEB %i!",
+ new_cp->peb[i]);
+
+ goto out_kfree;
+ }
+ }
+
+ ubi->cp = new_cp;
+
+ dbg_bld("Checkpoint written!");
+
+out_kfree:
+ kfree(dvhdr);
+ kfree(svhdr);
+out_vfree:
+ vfree(cp_raw);
+out:
+ return ret;
+}
+
+/**
+ * get_ec - returns the erase counter of a given PEB
+ * @ubi: UBI device object
+ * @pnum: PEB number
+ */
+static int get_ec(struct ubi_device *ubi, int pnum)
+{
+ struct ubi_wl_entry *e;
+
+ e = ubi->lookuptbl[pnum];
+
+ /* can this really happen? */
+ if (!e)
+ return ubi->mean_ec ?: 1;
+ else
+ return e->ec;
+}
+
+/**
+ * ubi_update_checkpoint - will be called by UBI if a volume changes or
+ * a checkpoint pool becomes full.
+ * @ubi: UBI device object
+ */
+int ubi_update_checkpoint(struct ubi_device *ubi)
+{
+ int ret, i;
+ struct ubi_checkpoint *new_cp;
+
+ if (ubi->ro_mode)
+ return 0;
+
+ new_cp = kmalloc(sizeof(*new_cp), GFP_KERNEL);
+ if (!new_cp)
+ return -ENOMEM;
+
+ ubi->old_cp = ubi->cp;
+ ubi->cp = NULL;
+
+ if (ubi->old_cp) {
+ new_cp->peb[0] = ubi_wl_get_cp_peb(ubi, UBI_CP_MAX_START);
+ /* no fresh early PEB was found, reuse the old one */
+ if (new_cp->peb[0] < 0) {
+ struct ubi_ec_hdr *ec_hdr;
+
+ ec_hdr = kmalloc(sizeof(*ec_hdr), GFP_KERNEL);
+ if (!ec_hdr) {
+ kfree(new_cp);
+ return -ENOMEM;
+ }
+
+ /* we have to erase the block by hand */
+
+ ret = ubi_io_read_ec_hdr(ubi, ubi->old_cp->peb[0],
+ ec_hdr, 0);
+ if (ret) {
+ ubi_err("Unable to read EC header");
+
+ kfree(new_cp);
+ kfree(ec_hdr);
+ return -EINVAL;
+ }
+
+ ret = ubi_io_sync_erase(ubi, ubi->old_cp->peb[0], 0);
+ if (ret < 0) {
+ ubi_err("Unable to erase old SB");
+
+ kfree(new_cp);
+ kfree(ec_hdr);
+ return -EINVAL;
+ }
+
+ ec_hdr->ec = cpu_to_be64(be64_to_cpu(ec_hdr->ec) + ret);
+ if (be64_to_cpu(ec_hdr->ec) > UBI_MAX_ERASECOUNTER) {
+ ubi_err("Erase counter overflow!");
+ kfree(new_cp);
+ kfree(ec_hdr);
+ return -EINVAL;
+ }
+
+ ret = ubi_io_write_ec_hdr(ubi, ubi->old_cp->peb[0],
+ ec_hdr);
+ kfree(ec_hdr);
+ if (ret) {
+ ubi_err("Unable to write new EC header");
+ kfree(new_cp);
+ return -EINVAL;
+ }
+
+ new_cp->peb[0] = ubi->old_cp->peb[0];
+ new_cp->ec[0] = ubi->old_cp->ec[0];
+ } else {
+ /* we've got a new early PEB, return the old one */
+ ubi_wl_put_cp_peb(ubi, ubi->old_cp->peb[0], 0);
+ new_cp->ec[0] = get_ec(ubi, new_cp->peb[0]);
+ }
+
+ /* return all other checkpoint blocks to the wl system */
+ for (i = 1; i < UBI_CP_MAX_BLOCKS; i++) {
+ if (ubi->old_cp->peb[i] >= 0)
+ ubi_wl_put_cp_peb(ubi, ubi->old_cp->peb[i], 0);
+ else
+ break;
+ }
+ } else {
+ new_cp->peb[0] = ubi_wl_get_cp_peb(ubi, UBI_CP_MAX_START);
+ if (new_cp->peb[0] < 0) {
+ ubi_err("Could not find an early PEB");
+ kfree(new_cp);
+ return -ENOSPC;
+ }
+ new_cp->ec[0] = get_ec(ubi, new_cp->peb[0]);
+ }
+
+ new_cp->size = sizeof(struct ubi_cp_hdr) +
+ sizeof(struct ubi_cp_long_pool) +
+ sizeof(struct ubi_cp_short_pool) +
+ sizeof(struct ubi_cp_unk_pool) +
+ ubi->peb_count * (sizeof(struct ubi_cp_ec) +
+ sizeof(struct ubi_cp_eba)) +
+ sizeof(struct ubi_cp_volhdr) * UBI_MAX_VOLUMES;
+ new_cp->size = roundup(new_cp->size, ubi->leb_size);
+
+ new_cp->used_blocks = new_cp->size / ubi->leb_size;
+
+ if (new_cp->used_blocks > UBI_CP_MAX_BLOCKS) {
+ ubi_err("Checkpoint too large");
+ kfree(new_cp);
+
+ return -ENOSPC;
+ }
+
+ /* give the wl subsystem a chance to produce some free blocks */
+ cond_resched();
+
+ for (i = 1; i < UBI_CP_MAX_BLOCKS; i++) {
+ if (i < new_cp->used_blocks) {
+ new_cp->peb[i] = ubi_wl_get_cp_peb(ubi, INT_MAX);
+ if (new_cp->peb[i] < 0) {
+ ubi_err("Could not get any free erase block");
+
+ while (i--)
+ ubi_wl_put_cp_peb(ubi, new_cp->peb[i],
+ 0);
+
+ kfree(new_cp);
+
+ return -ENOSPC;
+ }
+
+ new_cp->ec[i] = get_ec(ubi, new_cp->peb[i]);
+ } else {
+ new_cp->peb[i] = -1;
+ new_cp->ec[i] = 0;
+ }
+ }
+
+ kfree(ubi->old_cp);
+ ubi->old_cp = NULL;
+
+ return ubi_write_checkpoint(ubi, new_cp);
+}
diff --git a/drivers/mtd/ubi/scan.c b/drivers/mtd/ubi/scan.c
index 5d4c1d3..7d04008 100644
--- a/drivers/mtd/ubi/scan.c
+++ b/drivers/mtd/ubi/scan.c
@@ -1011,7 +1011,15 @@ static int process_eb(struct ubi_device *ubi, struct ubi_scan_info *si,
}
vol_id = be32_to_cpu(vidh->vol_id);
- if (vol_id > UBI_MAX_VOLUMES && vol_id != UBI_LAYOUT_VOLUME_ID) {
+#ifdef CONFIG_MTD_UBI_CHECKPOINT
+ if (vol_id > UBI_MAX_VOLUMES &&
+ vol_id != UBI_LAYOUT_VOLUME_ID &&
+ vol_id != UBI_CP_SB_VOLUME_ID &&
+ vol_id != UBI_CP_DATA_VOLUME_ID)
+#else
+ if (vol_id > UBI_MAX_VOLUMES && vol_id != UBI_LAYOUT_VOLUME_ID)
+#endif
+ {
int lnum = be32_to_cpu(vidh->lnum);
/* Unsupported internal volume */
diff --git a/drivers/mtd/ubi/ubi.h b/drivers/mtd/ubi/ubi.h
index df267bb..8d44152 100644
--- a/drivers/mtd/ubi/ubi.h
+++ b/drivers/mtd/ubi/ubi.h
@@ -625,11 +625,18 @@ int ubi_enumerate_volumes(struct notifier_block *nb);
void ubi_do_get_device_info(struct ubi_device *ubi, struct ubi_device_info *di);
void ubi_do_get_volume_info(struct ubi_device *ubi, struct ubi_volume *vol,
struct ubi_volume_info *vi);
-
/* scan.c */
int ubi_compare_lebs(struct ubi_device *ubi, const struct ubi_scan_leb *seb,
int pnum, const struct ubi_vid_hdr *vid_hdr);
+#ifdef CONFIG_MTD_UBI_CHECKPOINT
+/* checkpoint.c */
+struct ubi_scan_info *ubi_read_checkpoint(struct ubi_device *ubi,
+ int cb_sb_pnum);
+int ubi_update_checkpoint(struct ubi_device *ubi);
+int ubi_find_checkpoint(struct ubi_device *ubi);
+#endif
+
/*
* ubi_rb_for_each_entry - walk an RB-tree.
* @rb: a pointer to type 'struct rb_node' to use as a loop counter
--
1.7.6.5
Signed-off-by: Richard Weinberger <[email protected]>
---
drivers/mtd/ubi/Kconfig | 29 ++++++++++++++-
drivers/mtd/ubi/build.c | 86 ++++++++++++++++++++++++++++++++++++++++++
drivers/mtd/ubi/checkpoint.c | 4 ++
drivers/mtd/ubi/ubi-media.h | 6 +-
4 files changed, 121 insertions(+), 4 deletions(-)
diff --git a/drivers/mtd/ubi/Kconfig b/drivers/mtd/ubi/Kconfig
index 3ba9978..12888a4 100644
--- a/drivers/mtd/ubi/Kconfig
+++ b/drivers/mtd/ubi/Kconfig
@@ -56,10 +56,37 @@ config MTD_UBI_CHECKPOINT
depends on EXPERIMENTAL
default n
help
- This option enables UBIVIS (AKA checkpointing).
+ This option enables UBIVIS (aka checkpointing).
It allows attaching UBI devices without scanning the whole MTD
device. Instead it extracts all needed information from a checkpoint.
+config MTD_UBI_CHECKPOINT_POOL_SIZE
+ int "Max number of PEBs in a UBIVIS pool"
+ range 10 1024
+ default 128
+ help
+ This is the number of PEBs which have to be scanned while
+ attaching. A low value means that attaching will be faster, but
+ if the value is too small the checkpoint has to be written too
+ often. Every time a pool becomes full a new checkpoint is
+ written to the MTD. Note that there are currently three pools.
+ Choose wisely!
+
+config MTD_UBI_CHECKPOINT_MAX_SIZE
+ int "Maximal size of a checkpoint in PEBs"
+ range 10 128
+ default 32
+ help
+ Maximum size of a checkpoint in PEBs.
+
+config MTD_UBI_CHECKPOINT_SB_POS
+ int "Checkpoint super block position"
+ range 4 128
+ default 64
+ help
+ The checkpoint super block will be placed within the first N PEBs.
+ If this value is too large, it takes longer to find the checkpoint.
+
config MTD_UBI_DEBUG
bool "UBI debugging"
depends on SYSFS
diff --git a/drivers/mtd/ubi/build.c b/drivers/mtd/ubi/build.c
index 0fde9fc..316f27a 100644
--- a/drivers/mtd/ubi/build.c
+++ b/drivers/mtd/ubi/build.c
@@ -148,6 +148,17 @@ int ubi_volume_notify(struct ubi_device *ubi, struct ubi_volume *vol, int ntype)
ubi_do_get_device_info(ubi, &nt.di);
ubi_do_get_volume_info(ubi, vol, &nt.vi);
+
+#ifdef CONFIG_MTD_UBI_CHECKPOINT
+ switch (ntype) {
+ case UBI_VOLUME_ADDED:
+ case UBI_VOLUME_REMOVED:
+ case UBI_VOLUME_RESIZED:
+ case UBI_VOLUME_RENAMED:
+ if (ubi_update_checkpoint(ubi))
+ ubi_err("Unable to update checkpoint!");
+ }
+#endif
return blocking_notifier_call_chain(&ubi_notifiers, ntype, &nt);
}
@@ -852,6 +863,61 @@ static int autoresize(struct ubi_device *ubi, int vol_id)
return 0;
}
+#ifdef CONFIG_MTD_UBI_CHECKPOINT
+static int attach_by_checkpointing(struct ubi_device *ubi)
+{
+ int cp_start, err;
+ struct ubi_scan_info *si;
+
+ cp_start = ubi_find_checkpoint(ubi);
+ if (cp_start < 0)
+ return -ENOENT;
+
+ si = ubi_read_checkpoint(ubi, cp_start);
+ if (IS_ERR(si))
+ return PTR_ERR(si);
+
+ ubi->bad_peb_count = 0;
+ ubi->good_peb_count = ubi->peb_count;
+ ubi->corr_peb_count = 0;
+ ubi->max_ec = si->max_ec;
+ ubi->mean_ec = si->mean_ec;
+ ubi_msg("max. sequence number: %llu", si->max_sqnum);
+
+ err = ubi_read_volume_table(ubi, si);
+ if (err) {
+ ubi_err("ubi_read_volume_table failed");
+ goto out_si;
+ }
+
+ err = ubi_wl_init_scan(ubi, si);
+ if (err) {
+ ubi_err("ubi_wl_init_scan failed!");
+ goto out_vtbl;
+ }
+
+ err = ubi_eba_init_scan(ubi, si);
+ if (err) {
+ ubi_err("ubi_eba_init_scan failed!");
+ goto out_wl;
+ }
+
+ ubi_msg("successfully recovered from checkpoint!");
+ ubi_scan_destroy_si(si);
+ return 0;
+
+out_wl:
+ ubi_wl_close(ubi);
+out_vtbl:
+ free_internal_volumes(ubi);
+ vfree(ubi->vtbl);
+out_si:
+ ubi_scan_destroy_si(si);
+
+ return err;
+}
+#endif
+
/**
* ubi_attach_mtd_dev - attach an MTD device.
* @mtd: MTD device description object
@@ -931,6 +997,15 @@ int ubi_attach_mtd_dev(struct mtd_info *mtd, int ubi_num, int vid_hdr_offset)
ubi->vid_hdr_offset = vid_hdr_offset;
ubi->autoresize_vol_id = -1;
+#ifdef CONFIG_MTD_UBI_CHECKPOINT
+ ubi->long_pool.used = ubi->long_pool.size =
+ ubi->long_pool.max_size = ARRAY_SIZE(ubi->long_pool.pebs);
+ ubi->short_pool.used = ubi->short_pool.size =
+ ubi->short_pool.max_size = ARRAY_SIZE(ubi->short_pool.pebs);
+ ubi->unk_pool.used = ubi->unk_pool.size =
+ ubi->unk_pool.max_size = ARRAY_SIZE(ubi->unk_pool.pebs);
+#endif
+
mutex_init(&ubi->buf_mutex);
mutex_init(&ubi->ckvol_mutex);
mutex_init(&ubi->device_mutex);
@@ -953,7 +1028,18 @@ int ubi_attach_mtd_dev(struct mtd_info *mtd, int ubi_num, int vid_hdr_offset)
if (err)
goto out_free;
+#ifdef CONFIG_MTD_UBI_CHECKPOINT
+ err = attach_by_checkpointing(ubi);
+
+ if (err) {
+ if (err != -ENOENT)
+ ubi_msg("falling back to attach by scanning mode!");
+
+ err = attach_by_scanning(ubi);
+ }
+#else
err = attach_by_scanning(ubi);
+#endif
if (err) {
dbg_err("failed to attach by scanning, error %d", err);
goto out_debugging;
diff --git a/drivers/mtd/ubi/checkpoint.c b/drivers/mtd/ubi/checkpoint.c
index f43441c..867f32d 100644
--- a/drivers/mtd/ubi/checkpoint.c
+++ b/drivers/mtd/ubi/checkpoint.c
@@ -993,6 +993,10 @@ int ubi_update_checkpoint(struct ubi_device *ubi)
int ret, i;
struct ubi_checkpoint *new_cp;
+ BUILD_BUG_ON(UBI_CP_MAX_START < 3);
+ BUILD_BUG_ON(UBI_CP_MAX_BLOCKS < 10);
+ BUILD_BUG_ON(UBI_CP_MAX_POOL_SIZE < 10);
+
if (ubi->ro_mode)
return 0;
diff --git a/drivers/mtd/ubi/ubi-media.h b/drivers/mtd/ubi/ubi-media.h
index 7223b02..4d14b9e 100644
--- a/drivers/mtd/ubi/ubi-media.h
+++ b/drivers/mtd/ubi/ubi-media.h
@@ -382,9 +382,9 @@ struct ubi_vtbl_record {
/* Checkoint format version */
#define UBI_CP_FMT_VERSION 1
-#define UBI_CP_MAX_START 64
-#define UBI_CP_MAX_BLOCKS 32
-#define UBI_CP_MAX_POOL_SIZE 128
+#define UBI_CP_MAX_START CONFIG_MTD_UBI_CHECKPOINT_SB_POS
+#define UBI_CP_MAX_BLOCKS CONFIG_MTD_UBI_CHECKPOINT_MAX_SIZE
+#define UBI_CP_MAX_POOL_SIZE CONFIG_MTD_UBI_CHECKPOINT_POOL_SIZE
#define UBI_CP_SB_MAGIC 0x7B11D69F
#define UBI_CP_HDR_MAGIC 0xD4B82EF7
#define UBI_CP_VHDR_MAGIC 0xFA370ED1
--
1.7.6.5
On Wed, 2012-05-09 at 19:38 +0200, Richard Weinberger wrote:
> The following patch set implements UBIVIS (checkpointing) support for
> UBI.
Hi Richard, I would like to complain about the names again. I thought I
better give this feedback as soon as possible...
First of all, thanks for doing this. I will look closer, and I am very
keen on merging this stuff once we are sure its design is good, allows
for future extensions and is backward-compatible.
Then naming :-) We discussed checkpoints on this list a long time ago, I
think. If you ask a random UBI user what UBI with checkpointing would
be, I am sure most people would tell you that it means the ability to
checkpoint a volume at any point in time, then do arbitrary volume
changes (e.g., upgrade the system, re-flash it), and then be able to
return to any of the old checkpoints.
This name is really reserved for semantics like that. Btrfs implements
checkpoints. UBIFS could, in theory, do so as well. And UBI could in
theory - you just need a large pool of unused PEBs and then you do COW.
Please, do not use the word "checkpoint" for what you do at all - this
is asking for trouble - people will be confused.
Also, I think this new feature should always be compiled in. I do not
think we need this ifdef forest at all. You can detect the on-flash
format version at run time.
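To illustrate the run-time detection suggested here, the decision would amount to a check along these lines. This is a hedged userspace sketch: the struct layout, macro names and the helper function are illustrative assumptions, not the actual on-flash format from ubi-media.h.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-in for the on-flash super block; the real
 * layout lives in ubi-media.h. */
struct cp_sb {
	uint32_t magic;
	uint8_t version;
};

#define CP_SB_MAGIC    0x7B11D69F
#define CP_FMT_VERSION 1

/* Decide at attach time whether the checkpoint can be used.
 * Anything unexpected means: fall back to full scanning - so
 * no compile-time ifdefs are needed. */
static int can_use_checkpoint(const struct cp_sb *sb)
{
	if (sb->magic != CP_SB_MAGIC)
		return 0;	/* no checkpoint on this MTD */
	if (sb->version != CP_FMT_VERSION)
		return 0;	/* unknown format version */
	return 1;
}
```

With such a check, a kernel built with the feature simply degrades to scanning on media that carry no (or an unknown) checkpoint format.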
How about calling this "summary" as in JFFS2, or fastmap/fmap ?
Sorry for being pedantic, but clear terminology is really important, I
think.
Also, the naming logic and the internal layout should allow us to add
more features, e.g., if someone comes up with a real journal.
So maybe just naming your stuff UBI2, with terms like "UBI2 format",
would be the easiest? Then someone could later make this UBI3. A
documentation section could describe what UBI2 is and how it is
different from UBI1 or just UBI.
Thanks!
--
Best Regards,
Artem Bityutskiy
Hi Artem!
On 10.05.2012 06:26, Artem Bityutskiy wrote:
> Hi Richard, I would like to complain about the names again. I thought I
> better give this feedback as soon as possible...
No problem. :)
> First of all, thanks for doing this. I will look closer, and I am very
> keen on merging this stuff once we are sure its design is good, allows
> for future extensions and is backward-compatible.
>
Yeah.
First of all, yes it's fully backward-compatible. It uses two new internal volume IDs
with compat = UBI_COMPAT_DELETE.
Old UBI implementations will delete the checkpoint and continue with scanning...
Regarding design, ubi_wl_get_peb() currently offers three types of data types.
UBI_LONGTERM, UBI_SHORTTERM and UBI_UNKNOWN. Do we really need them?
Checkpointing has a pool of unknown PEBs. This PEBs have to be scanned while attaching.
For now I had to create three pools (for UBI_LONGTERM, UBI_SHORTTERM and UBI_UNKNOWN).
This makes the whole thing more complex than necessary.
It also introduces some nasty corner cases.
To make the review easier for you:
The most critical code path is scan_pool() -> process_pool_seb() -> update_vol().
It searches within a pool for PEBs which are no longer empty and scans them.
After that it updates the corresponding volume.
ubi_update_checkpoint() is also very important because it has to find
unused PEBs at the beginning of the MTD to place the super block.
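The shape of that critical path can be sketched roughly as follows. This is only an illustrative model, not the real scan_pool()/process_pool_seb()/update_vol() code: the struct layouts and names here are made up, and the rule it demonstrates is simply that the mapping with the highest sequence number wins when folding scanned pool PEBs into a volume's EBA table:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define EMPTY_LNUM UINT32_MAX

/* Simplified stand-in for a VID header read from a pool PEB. */
struct pool_peb {
	uint32_t pnum;   /* physical eraseblock number */
	uint32_t lnum;   /* mapped LEB, EMPTY_LNUM if still erased */
	uint64_t sqnum;  /* UBI sequence number of the mapping */
};

/* Simplified per-volume EBA table entry. */
struct eba_entry {
	uint32_t pnum;
	uint64_t sqnum;
};

/*
 * Walk the pool PEBs and fold any non-empty ones into the EBA table,
 * keeping the mapping with the highest sequence number for each LEB.
 */
void scan_pool(const struct pool_peb *pool, int n, struct eba_entry *eba)
{
	for (int i = 0; i < n; i++) {
		if (pool[i].lnum == EMPTY_LNUM)
			continue; /* still erased, nothing to do */
		struct eba_entry *e = &eba[pool[i].lnum];
		if (pool[i].sqnum > e->sqnum) {
			e->pnum = pool[i].pnum;
			e->sqnum = pool[i].sqnum;
		}
	}
}
```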
> So maybe just naming your stuff UBI2, having terms like "UBI2 format",
> would be the easiest? Then someone could make this to be UBI3. A
> documentation section could describe what UBI2 is and how it is
> different from UBI1 or just UBI.
Okay, got your point.
I think "fastmap" is a good name because I can also use it within the code.
So, while reviewing the code please keep s/checkpoint/fastmap/g and s/cp/fm/g in mind. ;-)
I like the UBI2 idea. UBI2 = UBI + fastmap.
After the UBI2/fastmap design is stable I will happily write a detailed design paper
for http://www.linux-mtd.infradead.org/doc/ubi.html.
Thanks,
//richard
On Thu, 2012-05-10 at 10:33 +0200, Richard Weinberger wrote:
> First of all, yes it's fully backward-compatible. It uses two new internal volume IDs
> with compat = UBI_COMPAT_DELETE.
> Old UBI implementations will delete the checkpoint and continue with scanning...
OK. BTW, these patches do not compile when the fastmap is disabled. I
hope you'll just kill the ifdefs in the next revision and this problem
will go away.
> Regarding design, ubi_wl_get_peb() currently offers three data types:
> UBI_LONGTERM, UBI_SHORTTERM and UBI_UNKNOWN. Do we really need them?
> Checkpointing has a pool of unknown PEBs. These PEBs have to be scanned while attaching.
> For now I had to create three pools (for UBI_LONGTERM, UBI_SHORTTERM and UBI_UNKNOWN).
> This makes the whole thing more complex than necessary.
> It also introduces some nasty corner cases.
But AFAIR we already agreed that we kill these, no? I thought you would
send a separate patch for this. We do not need this feature and to our
shame it was not even working; a bug was found very recently.
> To make the review easier for you:
> The most critical code path is scan_pool() -> process_pool_seb() -> update_vol().
> It searches within a pool for PEBs which are no longer empty and scans them.
> After that it updates the corresponding volume.
OK, thanks.
> ubi_update_checkpoint() is also very important because it has to find
> unused PEBs at the beginning of the MTD to place the super block.
OK.
> Okay, got your point.
> I think "fastmap" is a good name because I can also use it within the code.
> So, while reviewing the code please keep s/checkpoint/fastmap/g and s/cp/fm/g in mind. ;-)
OK.
--
Best Regards,
Artem Bityutskiy
On 11.05.2012 12:46, Artem Bityutskiy wrote:
> On Thu, 2012-05-10 at 10:33 +0200, Richard Weinberger wrote:
>> First of all, yes it's fully backward-compatible. It uses two new internal volume IDs
>> with compat = UBI_COMPAT_DELETE.
>> Old UBI implementations will delete the checkpoint and continue with scanning...
>
> OK. BTW, these patches do not compile when the fastmap is disabled. I
> hope you'll just kill the ifdefs in the next revision and this problem
> will go away.
It builds fine here with CONFIG_MTD_UBI_CHECKPOINT=n.
What error do you get?
>> Regarding design, ubi_wl_get_peb() currently offers three data types:
>> UBI_LONGTERM, UBI_SHORTTERM and UBI_UNKNOWN. Do we really need them?
>> Checkpointing has a pool of unknown PEBs. These PEBs have to be scanned while attaching.
>> For now I had to create three pools (for UBI_LONGTERM, UBI_SHORTTERM and UBI_UNKNOWN).
>> This makes the whole thing more complex than necessary.
>> It also introduces some nasty corner cases.
>
> But AFAIR we already agreed that we kill these, no? I thought you would
> send a separate patch for this. We do not need this feature and to our
> shame it was not even working; a bug was found very recently.
Perfect!
I was confused and thought that this feature would be removed later...
Thanks,
//richard
I'd like to have a git tree for this stuff at some point, when you feel you
are ready, and then do all further changes incrementally on top of that.
Then I would also be able to participate - at least do minor things myself
and send patches to you.
On Wed, 2012-05-09 at 19:38 +0200, Richard Weinberger wrote:
> +#ifdef CONFIG_MTD_UBI_CHECKPOINT
> +#define UBI_CP_SB_VOLUME_ID (UBI_LAYOUT_VOLUME_ID + 1)
> +#define UBI_CP_DATA_VOLUME_ID (UBI_CP_SB_VOLUME_ID + 1)
#define UBI_CP_DATA_VOLUME_ID (UBI_LAYOUT_VOLUME_ID + 2)
is more readable I think.
> +#define UBI_CP_MAX_BLOCKS 32
This really needs a comment on top of it telling how big a flash with,
say, 128KiB PEB size this would support.
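As a rough back-of-envelope check (the 8-byte EBA entry size and the assumption that the whole payload holds EBA entries are simplifications; headers, EC records and pool data are ignored):

```c
#include <assert.h>
#include <stdint.h>

#define UBI_CP_MAX_BLOCKS 32

/*
 * Rough upper bound on the number of PEBs a checkpoint can describe:
 * treat the whole payload as EBA entries of two __be32 values each
 * (8 bytes), ignoring the superblock, pools and EC records.
 */
uint64_t max_mappable_pebs(uint64_t peb_size)
{
	uint64_t payload = (uint64_t)UBI_CP_MAX_BLOCKS * peb_size;
	return payload / 8;
}
```

With 128KiB PEBs that gives a 4MiB payload, so on the order of half a million PEBs, i.e. roughly a 64GiB device, before headers and EC records eat into the budget.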
> +
> +/**
> + * struct ubi_cp_sb - UBI checkpoint super block
> + * @magic: checkpoint super block magic number (%UBI_CP_SB_MAGIC)
> + * @version: format version of this checkpoint
> + * @data_crc: CRC over the checkpoint data
> + * @nblocks: number of PEBs used by this checkpoint
> + * @block_loc: an array containing the location of all PEBs of the checkpoint
> + * @block_ec: the erase counter of each used PEB
> + * @sqnum: highest sequence number value at the time the checkpoint was taken
> + *
> + * The checkpoint
> + */
> +struct ubi_cp_sb {
> + __be32 magic;
> + __u8 version;
> + __be32 data_crc;
> + __be32 nblocks;
> + __be32 block_loc[UBI_CP_MAX_BLOCKS];
> + __be32 block_ec[UBI_CP_MAX_BLOCKS];
> + __be64 sqnum;
> +} __packed;
Please, unless it is size-critical, always leave some unused space in
on-flash data structure for possible future extensions, and initialize
them to 0.
BTW, side-note, please, check that you follow the convention of UBI and
make all on-flash data structures 64-bit aligned (of course unless it is
size-critical).
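One way to follow both suggestions is to put the 64-bit field on an aligned offset, reserve explicit zero-initialized padding, and keep the total size a multiple of 8. The field layout below is purely illustrative, not a proposal for the final on-flash format:

```c
#include <assert.h>
#include <stdint.h>

#define UBI_CP_MAX_BLOCKS 32

/* Illustrative on-flash superblock: explicit padding keeps the 64-bit
 * sqnum field naturally aligned and leaves reserved space at the end
 * for future extensions (must be written as 0). */
struct cp_sb_example {
	uint32_t magic;
	uint8_t  version;
	uint8_t  padding1[3];          /* align the following fields */
	uint64_t sqnum;
	uint32_t data_crc;
	uint32_t nblocks;
	uint32_t block_loc[UBI_CP_MAX_BLOCKS];
	uint32_t block_ec[UBI_CP_MAX_BLOCKS];
	uint8_t  padding2[32];         /* reserved for the future, 0 */
} __attribute__((packed));
```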
> +/**
> + * struct ubi_cp_long_pool - Checkpoint pool with long term used PEBs
> + * @magic: long pool magic number (%UBI_CP_LPOOL_MAGIC)
> + * @size: current pool size
> + * @pebs: an array containing the location of all PEBs in this pool
> + */
> +struct ubi_cp_long_pool {
> + __be32 magic;
> + __be32 size;
> + __be32 pebs[UBI_CP_MAX_POOL_SIZE];
> +} __packed;
What's the purpose of having these pools - once you read all the
information from the fastmap and the wl subsystem inserts it into the
RB-trees - you already know this data. Why do you need to store this on
the flash? This whole pool thing looks redundant and unneeded.
> +/**
> + * struct ubi_cp_ec - stores the erase counter of a PEB
> + * @pnum: PEB number
> + * @ec: ec of this PEB
> + */
> +struct ubi_cp_ec {
> + __be32 pnum;
> + __be32 ec;
> +} __packed;
It is weird that you do not have an array of ECs for _every_
PEB instead. Why waste the flash and time writing/reading this data?
> +/**
> + * struct ubi_cp_eba - denotes an association between a PEB and LEB
> + * @lnum: LEB number
> + * @pnum: PEB number
> + */
> +struct ubi_cp_eba {
> + __be32 lnum;
> + __be32 pnum;
> +} __packed;
Same here - I'd expect a simple array for every PEB in the system.
--
Best Regards,
Artem Bityutskiy
On Fri, 2012-05-11 at 12:49 +0200, Richard Weinberger wrote:
> On 11.05.2012 12:46, Artem Bityutskiy wrote:
> > On Thu, 2012-05-10 at 10:33 +0200, Richard Weinberger wrote:
> >> First of all, yes it's fully backward-compatible. It uses two new internal volume IDs
> >> with compat = UBI_COMPAT_DELETE.
> >> Old UBI implementations will delete the checkpoint and continue with scanning...
> >
> > OK. BTW, these patches do not compile when the fastmap is disabled. I
> > hope you'll just kill the ifdefs in the next revision and this problem
> > will go away.
>
> It builds fine here with CONFIG_MTD_UBI_CHECKPOINT=n.
> What error do you get?
If you are going to kill the ifdefs anyway, I won't spend time on this.
I have several defconfigs with different combinations of the UBI
configuration options; probably one of them did not build. Or maybe I
made a mistake.
--
Best Regards,
Artem Bityutskiy
On 11.05.2012 13:17, Artem Bityutskiy wrote:
> I'd like to have a git tree for this stuff at some point, when you feel you
> are ready, and then do all further changes incrementally on top of that.
> Then I would also be able to participate - at least do minor things myself
> and send patches to you.
>
> On Wed, 2012-05-09 at 19:38 +0200, Richard Weinberger wrote:
>> +#ifdef CONFIG_MTD_UBI_CHECKPOINT
>> +#define UBI_CP_SB_VOLUME_ID (UBI_LAYOUT_VOLUME_ID + 1)
>> +#define UBI_CP_DATA_VOLUME_ID (UBI_CP_SB_VOLUME_ID + 1)
>
> #define UBI_CP_DATA_VOLUME_ID (UBI_LAYOUT_VOLUME_ID + 2)
>
> is more readable I think.
Okay.
>> +#define UBI_CP_MAX_BLOCKS 32
>
> This really needs a comment on top of it telling how big a flash with,
> say, 128KiB PEB size this would support.
This is a default value which was randomly chosen.
The last patch makes this value configurable via Kconfig.
It is one of those parameters where we have to find a sane default value.
>> +
>> +/**
>> + * struct ubi_cp_sb - UBI checkpoint super block
>> + * @magic: checkpoint super block magic number (%UBI_CP_SB_MAGIC)
>> + * @version: format version of this checkpoint
>> + * @data_crc: CRC over the checkpoint data
>> + * @nblocks: number of PEBs used by this checkpoint
>> + * @block_loc: an array containing the location of all PEBs of the checkpoint
>> + * @block_ec: the erase counter of each used PEB
>> + * @sqnum: highest sequence number value at the time the checkpoint was taken
>> + *
>> + * The checkpoint
>> + */
>> +struct ubi_cp_sb {
>> + __be32 magic;
>> + __u8 version;
>> + __be32 data_crc;
>> + __be32 nblocks;
>> + __be32 block_loc[UBI_CP_MAX_BLOCKS];
>> + __be32 block_ec[UBI_CP_MAX_BLOCKS];
>> + __be64 sqnum;
>> +} __packed;
>
> Please, unless it is size-critical, always leave some unused space in
> on-flash data structure for possible future extensions, and initialize
> them to 0.
If the fastmap on-flash layout changes, I'll increment the "version" field.
But I can leave some space.
> BTW, side-note, please, check that you follow the convention of UBI and
> make all on-flash data structures 64-bit aligned (of course unless it is
> size-critical).
Will check.
>> +/**
>> + * struct ubi_cp_long_pool - Checkpoint pool with long term used PEBs
>> + * @magic: long pool magic number (%UBI_CP_LPOOL_MAGIC)
>> + * @size: current pool size
>> + * @pebs: an array containing the location of all PEBs in this pool
>> + */
>> +struct ubi_cp_long_pool {
>> + __be32 magic;
>> + __be32 size;
>> + __be32 pebs[UBI_CP_MAX_POOL_SIZE];
>> +} __packed;
>
> What's the purpose of having these pools - once you read all the
> information from the fastmap and the wl subsystem inserts it into the
> RB-trees - you already know this data. Why do you need to store this on
> the flash? This whole pool thing looks redundant and unneeded.
We need this pool to find all PEBs that have changed since we wrote the last
checkpoint (or fastmap).
BTW: That's why we named it "UBI Checkpointing".
If all PEBs in the pool are used (IOW no longer empty) we fill the pool with empty PEBs
and write a new checkpoint.
While reading the checkpoint we know that only PEBs within the pool may have changed...
Without this pool we'd have to write the checkpoint every time the EBA changes
or we'd have to scan the whole list of free PEBs while attaching.
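The refill rule described above can be sketched like this. The structure and the free-PEB source here are made up for illustration; the real code takes empty PEBs from the wl subsystem and the checkpoint write involves far more than bumping a counter:

```c
#include <assert.h>
#include <stdint.h>

#define POOL_SIZE 4

/* Illustrative pool: pebs[] holds empty PEBs handed out one by one. */
struct peb_pool {
	uint32_t pebs[POOL_SIZE];
	int used;           /* slots already handed out */
	int fastmap_writes; /* how often a new checkpoint was written */
};

/* Hypothetical free-PEB source standing in for the wl subsystem. */
static uint32_t next_free_peb = 100;

uint32_t pool_get_peb(struct peb_pool *pool)
{
	if (pool->used == POOL_SIZE) {
		/* Pool exhausted: refill it with empty PEBs and write a
		 * new checkpoint, so that an attach only ever has to
		 * scan the PEBs of the current pool. */
		for (int i = 0; i < POOL_SIZE; i++)
			pool->pebs[i] = next_free_peb++;
		pool->used = 0;
		pool->fastmap_writes++;
	}
	return pool->pebs[pool->used++];
}
```

The point of the design is visible in the counter: the checkpoint is rewritten once per POOL_SIZE allocations instead of once per EBA change.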
>> +/**
>> + * struct ubi_cp_ec - stores the erase counter of a PEB
>> + * @pnum: PEB number
>> + * @ec: ec of this PEB
>> + */
>> +struct ubi_cp_ec {
>> + __be32 pnum;
>> + __be32 ec;
>> +} __packed;
>
> It is weird that you do not have an array of ECs for _every_
> PEB instead. Why waste the flash and time writing/reading this data?
By array of ECs you mean that all ec values are written to the flash
and pnum is the index?
Sounds sane.
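The space saving is easy to see: with pnum as the implicit index each record shrinks from 8 bytes (pnum + ec) to 4 bytes (ec only). A minimal sketch of the conversion, with made-up names:

```c
#include <assert.h>
#include <stdint.h>

/* Pack (pnum, ec) pairs into a flat array indexed by pnum, halving the
 * on-flash footprint from 8 to 4 bytes per PEB. */
void pack_ec_array(const uint32_t *pnums, const uint32_t *ecs, int n,
		   uint32_t *ec_array)
{
	for (int i = 0; i < n; i++)
		ec_array[pnums[i]] = ecs[i];
}
```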
>> +/**
>> + * struct ubi_cp_eba - denotes an association beween a PEB and LEB
>> + * @lnum: LEB number
>> + * @pnum: PEB number
>> + */
>> +struct ubi_cp_eba {
>> + __be32 lnum;
>> + __be32 pnum;
>> +} __packed;
>
> Same here - I'd expect a simple array for every PEB in the system.
>
Sounds sane too. :)
Thanks,
//richard
On Fri, 2012-05-11 at 14:02 +0200, Richard Weinberger wrote:
> > Please, unless it is size-critical, always leave some unused space in
> > on-flash data structure for possible future extensions, and initialize
> > them to 0.
>
> If the fastmap on-flash layout changes, I'll increment the "version" field.
> But I can leave some space.
But if you have extra space you _often_ have chances to add new features
in a backward-compatible way, without incrementing the version, which
would make new images backward-incompatible. It really may make a big
difference.
This is not _always_ possible, of course, but sometimes it is. So if it
does not hurt, leave extra space in all on-flash data structures (though
probably not in those you have large arrays of). This is just a
good practice I think. For some reason I have a feeling that we
discussed this already :-) I think I sent you an e-mail with various
questions a long time ago which you did not answer, but it was long
ago, so it does not matter anymore.
> > What's the purpose of having these pools - once you read all the
> > information from the fastmap and the wl subsystem inserts it into the
> > RB-trees - you already know this data. Why do you need to store this on
> > the flash? This whole pool thing looks redundant and unneeded.
>
> We need this pool to find all PEBs that have changed since we wrote the last
> checkpoint (or fastmap).
OK, I see, thanks.
> BTW: That's why we named it "UBI Checkpointing".
> If all PEBs in the pool are used (IOW no longer empty) we fill the pool with empty PEBs
> and write a new checkpoint.
> While reading the checkpoint we know that only PEBs within the pool may have changed...
>
> Without this pool we'd have to write the checkpoint every time the EBA changes
> or we'd have to scan the whole list of free PEBs while attaching.
I just had an impression that you write the fastmap only on unmount or
at some kind of "sync" points, and power cuts would always lead to
a re-scan.
> > It is weird that you do not have an array of ECs for _every_
> > PEB instead. Why waste the flash and time writing/reading this data?
>
> By array of ECs you mean that all ec values are written to the flash
> and pnum is the index?
> Sounds sane.
Yes, to me it sounds like the only sane way, unless there is a strong
reason to have redundant "pnum" fields. :-)
--
Best Regards,
Artem Bityutskiy
On 11.05.2012 14:21, Artem Bityutskiy wrote:
>>> It is weird that you do not have an array of ECs instead for _every_
>>> PEB. Why wasting the flash and time writing/reading this data?
>>
>> By array of ECs you mean that all ec values are written to the flash
>> and pnum is the index?
>> Sounds sane.
>
> Yes, to me it sounds like the only sane way, unless there is a strong
> reason to have redundant "pnum" fields. :-)
While looking at my own code a bit closer I found out why I hadn't used the
array approach. B-)
Currently only ec values for PEBs within the free and used lists are stored.
Therefore, the array can have gaps, e.g. if PEB X is in the erroneous list.
Thanks,
//richard
On Fri, 2012-05-11 at 19:15 +0200, Richard Weinberger wrote:
> On 11.05.2012 14:21, Artem Bityutskiy wrote:
> >>> It is weird that you do not have an array of ECs instead for _every_
> >>> PEB. Why wasting the flash and time writing/reading this data?
> >>
> >> By array of ECs you mean that all ec values are written to the flash
> >> and pnum is the index?
> >> Sounds sane.
> >
> > Yes, to me it sounds like the only sane way, unless there is a strong
> > reason to have redundant "pnum" fields. :-)
>
> While looking at my own code a bit closer I found out why I hadn't used the
> array approach. B-)
> Currently only ec values for PEBs within the free and used lists are stored.
> Therefore, the array can have gaps, e.g. if PEB X is in the erroneous list.
I think this is not a good enough justification. I think we may use
0xFFFFFFFF and other high EC values to indicate that the block was bad
or erroneous or whatever.
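With a reserved value the gaps stop being a problem. A sketch of the idea - UBI_UNKNOWN_EC and the mean-EC fallback are made-up names here, although falling back to an average EC is what UBI already does for PEBs with unreadable EC headers:

```c
#include <assert.h>
#include <stdint.h>

#define UBI_UNKNOWN_EC 0xFFFFFFFFu /* hypothetical reserved value */

/*
 * Read one slot of the flat EC array. Slots for bad/erroneous PEBs
 * carry the sentinel; callers substitute a default such as the mean
 * erase counter of the device.
 */
uint32_t read_ec(const uint32_t *ec_array, uint32_t pnum, uint32_t mean_ec)
{
	uint32_t ec = ec_array[pnum];
	return ec == UBI_UNKNOWN_EC ? mean_ec : ec;
}
```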
BTW, did you think about the scenario of dumping UBI2 on one
device with one bad PEB distribution and then flashing it to a
different device with a different bad PEB distribution? What would
happen when we have fastmap enabled? Also, what if I write it to a
larger flash with otherwise the same geometry?
I guess we could detect these things and fall back to scanning?
--
Best Regards,
Artem Bityutskiy
On 11.05.2012 20:56, Artem Bityutskiy wrote:
> I think this is not a good enough justification. I think we may use
> 0xFFFFFFFF and other high EC values to indicate that the block was bad
> or erroneous or whatever.
Okay, then we have to store all PEB ec values (used, free, erroneous and scrub).
This is not a big deal.
As I said, currently only used and free PEBs are stored.
I think we also need a better solution for the protection queue.
My current solution (ubi_flush_prot_queue) is not the right thing.
Today I've observed a data corruption issue and I'm sure it happened
because fastmap did the wrong thing with the protection queue.
The problem is that a PEB in the protection queue is not visible to fastmap.
(Because it writes only used and free PEBs on the flash).
> BTW, did you think about the scenario of dumping UBI2 on one
> device with one bad PEB distribution and then flashing it to a
> different device with a different bad PEB distribution? What would
> happen when we have fastmap enabled? Also, what if I write it to a
> larger flash with otherwise the same geometry?
>
> I guess we could detect these things and fall back to scanning?
Falling back to scanning is easy.
But how can we detect such a change?
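One plausible cross-check - purely a sketch, nothing the current patches implement, and all names here are invented: record the device geometry and bad-PEB count in the checkpoint superblock, compare them against what the MTD device reports at attach time, and fall back to scanning on any mismatch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical geometry record stored in the checkpoint superblock. */
struct cp_geometry {
	uint32_t peb_count;
	uint32_t peb_size;
	uint32_t bad_peb_count;
};

/*
 * Compare the geometry recorded in the checkpoint with what the MTD
 * device reports now. Any mismatch (image flashed to a larger device,
 * or to one with a different bad-PEB distribution) means the fastmap
 * cannot be trusted and UBI must fall back to a full scan.
 */
bool fastmap_usable(const struct cp_geometry *stored,
		    const struct cp_geometry *actual)
{
	return stored->peb_count == actual->peb_count &&
	       stored->peb_size == actual->peb_size &&
	       stored->bad_peb_count == actual->bad_peb_count;
}
```

This would not catch a device with the same counts but differently placed bad PEBs; for that, the recorded bad-PEB positions themselves would have to be compared.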
Thanks,
//richard