2012-10-05 11:55:14

by Jaegeuk Kim

Subject: [PATCH 00/16] f2fs: introduce flash-friendly file system

This is a new patch set for the f2fs file system.

What is F2FS?
=============

NAND flash memory-based storage devices, such as SSD, eMMC, and SD cards, have
been widely used in systems ranging from mobile to server. Since they are
known to have different characteristics from conventional rotational disks,
a file system, as an upper layer to the storage device, should adapt to those
differences from scratch.

F2FS is a new file system carefully designed for NAND flash memory-based storage
devices. We chose a log-structured file system approach, but we tried to adapt it
to the new form of storage. We also remedy some known issues of the classic
log-structured file system, such as the snowball effect of the wandering tree and
high cleaning overhead.
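
As an illustration of the "snowball effect of the wandering tree" mentioned
above, here is a minimal toy model, not f2fs code. It assumes a file whose data
blocks sit below `depth` levels of pointer blocks, and it compares plain
out-of-place updates with a layout in which pointer blocks are found through a
separate address table. The function name and both parameters are hypothetical,
and the cost of updating the table entry itself is deliberately not counted.

```python
# Toy cost model for one data-block update in a log-structured layout.
# All names are hypothetical; this is not f2fs code.

def blocks_rewritten(depth: int, use_address_table: bool) -> int:
    """Count blocks appended to the log for a single data-block update.

    depth: number of pointer-block levels between the inode and the data.
    use_address_table: if True, pointer blocks are located through a
        separate ID-to-address table instead of by their parent's pointer.
    """
    written = 1  # the data block itself always moves to a new log position
    if depth == 0:
        return written
    if use_address_table:
        # Only the direct pointer block changes (it stores the data block's
        # new address). Its own new location goes into the table, so the
        # levels above it are untouched.
        return written + 1
    # Wandering tree: each rewritten block changes address, forcing its
    # parent to be rewritten too, all the way up.
    return written + depth


for depth in (1, 2, 3):
    print(depth, blocks_rewritten(depth, False), blocks_rewritten(depth, True))
```

In this model the wandering-tree cost grows with the depth of the pointer tree,
while the table-based cost stays constant.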

Because a NAND-based storage device shows different characteristics depending on
its internal geometry or its flash memory management scheme (the FTL), we add
various parameters not only for configuring the on-disk layout, but also for
selecting allocation and cleaning algorithms.

Patch set
=========

Patch #1 adds a document to Documentation/filesystems/.
Patch #2 adds a header file for the on-disk layout to include/linux/.
Patches #3-#15 add the f2fs source files to fs/f2fs/.
The last patch, patch #16, updates Makefile and Kconfig.

mkfs.f2fs
=========

The file system formatting tool, "mkfs.f2fs", is available from the following
download page: http://sourceforge.net/projects/f2fs-tools/

Usage
=====

If you'd like to try f2fs, simply run:
# mkfs.f2fs /dev/sdb1
# mount -t f2fs /dev/sdb1 /mnt/f2fs

Short log
=========

Jaegeuk Kim (16):
f2fs: add document
f2fs: add on-disk layout
f2fs: add superblock and major in-memory structure
f2fs: add super block operations
f2fs: add checkpoint operations
f2fs: add node operations
f2fs: add segment operations
f2fs: add file operations
f2fs: add address space operations for data
f2fs: add core inode operations
f2fs: add inode operations for special inodes
f2fs: add core directory operations
f2fs: add xattr and acl functionalities
f2fs: add garbage collection functions
f2fs: add recovery routines for roll-forward
f2fs: update Kconfig and Makefile

Documentation/filesystems/00-INDEX | 2 +
Documentation/filesystems/f2fs.txt | 314 +++++++
fs/Kconfig | 1 +
fs/Makefile | 1 +
fs/f2fs/Kconfig | 55 ++
fs/f2fs/Makefile | 6 +
fs/f2fs/acl.c | 402 ++++++++
fs/f2fs/acl.h | 57 ++
fs/f2fs/checkpoint.c | 791 ++++++++++++++++
fs/f2fs/data.c | 700 ++++++++++++++
fs/f2fs/dir.c | 657 +++++++++++++
fs/f2fs/f2fs.h | 981 ++++++++++++++++++++
fs/f2fs/file.c | 643 +++++++++++++
fs/f2fs/gc.c | 1140 +++++++++++++++++++++++
fs/f2fs/gc.h | 203 +++++
fs/f2fs/hash.c | 98 ++
fs/f2fs/inode.c | 258 ++++++
fs/f2fs/namei.c | 549 +++++++++++
fs/f2fs/node.c | 1773 ++++++++++++++++++++++++++++++++++++
fs/f2fs/node.h | 331 +++++++
fs/f2fs/recovery.c | 372 ++++++++
fs/f2fs/segment.c | 1755 +++++++++++++++++++++++++++++++++++
fs/f2fs/segment.h | 627 +++++++++++++
fs/f2fs/super.c | 550 +++++++++++
fs/f2fs/xattr.c | 387 ++++++++
fs/f2fs/xattr.h | 142 +++
include/linux/f2fs_fs.h | 359 ++++++++
27 files changed, 13154 insertions(+)
create mode 100644 Documentation/filesystems/f2fs.txt
create mode 100644 fs/f2fs/Kconfig
create mode 100644 fs/f2fs/Makefile
create mode 100644 fs/f2fs/acl.c
create mode 100644 fs/f2fs/acl.h
create mode 100644 fs/f2fs/checkpoint.c
create mode 100644 fs/f2fs/data.c
create mode 100644 fs/f2fs/dir.c
create mode 100644 fs/f2fs/f2fs.h
create mode 100644 fs/f2fs/file.c
create mode 100644 fs/f2fs/gc.c
create mode 100644 fs/f2fs/gc.h
create mode 100644 fs/f2fs/hash.c
create mode 100644 fs/f2fs/inode.c
create mode 100644 fs/f2fs/namei.c
create mode 100644 fs/f2fs/node.c
create mode 100644 fs/f2fs/node.h
create mode 100644 fs/f2fs/recovery.c
create mode 100644 fs/f2fs/segment.c
create mode 100644 fs/f2fs/segment.h
create mode 100644 fs/f2fs/super.c
create mode 100644 fs/f2fs/xattr.c
create mode 100644 fs/f2fs/xattr.h
create mode 100644 include/linux/f2fs_fs.h

--
1.7.9.5




---
Jaegeuk Kim
Samsung



2012-10-06 13:54:12

by Viacheslav Dubeyko

Subject: Re: [PATCH 00/16] f2fs: introduce flash-friendly file system

Hi Jaegeuk,

> From: 김재극 <[email protected]>
> To: [email protected], 'Theodore Ts'o' <[email protected]>, [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
> Subject: [PATCH 00/16] f2fs: introduce flash-friendly file system
> Date: Fri, 05 Oct 2012 20:55:07 +0900
>
> This is a new patch set for the f2fs file system.
>
> What is F2FS?
> =============
>
> NAND flash memory-based storage devices, such as SSD, eMMC, and SD cards, have
> been widely being used for ranging from mobile to server systems. Since they are
> known to have different characteristics from the conventional rotational disks,
> a file system, an upper layer to the storage device, should adapt to the changes
> from the sketch.
>
> F2FS is a new file system carefully designed for the NAND flash memory-based storage
> devices. We chose a log structure file system approach, but we tried to adapt it
> to the new form of storage. Also we remedy some known issues of the very old log
> structured file system, such as snowball effect of wandering tree and high cleaning
> overhead.
>
> Because a NAND-based storage device shows different characteristics according to
> its internal geometry or flash memory management scheme aka FTL, we add various
> parameters not only for configuring on-disk layout, but also for selecting allocation
> and cleaning algorithms.
>

What about F2FS performance? Could you share benchmarking results of the new file system?

The case of an aged file system is very interesting. How efficient is the GC implementation? Could you share benchmarking results for a heavily aged file system?

With the best regards,
Vyacheslav Dubeyko.


2012-10-06 20:06:19

by Jaegeuk Kim

Subject: Re: [PATCH 00/16] f2fs: introduce flash-friendly file system

2012-10-06 (Sat), 17:54 +0400, Vyacheslav Dubeyko:
> Hi Jaegeuk,

Hi.
We know each other, right? :)

>
> > From: 김재극 <[email protected]>
> > To: [email protected], 'Theodore Ts'o' <[email protected]>, [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
> > Subject: [PATCH 00/16] f2fs: introduce flash-friendly file system
> > Date: Fri, 05 Oct 2012 20:55:07 +0900
> >
> > This is a new patch set for the f2fs file system.
> >
> > What is F2FS?
> > =============
> >
> > NAND flash memory-based storage devices, such as SSD, eMMC, and SD cards, have
> > been widely being used for ranging from mobile to server systems. Since they are
> > known to have different characteristics from the conventional rotational disks,
> > a file system, an upper layer to the storage device, should adapt to the changes
> > from the sketch.
> >
> > F2FS is a new file system carefully designed for the NAND flash memory-based storage
> > devices. We chose a log structure file system approach, but we tried to adapt it
> > to the new form of storage. Also we remedy some known issues of the very old log
> > structured file system, such as snowball effect of wandering tree and high cleaning
> > overhead.
> >
> > Because a NAND-based storage device shows different characteristics according to
> > its internal geometry or flash memory management scheme aka FTL, we add various
> > parameters not only for configuring on-disk layout, but also for selecting allocation
> > and cleaning algorithms.
> >
>
> What about F2FS performance? Could you share benchmarking results of the new file system?
>
> It is very interesting the case of aged file system. How is GC's implementation efficient? Could you share benchmarking results for the very aged file system state?
>

Although I have benchmark results, for now I'd like to see results
measured by the community, treating f2fs as a black box. As you know, the
results depend heavily on the workloads and parameters, so I think it would
be better to see other people's results for a while.
Thanks,

> With the best regards,
> Vyacheslav Dubeyko.
>

--
Jaegeuk Kim
Samsung

2012-10-07 07:16:14

by Marco Stornelli

Subject: Re: [PATCH 00/16] f2fs: introduce flash-friendly file system

On 06/10/2012 22:06, Jaegeuk Kim wrote:
> 2012-10-06 (Sat), 17:54 +0400, Vyacheslav Dubeyko:
>> Hi Jaegeuk,
>
> Hi.
> We know each other, right? :)
>
>>
>>> From: 김재극 <[email protected]>
>>> To: [email protected], 'Theodore Ts'o' <[email protected]>, [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
>>> Subject: [PATCH 00/16] f2fs: introduce flash-friendly file system
>>> Date: Fri, 05 Oct 2012 20:55:07 +0900
>>>
>>> This is a new patch set for the f2fs file system.
>>>
>>> What is F2FS?
>>> =============
>>>
>>> NAND flash memory-based storage devices, such as SSD, eMMC, and SD cards, have
>>> been widely being used for ranging from mobile to server systems. Since they are
>>> known to have different characteristics from the conventional rotational disks,
>>> a file system, an upper layer to the storage device, should adapt to the changes
>>> from the sketch.
>>>
>>> F2FS is a new file system carefully designed for the NAND flash memory-based storage
>>> devices. We chose a log structure file system approach, but we tried to adapt it
>>> to the new form of storage. Also we remedy some known issues of the very old log
>>> structured file system, such as snowball effect of wandering tree and high cleaning
>>> overhead.
>>>
>>> Because a NAND-based storage device shows different characteristics according to
>>> its internal geometry or flash memory management scheme aka FTL, we add various
>>> parameters not only for configuring on-disk layout, but also for selecting allocation
>>> and cleaning algorithms.
>>>
>>
>> What about F2FS performance? Could you share benchmarking results of the new file system?
>>
>> It is very interesting the case of aged file system. How is GC's implementation efficient? Could you share benchmarking results for the very aged file system state?
>>
>
> Although I have benchmark results, currently I'd like to see the results
> measured by community as a black-box. As you know, the results are very
> dependent on the workloads and parameters, so I think it would be better
> to see other results for a while.
> Thanks,
>

1) Actually, it's a strange approach. If you have any results, you
should share them with the community, explaining how your benchmark
works (the workload, hardware, and so on) and the specific conditions. I
really don't like the approach "I've got the results but I won't say
anything; if you want a number, do it yourself".
2) For a new filesystem you should send the patches to linux-fsdevel.
3) The pros and cons of your filesystem are not clear. Can you share with
us the main differences from the file systems already in mainline? Or is
it a company secret?

Marco

2012-10-07 09:31:42

by Jaegeuk Kim

Subject: RE: [PATCH 00/16] f2fs: introduce flash-friendly file system

> -----Original Message-----
> From: Marco Stornelli [mailto:[email protected]]
> Sent: Sunday, October 07, 2012 4:10 PM
> To: Jaegeuk Kim
> Cc: Vyacheslav Dubeyko; [email protected]; Al Viro; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]; [email protected];
> [email protected]
> Subject: Re: [PATCH 00/16] f2fs: introduce flash-friendly file system
>
> On 06/10/2012 22:06, Jaegeuk Kim wrote:
> > 2012-10-06 (Sat), 17:54 +0400, Vyacheslav Dubeyko:
> >> Hi Jaegeuk,
> >
> > Hi.
> > We know each other, right? :)
> >
> >>
> >>> From: 김재극 <[email protected]>
> >>> To: [email protected], 'Theodore Ts'o' <[email protected]>,
> [email protected], [email protected], [email protected], [email protected],
> [email protected], [email protected]
> >>> Subject: [PATCH 00/16] f2fs: introduce flash-friendly file system
> >>> Date: Fri, 05 Oct 2012 20:55:07 +0900
> >>>
> >>> This is a new patch set for the f2fs file system.
> >>>
> >>> What is F2FS?
> >>> =============
> >>>
> >>> NAND flash memory-based storage devices, such as SSD, eMMC, and SD cards, have
> >>> been widely being used for ranging from mobile to server systems. Since they are
> >>> known to have different characteristics from the conventional rotational disks,
> >>> a file system, an upper layer to the storage device, should adapt to the changes
> >>> from the sketch.
> >>>
> >>> F2FS is a new file system carefully designed for the NAND flash memory-based storage
> >>> devices. We chose a log structure file system approach, but we tried to adapt it
> >>> to the new form of storage. Also we remedy some known issues of the very old log
> >>> structured file system, such as snowball effect of wandering tree and high cleaning
> >>> overhead.
> >>>
> >>> Because a NAND-based storage device shows different characteristics according to
> >>> its internal geometry or flash memory management scheme aka FTL, we add various
> >>> parameters not only for configuring on-disk layout, but also for selecting allocation
> >>> and cleaning algorithms.
> >>>
> >>
> >> What about F2FS performance? Could you share benchmarking results of the new file system?
> >>
> >> It is very interesting the case of aged file system. How is GC's implementation efficient? Could
> you share benchmarking results for the very aged file system state?
> >>
> >
> > Although I have benchmark results, currently I'd like to see the results
> > measured by community as a black-box. As you know, the results are very
> > dependent on the workloads and parameters, so I think it would be better
> > to see other results for a while.
> > Thanks,
> >
>
> 1) Actually it's a strange approach. If you have got any results you
> should share them with the community explaining how (the workload, hw
> and so on) your benchmark works and the specific condition. I really
> don't like the approach "I've got the results but I don't say anything,
> if you want a number, do it yourself".

You're definitely right, and I did mean *for a while*.
I just wanted to avoid arguing about how to age a file system at this time.
In the meantime, here are some preliminary results.

1. iozone in Panda board
- ARM A9
- DRAM : 1GB
- Kernel: Linux 3.3
- Partition: 12GB (64GB Samsung eMMC)
- Tested on 2GB file

         seq. read  seq. write  rand. read  rand. write
- ext4:  30.753     17.066      5.06        4.15
- f2fs:  30.71      16.906      5.073       15.204

2. iozone in Galaxy Nexus
- DRAM : 1GB
- Android 4.0.4_r1.2
- Kernel omap 3.0.8
- Partition: /data, 12GB
- Tested on 2GB file

         seq. read  seq. write  rand. read  rand. write
- ext4:  29.88      12.83       11.43       0.56
- f2fs:  29.70      13.34       10.79       12.82
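
To make the comparison easier to read, the short sketch below computes the
f2fs-to-ext4 ratio for each column of the two tables above. The figures are
copied as posted; their unit is not stated in the message, so the ratios are
unit-free. The dictionary layout and the function name are only illustrative
choices.

```python
# Figures copied from the two iozone tables above (unit as posted, not stated).
COLUMNS = ("seq. read", "seq. write", "rand. read", "rand. write")

RESULTS = {
    "Panda board": {
        "ext4": (30.753, 17.066, 5.06, 4.15),
        "f2fs": (30.71, 16.906, 5.073, 15.204),
    },
    "Galaxy Nexus": {
        "ext4": (29.88, 12.83, 11.43, 0.56),
        "f2fs": (29.70, 13.34, 10.79, 12.82),
    },
}


def f2fs_over_ext4(board: str) -> dict:
    """Return the f2fs/ext4 ratio per workload for one test board."""
    ext4 = RESULTS[board]["ext4"]
    f2fs = RESULTS[board]["f2fs"]
    return {name: f / e for name, f, e in zip(COLUMNS, f2fs, ext4)}


for board in RESULTS:
    for name, ratio in f2fs_over_ext4(board).items():
        print(f"{board:13s} {name:12s} {ratio:6.2f}x")
```

With these numbers the two file systems are close on sequential access and
random reads, and the large difference is in random writes: roughly 3.7x on
the Panda board and roughly 23x on the Galaxy Nexus.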

Because of company confidentiality, I expect to show other results after presenting f2fs at the Korea Linux Forum.

> 2) For a new filesystem you should send the patches to linux-fsdevel.

Yes, that was totally my mistake.

> 3) It's not clear the pros/cons of your filesystem, can you share with
> us the main differences with the current fs already in mainline? Or is
> it a company secret?

After the forum, I can share the slides, and I hope they will be useful to you.

In the meantime, let me briefly summarize how f2fs compares with other file systems.
There are several log-structured file systems.
Note that F2FS operates on top of a block device and takes the FTL's behavior into account.
So JFFS2, YAFFS2, and UBIFS are out of scope, since they are designed for raw NAND flash.
LogFS was initially designed for raw NAND flash, but was later extended to block devices.
However, I don't know whether it is stable or not.
NILFS2 is one of the major log-structured file systems, and it supports multiple snapshots.
IMO, that feature is quite promising and important to users, but it may degrade performance.
There is a trade-off between functionality and performance.
F2FS chose high performance over additional features.

It may well be possible to optimize ext4 or btrfs for flash storage.
IMHO, however, they were originally designed for HDDs, so they may or may not suffer from their fundamental designs.
I don't know, but why not design a new file system for flash storage as a counterpart?

>
> Marco

---
Jaegeuk Kim
Samsung

2012-10-07 10:15:40

by Viacheslav Dubeyko

Subject: Re: [PATCH 00/16] f2fs: introduce flash-friendly file system


Hi,

On Oct 7, 2012, at 12:06 AM, Jaegeuk Kim wrote:

> 2012-10-06 (Sat), 17:54 +0400, Vyacheslav Dubeyko:
>> Hi Jaegeuk,
>
> Hi.
> We know each other, right? :)
>

Yes, you are correct. :-)

>>
>>> From: 김재극 <[email protected]>
>>> To: [email protected], 'Theodore Ts'o' <[email protected]>, [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
>>> Subject: [PATCH 00/16] f2fs: introduce flash-friendly file system
>>> Date: Fri, 05 Oct 2012 20:55:07 +0900
>>>
>>> This is a new patch set for the f2fs file system.
>>>
>>> What is F2FS?
>>> =============
>>>
>>> NAND flash memory-based storage devices, such as SSD, eMMC, and SD cards, have
>>> been widely being used for ranging from mobile to server systems. Since they are
>>> known to have different characteristics from the conventional rotational disks,
>>> a file system, an upper layer to the storage device, should adapt to the changes
>>> from the sketch.
>>>
>>> F2FS is a new file system carefully designed for the NAND flash memory-based storage
>>> devices. We chose a log structure file system approach, but we tried to adapt it
>>> to the new form of storage. Also we remedy some known issues of the very old log
>>> structured file system, such as snowball effect of wandering tree and high cleaning
>>> overhead.
>>>
>>> Because a NAND-based storage device shows different characteristics according to
>>> its internal geometry or flash memory management scheme aka FTL, we add various
>>> parameters not only for configuring on-disk layout, but also for selecting allocation
>>> and cleaning algorithms.
>>>
>>
>> What about F2FS performance? Could you share benchmarking results of the new file system?
>>
>> It is very interesting the case of aged file system. How is GC's implementation efficient? Could you share benchmarking results for the very aged file system state?
>>
>
> Although I have benchmark results, currently I'd like to see the results
> measured by community as a black-box. As you know, the results are very
> dependent on the workloads and parameters, so I think it would be better
> to see other results for a while.
> Thanks,

That is a good strategy. But there are known bottlenecks, and maybe it makes sense to begin discussing them in the community.

With the best regards,
Vyacheslav Dubeyko.

>
>> With the best regards,
>> Vyacheslav Dubeyko.
>>

2012-10-07 12:08:48

by Viacheslav Dubeyko

Subject: Re: [PATCH 00/16] f2fs: introduce flash-friendly file system

Hi,

On Oct 7, 2012, at 1:31 PM, Jaegeuk Kim wrote:

>> -----Original Message-----
>> From: Marco Stornelli [mailto:[email protected]]
>> Sent: Sunday, October 07, 2012 4:10 PM
>> To: Jaegeuk Kim
>> Cc: Vyacheslav Dubeyko; [email protected]; Al Viro; [email protected]; [email protected];
>> [email protected]; [email protected]; [email protected]; [email protected];
>> [email protected]
>> Subject: Re: [PATCH 00/16] f2fs: introduce flash-friendly file system
>>
>> On 06/10/2012 22:06, Jaegeuk Kim wrote:
>>> 2012-10-06 (Sat), 17:54 +0400, Vyacheslav Dubeyko:
>>>> Hi Jaegeuk,
>>>
>>> Hi.
>>> We know each other, right? :)
>>>
>>>>
>>>> What about F2FS performance? Could you share benchmarking results for the new file system?
>>>>
>>>> The case of an aged file system is especially interesting. How efficient is the GC implementation? Could
>>>> you share benchmarking results for a heavily aged file system state?
>>>>
>>>
>>> Although I have benchmark results, currently I'd like to see the results
>>> measured by community as a black-box. As you know, the results are very
>>> dependent on the workloads and parameters, so I think it would be better
>>> to see other results for a while.
>>> Thanks,
>>>
>>
>> 1) Actually it's a strange approach. If you have got any results you
>> should share them with the community explaining how (the workload, hw
>> and so on) your benchmark works and the specific condition. I really
>> don't like the approach "I've got the results but I don't say anything,
>> if you want a number, do it yourself".
>
> You're definitely right, and I meant *for a while*.
> I just wanted to avoid arguing about how to age the file system at this time.
> In the meantime, let me share some preliminary results.
>
> 1. iozone in Panda board
> - ARM A9
> - DRAM : 1GB
> - Kernel: Linux 3.3
> - Partition: 12GB (64GB Samsung eMMC)
> - Tested on 2GB file
>
>          seq. read  seq. write  rand. read  rand. write
> - ext4:     30.753      17.066        5.06         4.15
> - f2fs:      30.71      16.906       5.073       15.204
>
> 2. iozone in Galaxy Nexus
> - DRAM : 1GB
> - Android 4.0.4_r1.2
> - Kernel omap 3.0.8
> - Partition: /data, 12GB
> - Tested on 2GB file
>
>          seq. read  seq. write  rand. read  rand. write
> - ext4:      29.88       12.83       11.43         0.56
> - f2fs:      29.70       13.34       10.79        12.82
>


These are results for a non-aged filesystem state. Am I correct?
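
For reference, the relative throughput implied by the two quoted tables can be worked out with a few lines of Python. This is only arithmetic on the numbers quoted above; the dictionary layout and the names `results`, `columns`, and `ratios` are illustrative choices, and the units are left out because the quoted mail does not state them.

```python
# f2fs / ext4 throughput ratios from the two quoted iozone tables.
columns = ("seq. read", "seq. write", "rand. read", "rand. write")

results = {
    "Panda board": {
        "ext4": (30.753, 17.066, 5.06, 4.15),
        "f2fs": (30.71, 16.906, 5.073, 15.204),
    },
    "Galaxy Nexus": {
        "ext4": (29.88, 12.83, 11.43, 0.56),
        "f2fs": (29.70, 13.34, 10.79, 12.82),
    },
}


def ratios(platform):
    """Return {column: f2fs / ext4} for one platform."""
    ext4 = results[platform]["ext4"]
    f2fs = results[platform]["f2fs"]
    return {col: f / e for col, e, f in zip(columns, ext4, f2fs)}


for platform in results:
    line = ", ".join(f"{col}: {r:.2f}x" for col, r in ratios(platform).items())
    print(f"{platform}: {line}")
```

On both platforms the sequential and random-read ratios stay close to 1, while the random-write ratio is where the two file systems differ.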


> Due to company confidentiality, I expect to show other results after presenting f2fs at the Korea Linux Forum.
>
>> 2) For a new filesystem you should send the patches to linux-fsdevel.
>
> Yes, that was totally my mistake.
>
>> 3) It's not clear the pros/cons of your filesystem, can you share with
>> us the main differences with the current fs already in mainline? Or is
>> it a company secret?
>
> After the forum, I can share the slides, and I hope they will be useful to you.
>
> Instead, let me briefly summarize how f2fs compares with other file systems.
> There are several log-structured file systems.
> Note that F2FS operates on top of a block device, taking the FTL behavior into consideration.
> So JFFS2, YAFFS2, and UBIFS are out of scope, since they are designed for raw NAND flash.
> LogFS was initially designed for raw NAND flash, but was extended to block devices.
> However, I don't know whether it is stable or not.
> NILFS2 is one of the major log-structured file systems, and it supports multiple snapshots.
> IMO, that feature is quite promising and important to users, but it may degrade performance.
> There is a trade-off between functionality and performance.
> F2FS chose high performance without any further fancy functionality.
>

Performance is a good goal, but fault tolerance is also a very important point. Filesystems are used by users, so it is very important to guarantee that data is kept reliably. Whether snapshots degrade performance is arguable. Snapshots can address not only unpredictable environmental issues but also users' erroneous behavior.

As I understand it, it is not possible to have perfect performance under all possible workloads. Could you point out which workloads F2FS is best suited for?

> Maybe, or obviously, it is possible to optimize ext4 or btrfs for flash storage.
> IMHO, however, they were originally designed for HDDs, so they may or may not suffer from their fundamental designs.
> I don't know, but why not design a new file system for flash storage as a counterpart?
>

Yes, it is possible. But F2FS is not a flash-oriented filesystem like JFFS2, YAFFS2, and UBIFS; it is a block-oriented filesystem. So the F2FS design is restricted by what the block layer allows in exploiting the peculiarities of flash storage. Could you point out the key points of the F2FS design that make it fundamentally unique?

With the best regards,
Vyacheslav Dubeyko.


>>
>> Marco
>
> ---
> Jaegeuk Kim
> Samsung
>

2012-10-08 08:25:25

by Jaegeuk Kim

[permalink] [raw]
Subject: RE: [PATCH 00/16] f2fs: introduce flash-friendly file system

> -----Original Message-----
> From: Vyacheslav Dubeyko [mailto:[email protected]]
> Sent: Sunday, October 07, 2012 9:09 PM
> To: Jaegeuk Kim
> Cc: 'Marco Stornelli'; 'Jaegeuk Kim'; 'Al Viro'; [email protected]; [email protected]; linux-
> [email protected]; [email protected]; [email protected]; [email protected];
> [email protected]
> Subject: Re: [PATCH 00/16] f2fs: introduce flash-friendly file system
>
> Hi,
>
> On Oct 7, 2012, at 1:31 PM, Jaegeuk Kim wrote:
>
> These are results for a non-aged filesystem state. Am I correct?
>

Yes, right.

>
> Performance is a good goal, but fault tolerance is also a very important point. Filesystems are used by
> users, so it is very important to guarantee that data is kept reliably. Whether snapshots degrade
> performance is arguable. Snapshots can address not only unpredictable environmental issues but also
> users' erroneous behavior.
>

Yes, I agree. My concern was about the multiple-snapshot feature.
Of course, fault tolerance is very important, and a file system should support it in the form of, as you know, power-off recovery.
f2fs supports a recovery mechanism by adopting checkpoints, which are similar to snapshots.
However, f2fs does not provide multiple snapshots as a user-facing convenience.
I just focused on performance; the multiple-snapshot feature is certainly a good alternative approach as well.
That may be a trade-off.

> As I understand it, it is not possible to have perfect performance under all possible workloads. Could you
> point out which workloads F2FS is best suited for?

Basically, I think the following workloads will suit F2FS well.
- Many random writes: this follows from its LFS nature.
- Small writes with frequent fsync: f2fs is optimized to reduce the fsync overhead.
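
A rough way to exercise the second workload type is a loop of small appends, each followed by fsync. The sketch below is only an illustration and is not the benchmark behind any numbers in this thread; the function name, the file name, and the default record count and size are arbitrary assumptions.

```python
import os
import tempfile
import time


def small_sync_writes(directory, count=100, size=4096):
    """Append `count` records of `size` bytes, calling fsync after each one.

    Returns the list of per-write latencies in seconds.
    """
    payload = b"\0" * size
    latencies = []
    path = os.path.join(directory, "fsync_test.dat")  # hypothetical file name
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        for _ in range(count):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # ask the kernel to push this write to stable storage
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
    return latencies


if __name__ == "__main__":
    # A temporary directory keeps the sketch self-contained; to compare file
    # systems, pass a directory on the mounted volume under test instead.
    with tempfile.TemporaryDirectory() as d:
        lat = small_sync_writes(d, count=50)
        print(f"mean latency per synced write: {sum(lat) / len(lat) * 1e6:.1f} us")
```

Running the same loop against directories on different file systems on the same device gives a first, very coarse comparison of fsync cost.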

>
> > Maybe, or obviously, it is possible to optimize ext4 or btrfs for flash storage.
> > IMHO, however, they were originally designed for HDDs, so they may or may not suffer from their
> > fundamental designs.
> > I don't know, but why not design a new file system for flash storage as a counterpart?
> >
>
> Yes, it is possible. But F2FS is not a flash-oriented filesystem like JFFS2, YAFFS2, and UBIFS; it is a
> block-oriented filesystem. So the F2FS design is restricted by what the block layer allows in exploiting
> the peculiarities of flash storage. Could you point out the key points of the F2FS design that make it
> fundamentally unique?

As you can see in the f2fs kernel document patch, I think one of the most important features is aligning the operating units of f2fs and the FTL.
Specifically, f2fs has sections and zones, which are the cleaning unit and the basic allocation unit, respectively.
Through these configurable units, I think f2fs is able to reduce unnecessary operations done by the FTL.
Also, in order to keep the block layer from changing its IO patterns, f2fs merges some bios itself, as ext4 does.
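
To make the unit alignment concrete, here is a minimal sketch of the size arithmetic. It assumes a fixed 2 MB segment, a section made of a whole number of segments, and a zone made of a whole number of sections; these layout assumptions, the function names, and the 8 MB device unit in the example are illustrative and are not taken from this mail.

```python
SEGMENT_BYTES = 2 * 1024 * 1024  # assumed fixed segment size (2 MB)


def layout_units(segs_per_sec=1, secs_per_zone=1):
    """Return the sizes in bytes of a segment, a section, and a zone."""
    section = SEGMENT_BYTES * segs_per_sec
    zone = section * secs_per_zone
    return {"segment": SEGMENT_BYTES, "section": section, "zone": zone}


def segs_per_sec_for(gc_unit_bytes):
    """Smallest segments-per-section whose section covers one device GC unit."""
    return max(1, -(-gc_unit_bytes // SEGMENT_BYTES))  # ceiling division


# Hypothetical device whose FTL cleans in 8 MB units.
n = segs_per_sec_for(8 * 1024 * 1024)
print(n, layout_units(segs_per_sec=n))
```

With the section sized to the device's cleaning unit, one f2fs cleaning pass frees a region that the FTL can also reclaim as a whole, which is the intent of the alignment described above.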

>
> With the best regards,
> Vyacheslav Dubeyko.
>
>
> >>
> >> Marco
> >
> > ---
> > Jaegeuk Kim
> > Samsung
> >


---
Jaegeuk Kim
Samsung

2012-10-08 09:59:54

by Namjae Jeon

[permalink] [raw]
Subject: Re: [PATCH 00/16] f2fs: introduce flash-friendly file system

2012/10/8, Jaegeuk Kim <[email protected]>:
> As you can see in the f2fs kernel document patch, I think one of the most
> important features is aligning the operating units of f2fs and the FTL.
> Specifically, f2fs has sections and zones, which are the cleaning unit and the
> basic allocation unit, respectively.
> Through these configurable units, I think f2fs is able to reduce
> unnecessary operations done by the FTL.
> Also, in order to keep the block layer from changing its IO patterns, f2fs
> merges some bios itself, as ext4 does.

Hello.
The internals of eMMC and SSD devices are a black box from the user's side.
How can an ordinary user easily align the operating units (page size and
physical block size?) between f2fs and the FTL in the storage device?

Thanks.


2012-10-08 10:52:09

by Jaegeuk Kim

[permalink] [raw]
Subject: RE: [PATCH 00/16] f2fs: introduce flash-friendly file system

> -----Original Message-----
> From: Namjae Jeon [mailto:[email protected]]
> Sent: Monday, October 08, 2012 7:00 PM
> To: Jaegeuk Kim
> Cc: Vyacheslav Dubeyko; Marco Stornelli; Jaegeuk Kim; Al Viro; [email protected];
> [email protected]; [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]
> Subject: Re: [PATCH 00/16] f2fs: introduce flash-friendly file system
>
> Hello.
> The internals of eMMC and SSD devices are a black box from the user's side.
> How can a normal user easily align the operating units (page size and
> physical block size?) between f2fs and the FTL in the storage device?

I know of some work that has tried to figure out these units by profiling the storage, i.e. reverse engineering.
In most cases, the simplest way is to measure the latencies of consecutive writes and analyze their patterns.
As you mentioned, in practice users will not want to do this, so maybe we need a tool that profiles the device in order to optimize f2fs.
For now, I think profiling is a separate issue, and mkfs.f2fs should include this work in the future.
But, IMO, from the viewpoint of performance, the default configuration is good enough for now.
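
To illustrate the latency-pattern idea, here is a minimal sketch in Python. It is not part of f2fs or mkfs.f2fs; the function name, the spike threshold, and the synthetic latencies are all assumptions made for illustration. It assumes that a write which triggers internal garbage collection shows up as a latency spike, so the most common gap between spikes approximates the unit size, counted in writes.

```python
from collections import Counter
from statistics import median


def estimate_unit_in_writes(latencies_us, spike_factor=3.0):
    """Guess the period (in number of writes) between latency spikes.

    latencies_us: latencies of consecutive, equally sized writes.
    spike_factor: hypothetical threshold; a write counts as a spike when
                  its latency exceeds spike_factor * median latency.
    Returns the most common gap between spikes, or None if fewer than
    two spikes were seen.
    """
    if not latencies_us:
        return None
    threshold = spike_factor * median(latencies_us)
    spikes = [i for i, lat in enumerate(latencies_us) if lat > threshold]
    if len(spikes) < 2:
        return None
    gaps = [b - a for a, b in zip(spikes, spikes[1:])]
    return Counter(gaps).most_common(1)[0][0]


if __name__ == "__main__":
    # Synthetic data (not measured): every 8th write is slow.
    fake = [2000 if i % 8 == 7 else 100 for i in range(64)]
    print(estimate_unit_in_writes(fake))
```

With a hypothetical write size of 512 KiB, a period of 8 writes would suggest a unit of about 4 MiB. Real devices are much noisier than this synthetic input, so the result is only a starting point.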

ps) f2fs doesn't care about the flash page size, but it does take the garbage collection unit into account.



---
Jaegeuk Kim
Samsung

2012-10-08 11:22:04

by Namjae Jeon

Subject: Re: [PATCH 00/16] f2fs: introduce flash-friendly file system

2012/10/8, Jaegeuk Kim <[email protected]>:
>> Hello.
>> The internals of eMMC and SSD devices are a black box from the user's side.
>> How can a normal user easily align the operating units (page size and
>> physical block size?) between f2fs and the FTL in the storage device?
>
> I know of some work that has tried to figure out these units by
> profiling the storage, i.e. reverse engineering.
> In most cases, the simplest way is to measure the latencies of consecutive
> writes and analyze their patterns.
> As you mentioned, in practice users will not want to do this, so maybe we
> need a tool that profiles the device in order to optimize f2fs.
> For now, I think profiling is a separate issue, and mkfs.f2fs should
> include this work in the future.
Well, would the format tool evaluate the optimal block size every time it
formats? As you know, the size of flash-based storage devices is increasing
every year. That means format time can become too long on larger devices
(e.g. one device, one partition).
> But, IMO, from the viewpoint of performance, the default configuration is
> good enough for now.
With the defaults (after a clean format), would you share the performance
difference between f2fs and other log-structured filesystems, instead of
only ext4?

Thanks.

2012-10-08 12:11:51

by Jaegeuk Kim

Subject: RE: [PATCH 00/16] f2fs: introduce flash-friendly file system

> -----Original Message-----
> From: Namjae Jeon [mailto:[email protected]]
> Sent: Monday, October 08, 2012 8:22 PM
> To: Jaegeuk Kim
> Cc: Vyacheslav Dubeyko; Marco Stornelli; Jaegeuk Kim; Al Viro; [email protected];
> [email protected]; [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]
> Subject: Re: [PATCH 00/16] f2fs: introduce flash-friendly file system
>
> 2012/10/8, Jaegeuk Kim <[email protected]>:
> >> Hello.
> >> The internals of eMMC and SSD devices are a black box from the user's side.
> >> How can a normal user easily align the operating units (page size and
> >> physical block size?) between f2fs and the FTL in the storage device?
> >
> > I know of some work that has tried to figure out these units by
> > profiling the storage, i.e. reverse engineering.
> > In most cases, the simplest way is to measure the latencies of consecutive
> > writes and analyze their patterns.
> > As you mentioned, in practice users will not want to do this, so maybe we
> > need a tool that profiles the device in order to optimize f2fs.
> > For now, I think profiling is a separate issue, and mkfs.f2fs should
> > include this work in the future.
> Well, would the format tool evaluate the optimal block size every time it
> formats? As you know, the size of flash-based storage devices is increasing
> every year. That means format time can become too long on larger devices
> (e.g. one device, one partition).

Every file system will suffer from a long format time on such a huge device.
And I don't think the profiling time would scale up with the device size, since it's unnecessary to scan the whole device.
Once we have found the unit size, we can simply stop.
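
A rough sketch of such an early stop, in Python, might look like the following. It is only an illustration under stated assumptions: `timed_write` is a hypothetical callable that issues the n-th write and returns its latency, and the spike threshold and the number of confirmations are made-up parameters. Nothing here reflects an existing mkfs.f2fs feature.

```python
from statistics import median


def probe_until_stable(timed_write, max_writes, spike_factor=3.0, confirmations=3):
    """Issue writes until the gap between latency spikes repeats.

    timed_write:   hypothetical callable; timed_write(n) performs the n-th
                   write and returns its latency.
    max_writes:    hard upper bound on the number of writes issued.
    confirmations: number of identical consecutive gaps required to stop.
    Returns (period_in_writes or None, number_of_writes_issued).
    """
    latencies = []
    spikes = []
    for n in range(max_writes):
        lat = timed_write(n)
        latencies.append(lat)
        if len(latencies) < 4:
            continue  # too few samples for a meaningful baseline
        if lat > spike_factor * median(latencies):
            spikes.append(n)
        gaps = [b - a for a, b in zip(spikes, spikes[1:])]
        recent = gaps[-confirmations:]
        if len(recent) == confirmations and len(set(recent)) == 1:
            return recent[-1], n + 1  # stop early; no need to scan further
    return None, max_writes


if __name__ == "__main__":
    # Fake device (not measured): every 8th write is slow.
    fake_device = lambda n: 2000 if n % 8 == 7 else 100
    print(probe_until_stable(fake_device, max_writes=10000))
```

The point of the sketch is that the number of writes issued depends on the unit size and the confirmation count, not on the capacity of the device.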

> > But, IMO, from the viewpoint of performance, the default configuration is
> > good enough for now.
> With the defaults (after a clean format), would you share the performance
> difference between f2fs and other log-structured filesystems, instead of
> only ext4?
>

Actually, we've focused on ext4, so I have no results for other file systems measured on embedded systems.
I'll test them sooner or later and report the results.
Thank you for your valuable comments.



---
Jaegeuk Kim
Samsung

2012-10-08 19:23:01

by Viacheslav Dubeyko

Subject: Re: [PATCH 00/16] f2fs: introduce flash-friendly file system

Hi,

On Oct 8, 2012, at 12:25 PM, Jaegeuk Kim wrote:

>> -----Original Message-----
>> From: Vyacheslav Dubeyko [mailto:[email protected]]
>> Sent: Sunday, October 07, 2012 9:09 PM
>> To: Jaegeuk Kim
>> Cc: 'Marco Stornelli'; 'Jaegeuk Kim'; 'Al Viro'; [email protected]; [email protected]; linux-
>> [email protected]; [email protected]; [email protected]; [email protected];
>> [email protected]
>> Subject: Re: [PATCH 00/16] f2fs: introduce flash-friendly file system
>>
>> Hi,
>>
>> On Oct 7, 2012, at 1:31 PM, Jaegeuk Kim wrote:
>>
>>>> -----Original Message-----
>>>> From: Marco Stornelli [mailto:[email protected]]
>>>> Sent: Sunday, October 07, 2012 4:10 PM
>>>> To: Jaegeuk Kim
>>>> Cc: Vyacheslav Dubeyko; [email protected]; Al Viro; [email protected]; [email protected];
>>>> [email protected]; [email protected]; [email protected];
>> [email protected];
>>>> [email protected]
>>>> Subject: Re: [PATCH 00/16] f2fs: introduce flash-friendly file system
>>>>
>>>> On 06/10/2012 22:06, Jaegeuk Kim wrote:
>>>>> 2012-10-06 (Sat), 17:54 +0400, Vyacheslav Dubeyko:
>>>>>> Hi Jaegeuk,
>>>>>
>>>>> Hi.
>>>>> We know each other, right? :)
>>>>>
>>>>>>
>>>>>>> From: 김재극 <[email protected]>
>>>>>>> To: [email protected], 'Theodore Ts'o' <[email protected]>,
>>>> [email protected], [email protected], [email protected],
>> [email protected],
>>>> [email protected], [email protected]
>>>>>>> Subject: [PATCH 00/16] f2fs: introduce flash-friendly file system
>>>>>>> Date: Fri, 05 Oct 2012 20:55:07 +0900
>>>>>>>
>>>>>>> This is a new patch set for the f2fs file system.
>>>>>>>
>>>>>>> What is F2FS?
>>>>>>> =============
>>>>>>>
>>>>>>> NAND flash memory-based storage devices, such as SSD, eMMC, and SD cards, have
>>>>>>> been widely being used for ranging from mobile to server systems. Since they are
>>>>>>> known to have different characteristics from the conventional rotational disks,
>>>>>>> a file system, an upper layer to the storage device, should adapt to the changes
>>>>>>> from the sketch.
>>>>>>>
>>>>>>> F2FS is a new file system carefully designed for the NAND flash memory-based storage
>>>>>>> devices. We chose a log structure file system approach, but we tried to adapt it
>>>>>>> to the new form of storage. Also we remedy some known issues of the very old log
>>>>>>> structured file system, such as snowball effect of wandering tree and high cleaning
>>>>>>> overhead.
>>>>>>>
>>>>>>> Because a NAND-based storage device shows different characteristics according to
>>>>>>> its internal geometry or flash memory management scheme aka FTL, we add various
>>>>>>> parameters not only for configuring on-disk layout, but also for selecting allocation
>>>>>>> and cleaning algorithms.
>>>>>>>
>>>>>>
>>>>>> What about F2FS performance? Could you share benchmarking results of the new file system?
>>>>>>
>>>>>> It is very interesting the case of aged file system. How is GC's implementation efficient? Could
>>>> you share benchmarking results for the very aged file system state?
>>>>>>
>>>>>
>>>>> Although I have benchmark results, currently I'd like to see the results
>>>>> measured by community as a black-box. As you know, the results are very
>>>>> dependent on the workloads and parameters, so I think it would be better
>>>>> to see other results for a while.
>>>>> Thanks,
>>>>>
>>>>
>>>> 1) Actually it's a strange approach. If you have got any results you
>>>> should share them with the community explaining how (the workload, hw
>>>> and so on) your benchmark works and the specific condition. I really
>>>> don't like the approach "I've got the results but I don't say anything,
>>>> if you want a number, do it yourself".
>>>
>>> It's definitely right, and I meant *for a while*.
>>> I just wanted to avoid arguing with how to age file system in this time.
>>> Before then, I share the primitive results as follows.
>>>
>>> 1. iozone in Panda board
>>> - ARM A9
>>> - DRAM : 1GB
>>> - Kernel: Linux 3.3
>>> - Partition: 12GB (64GB Samsung eMMC)
>>> - Tested on 2GB file
>>>
>>> seq. read, seq. write, rand. read, rand. write
>>> - ext4: 30.753 17.066 5.06 4.15
>>> - f2fs: 30.71 16.906 5.073 15.204
>>>
>>> 2. iozone in Galaxy Nexus
>>> - DRAM : 1GB
>>> - Android 4.0.4_r1.2
>>> - Kernel omap 3.0.8
>>> - Partition: /data, 12GB
>>> - Tested on 2GB file
>>>
>>> seq. read, seq. write, rand. read, rand. write
>>> - ext4: 29.88 12.83 11.43 0.56
>>> - f2fs: 29.70 13.34 10.79 12.82
>>>
>>
>>
>> This is results for non-aged filesystem state. Am I correct?
>>
>
> Yes, right.
>
>>
>>> Due to the company secret, I expect to show other results after presenting f2fs at korea linux forum.
>>>
>>>> 2) For a new filesystem you should send the patches to linux-fsdevel.
>>>
>>> Yes, that was totally my mistake.
>>>
>>>> 3) It's not clear the pros/cons of your filesystem, can you share with
>>>> us the main differences with the current fs already in mainline? Or is
>>>> it a company secret?
>>>
>>> After forum, I can share the slides, and I hope they will be useful to you.
>>>
>>> Instead, let me summarize at a glance compared with other file systems.
>>> Here are several log-structured file systems.
>>> Note that, F2FS operates on top of block device with consideration on the FTL behavior.
>>> So, JFFS2, YAFFS2, and UBIFS are out-of scope, since they are designed for raw NAND flash.
>>> LogFS is initially designed for raw NAND flash, but expanded to block device.
>>> But, I don't know whether it is stable or not.
>>> NILFS2 is one of major log-structured file systems, which supports multiple snap-shots.
>>> IMO, that feature is quite promising and important to users, but it may degrade the performance.
>>> There is a trade-off between functionalities and performance.
>>> F2FS chose high performance without any further fancy functionalities.
>>>
>>
>> Performance is a good goal. But fault-tolerance is also very important point. Filesystems are used by
>> users, so, it is very important to guarantee reliability of data keeping. Degradation of performance
>> by means of snapshots is arguable point. Snapshots can solve the problem not only some unpredictable
>> environmental issues but also user's erroneous behavior.
>>
>
> Yes, I agree. I concerned the multiple snapshot feature.
> Of course, fault-tolerance is very important, and file system should support it as you know as power-off-recovery.
> f2fs supports the recovery mechanism by adopting checkpoint similar to snapshot.
> But, f2fs does not support multiple snapshots for user convenience.
> I just focused on the performance, and absolutely, the multiple snapshot feature is also a good alternative approach.
> That may be a trade-off.

Maybe I misunderstand something, but I can't see the difference. As far as I know, a snapshot in NILFS2 is a checkpoint that the user has converted into a snapshot. So a NILFS2 checkpoint is a log that records a change of the file system's state (user data + metadata). In other words, the checkpoint is the mechanism of writing to the volume. Moreover, NILFS2 gives a flexible way of managing checkpoints and snapshots.

As you say, f2fs also supports checkpoints. To me that means checkpoints are the basic mechanism of write operations in f2fs. So what performance gain and what difference are you talking about?

Moreover, as far as I can understand, the user can't manage f2fs checkpoints at all. It is not clear which critical points can serve as starting points for recovery actions. How is it possible to define how many checkpoints an f2fs volume will have?

How much user data (and metadata) can be lost in the case of a sudden power-off? Is it possible to estimate this?

>
>> As I understand, it is not possible to have a perfect performance in all possible workloads. Could you
>> point out what workloads are the best way of F2FS using?
>
> Basically I think the following workloads will be good for F2FS.
> - Many random writes : it's LFS nature
> - Small writes with frequent fsync : f2fs is optimized to reduce the fsync overhead.
>

Yes, that can be so for a non-aged f2fs volume. But I am afraid that for an aged f2fs volume the situation can be the opposite. I think that on an aged f2fs volume the GC will be working hard under the above-mentioned workloads.

But, as far as I can understand, smartphones and tablets are the most promising use for f2fs, because f2fs is designed for NAND flash memory-based storage devices. So I think that workloads such as "many random writes" or "small writes with frequent fsync" are not such frequent use-cases. Creating and deleting many small files may be a more frequent use-case on smartphones and tablets. But, as far as I can understand, f2fs has a slightly expensive metadata payload when creating small files. Moreover, frequent and random deletion of small files leads to very complicated and unpredictable GC behavior, as far as I can understand.

>>
>>> Maybe or obviously it is possible to optimize ext4 or btrfs to flash storages.
>>> IMHO, however, they are originally designed for HDDs, so that it may or may not suffer from
>> fundamental designs.
>>> I don't know, but why not designing a new file system for flash storages as a counterpart?
>>>
>>
>> Yes, it is possible. But F2FS is not flash oriented filesystem as JFFS2, YAFFS2, UBIFS but block-
>> oriented filesystem. So, F2FS design is restricted by block-layer's opportunities in the using of
>> flash storages' peculiarities. Could you point out key points of F2FS design that makes this design
>> fundamentally unique?
>
> As you can see the f2fs kernel document patch, I think one of the most important features is to align operating units between f2fs and ftl.
> Specifically, f2fs has section and zone, which are cleaning unit and basic allocation unit respectively.
> Through these configurable units in f2fs, I think f2fs is able to reduce the unnecessary operations done by FTL.
> And, in order to avoid changing IO patterns by the block-layer, f2fs merges itself some bios likewise ext4.
>

As far as I can understand, it is not so easy to create a partition for an f2fs volume that is aligned to the operating units (especially in the case of eMMC or SSD). The performance of an unaligned volume can degrade significantly because of FTL activity. What mechanisms does f2fs have to exclude this situation and to achieve the goal of reducing unnecessary FTL operations?
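
To make the alignment concern concrete, here is a minimal sketch of the arithmetic involved. It is not taken from f2fs or mkfs.f2fs; the 4 MiB unit size and the helper names are assumptions chosen only for illustration, since the real FTL unit of a given eMMC or SSD is generally not published.

```python
MiB = 1024 * 1024

def is_aligned(start_bytes: int, unit_bytes: int) -> bool:
    """True if the partition start falls exactly on a unit boundary."""
    return start_bytes % unit_bytes == 0

def next_aligned_start(start_bytes: int, unit_bytes: int) -> int:
    """Round the partition start up to the next unit boundary."""
    return -(-start_bytes // unit_bytes) * unit_bytes

# Assumption: the FTL works in 4 MiB units (hypothetical value).
ASSUMED_UNIT = 4 * MiB

# A legacy partition layout starting at sector 63 (512-byte sectors).
start = 63 * 512

print(is_aligned(start, ASSUMED_UNIT))                 # False
print(next_aligned_start(start, ASSUMED_UNIT) // MiB)  # 4
```

If the partition start is not a multiple of the unit, every file system unit straddles two FTL units, which is the degradation described above.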

With the best regards,
Vyacheslav Dubeyko.

>>
>> With the best regards,
>> Vyacheslav Dubeyko.
>>
>>
>>>>
>>>> Marco
>>>
>>> ---
>>> Jaegeuk Kim
>>> Samsung
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>>> the body of a message to [email protected]
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>> Please read the FAQ at http://www.tux.org/lkml/
>
>
> ---
> Jaegeuk Kim
> Samsung
>

2012-10-09 03:52:15

by Namjae Jeon

[permalink] [raw]
Subject: Re: [PATCH 00/16] f2fs: introduce flash-friendly file system

2012/10/8, Jaegeuk Kim <[email protected]>:
>> -----Original Message-----
>> From: Namjae Jeon [mailto:[email protected]]
>> Sent: Monday, October 08, 2012 8:22 PM
>> To: Jaegeuk Kim
>> Cc: Vyacheslav Dubeyko; Marco Stornelli; Jaegeuk Kim; Al Viro;
>> [email protected];
>> [email protected]; [email protected];
>> [email protected]; [email protected];
>> [email protected]; [email protected]
>> Subject: Re: [PATCH 00/16] f2fs: introduce flash-friendly file system
>>
>> 2012/10/8, Jaegeuk Kim <[email protected]>:
>> >> -----Original Message-----
>> >> From: Namjae Jeon [mailto:[email protected]]
>> >> Sent: Monday, October 08, 2012 7:00 PM
>> >> To: Jaegeuk Kim
>> >> Cc: Vyacheslav Dubeyko; Marco Stornelli; Jaegeuk Kim; Al Viro;
>> >> [email protected];
>> >> [email protected]; [email protected];
>> >> [email protected]; [email protected];
>> >> [email protected]; [email protected]
>> >> Subject: Re: [PATCH 00/16] f2fs: introduce flash-friendly file system
>> >>
>> >> 2012/10/8, Jaegeuk Kim <[email protected]>:
>> >> >> -----Original Message-----
>> >> >> From: Vyacheslav Dubeyko [mailto:[email protected]]
>> >> >> Sent: Sunday, October 07, 2012 9:09 PM
>> >> >> To: Jaegeuk Kim
>> >> >> Cc: 'Marco Stornelli'; 'Jaegeuk Kim'; 'Al Viro'; [email protected];
>> >> >> [email protected]; linux-
>> >> >> [email protected]; [email protected];
>> >> >> [email protected];
>> >> >> [email protected];
>> >> >> [email protected]
>> >> >> Subject: Re: [PATCH 00/16] f2fs: introduce flash-friendly file
>> >> >> system
>> >> >>
>> >> >> Hi,
>> >> >>
>> >> >> On Oct 7, 2012, at 1:31 PM, Jaegeuk Kim wrote:
>> >> >>
>> >> >> >> -----Original Message-----
>> >> >> >> From: Marco Stornelli [mailto:[email protected]]
>> >> >> >> Sent: Sunday, October 07, 2012 4:10 PM
>> >> >> >> To: Jaegeuk Kim
>> >> >> >> Cc: Vyacheslav Dubeyko; [email protected]; Al Viro;
>> >> >> >> [email protected]; [email protected];
>> >> >> >> [email protected]; [email protected];
>> >> >> >> [email protected];
>> >> >> [email protected];
>> >> >> >> [email protected]
>> >> >> >> Subject: Re: [PATCH 00/16] f2fs: introduce flash-friendly file
>> >> >> >> system
>> >> >> >>
>> >> >> >> Il 06/10/2012 22:06, Jaegeuk Kim ha scritto:
>> >> >> >>> 2012-10-06 (토), 17:54 +0400, Vyacheslav Dubeyko:
>> >> >> >>>> Hi Jaegeuk,
>> >> >> >>>
>> >> >> >>> Hi.
>> >> >> >>> We know each other, right? :)
>> >> >> >>>
>> >> >> >>>>
>> >> >> >>>>> From: 김재극 <[email protected]>
>> >> >> >>>>> To: [email protected], 'Theodore Ts'o'
>> >> >> >>>>> <[email protected]>,
>> >> >> >> [email protected], [email protected],
>> >> >> >> [email protected],
>> >> >> [email protected],
>> >> >> >> [email protected], [email protected]
>> >> >> >>>>> Subject: [PATCH 00/16] f2fs: introduce flash-friendly file
>> >> >> >>>>> system
>> >> >> >>>>> Date: Fri, 05 Oct 2012 20:55:07 +0900
>> >> >> >>>>>
>> >> >> >>>>> This is a new patch set for the f2fs file system.
>> >> >> >>>>>
>> >> >> >>>>> What is F2FS?
>> >> >> >>>>> =============
>> >> >> >>>>>
>> >> >> >>>>> NAND flash memory-based storage devices, such as SSD, eMMC,
>> >> >> >>>>> and
>> >> >> >>>>> SD
>> >> >> >>>>> cards, have
>> >> >> >>>>> been widely being used for ranging from mobile to server
>> >> >> >>>>> systems.
>> >> >> >>>>> Since they are
>> >> >> >>>>> known to have different characteristics from the conventional
>> >> >> >>>>> rotational disks,
>> >> >> >>>>> a file system, an upper layer to the storage device, should
>> >> >> >>>>> adapt
>> >> >> >>>>> to
>> >> >> >>>>> the changes
>> >> >> >>>>> from the sketch.
>> >> >> >>>>>
>> >> >> >>>>> F2FS is a new file system carefully designed for the NAND
>> >> >> >>>>> flash
>> >> >> >>>>> memory-based storage
>> >> >> >>>>> devices. We chose a log structure file system approach, but
>> >> >> >>>>> we
>> >> >> >>>>> tried
>> >> >> >>>>> to adapt it
>> >> >> >>>>> to the new form of storage. Also we remedy some known issues
>> >> >> >>>>> of
>> >> >> >>>>> the
>> >> >> >>>>> very old log
>> >> >> >>>>> structured file system, such as snowball effect of wandering
>> >> >> >>>>> tree
>> >> >> >>>>> and high cleaning
>> >> >> >>>>> overhead.
>> >> >> >>>>>
>> >> >> >>>>> Because a NAND-based storage device shows different
>> >> >> >>>>> characteristics
>> >> >> >>>>> according to
>> >> >> >>>>> its internal geometry or flash memory management scheme aka
>> >> >> >>>>> FTL,
>> >> >> >>>>> we
>> >> >> >>>>> add various
>> >> >> >>>>> parameters not only for configuring on-disk layout, but also
>> >> >> >>>>> for
>> >> >> >>>>> selecting allocation
>> >> >> >>>>> and cleaning algorithms.
>> >> >> >>>>>
>> >> >> >>>>
>> >> >> >>>> What about F2FS performance? Could you share benchmarking
>> >> >> >>>> results
>> >> >> >>>> of
>> >> >> >>>> the new file system?
>> >> >> >>>>
>> >> >> >>>> It is very interesting the case of aged file system. How is
>> >> >> >>>> GC's
>> >> >> >>>> implementation efficient? Could
>> >> >> >> you share benchmarking results for the very aged file system
>> >> >> >> state?
>> >> >> >>>>
>> >> >> >>>
>> >> >> >>> Although I have benchmark results, currently I'd like to see
>> >> >> >>> the
>> >> >> >>> results
>> >> >> >>> measured by community as a black-box. As you know, the results
>> >> >> >>> are
>> >> >> >>> very
>> >> >> >>> dependent on the workloads and parameters, so I think it would
>> >> >> >>> be
>> >> >> >>> better
>> >> >> >>> to see other results for a while.
>> >> >> >>> Thanks,
>> >> >> >>>
>> >> >> >>
>> >> >> >> 1) Actually it's a strange approach. If you have got any results
>> >> >> >> you
>> >> >> >> should share them with the community explaining how (the
>> >> >> >> workload,
>> >> >> >> hw
>> >> >> >> and so on) your benchmark works and the specific condition. I
>> >> >> >> really
>> >> >> >> don't like the approach "I've got the results but I don't say
>> >> >> >> anything,
>> >> >> >> if you want a number, do it yourself".
>> >> >> >
>> >> >> > It's definitely right, and I meant *for a while*.
>> >> >> > I just wanted to avoid arguing with how to age file system in
>> >> >> > this
>> >> >> > time.
>> >> >> > Before then, I share the primitive results as follows.
>> >> >> >
>> >> >> > 1. iozone in Panda board
>> >> >> > - ARM A9
>> >> >> > - DRAM : 1GB
>> >> >> > - Kernel: Linux 3.3
>> >> >> > - Partition: 12GB (64GB Samsung eMMC)
>> >> >> > - Tested on 2GB file
>> >> >> >
>> >> >> > seq. read, seq. write, rand. read, rand. write
>> >> >> > - ext4: 30.753 17.066 5.06 4.15
>> >> >> > - f2fs: 30.71 16.906 5.073 15.204
>> >> >> >
>> >> >> > 2. iozone in Galaxy Nexus
>> >> >> > - DRAM : 1GB
>> >> >> > - Android 4.0.4_r1.2
>> >> >> > - Kernel omap 3.0.8
>> >> >> > - Partition: /data, 12GB
>> >> >> > - Tested on 2GB file
>> >> >> >
>> >> >> > seq. read, seq. write, rand. read, rand. write
>> >> >> > - ext4: 29.88 12.83 11.43 0.56
>> >> >> > - f2fs: 29.70 13.34 10.79 12.82
>> >> >> >
>> >> >>
>> >> >>
>> >> >> This is results for non-aged filesystem state. Am I correct?
>> >> >>
>> >> >
>> >> > Yes, right.
>> >> >
>> >> >>
>> >> >> > Due to the company secret, I expect to show other results after
>> >> >> > presenting f2fs at korea linux forum.
>> >> >> >
>> >> >> >> 2) For a new filesystem you should send the patches to
>> >> >> >> linux-fsdevel.
>> >> >> >
>> >> >> > Yes, that was totally my mistake.
>> >> >> >
>> >> >> >> 3) It's not clear the pros/cons of your filesystem, can you
>> >> >> >> share
>> >> >> >> with
>> >> >> >> us the main differences with the current fs already in mainline?
>> >> >> >> Or
>> >> >> >> is
>> >> >> >> it a company secret?
>> >> >> >
>> >> >> > After forum, I can share the slides, and I hope they will be
>> >> >> > useful
>> >> >> > to
>> >> >> > you.
>> >> >> >
>> >> >> > Instead, let me summarize at a glance compared with other file
>> >> >> > systems.
>> >> >> > Here are several log-structured file systems.
>> >> >> > Note that, F2FS operates on top of block device with
>> >> >> > consideration
>> >> >> > on
>> >> >> > the FTL behavior.
>> >> >> > So, JFFS2, YAFFS2, and UBIFS are out-of scope, since they are
>> >> >> > designed
>> >> >> > for raw NAND flash.
>> >> >> > LogFS is initially designed for raw NAND flash, but expanded to
>> >> >> > block
>> >> >> > device.
>> >> >> > But, I don't know whether it is stable or not.
>> >> >> > NILFS2 is one of major log-structured file systems, which
>> >> >> > supports
>> >> >> > multiple snap-shots.
>> >> >> > IMO, that feature is quite promising and important to users, but
>> >> >> > it
>> >> >> > may
>> >> >> > degrade the performance.
>> >> >> > There is a trade-off between functionalities and performance.
>> >> >> > F2FS chose high performance without any further fancy
>> >> >> > functionalities.
>> >> >> >
>> >> >>
>> >> >> Performance is a good goal. But fault-tolerance is also very
>> >> >> important
>> >> >> point. Filesystems are used by
>> >> >> users, so, it is very important to guarantee reliability of data
>> >> >> keeping.
>> >> >> Degradation of performance
>> >> >> by means of snapshots is arguable point. Snapshots can solve the
>> >> >> problem
>> >> >> not only some unpredictable
>> >> >> environmental issues but also user's erroneous behavior.
>> >> >>
>> >> >
>> >> > Yes, I agree. I concerned the multiple snapshot feature.
>> >> > Of course, fault-tolerance is very important, and file system should
>> >> > support
>> >> > it as you know as power-off-recovery.
>> >> > f2fs supports the recovery mechanism by adopting checkpoint similar
>> >> > to
>> >> > snapshot.
>> >> > But, f2fs does not support multiple snapshots for user convenience.
>> >> > I just focused on the performance, and absolutely, the multiple
>> >> > snapshot
>> >> > feature is also a good alternative approach.
>> >> > That may be a trade-off.
>> >> >
>> >> >> As I understand, it is not possible to have a perfect performance
>> >> >> in
>> >> >> all
>> >> >> possible workloads. Could you
>> >> >> point out what workloads are the best way of F2FS using?
>> >> >
>> >> > Basically I think the following workloads will be good for F2FS.
>> >> > - Many random writes : it's LFS nature
>> >> > - Small writes with frequent fsync : f2fs is optimized to reduce the
>> >> > fsync
>> >> > overhead.
>> >> >
>> >> >>
>> >> >> > Maybe or obviously it is possible to optimize ext4 or btrfs to
>> >> >> > flash
>> >> >> > storages.
>> >> >> > IMHO, however, they are originally designed for HDDs, so that it
>> >> >> > may
>> >> >> > or
>> >> >> > may not suffer from
>> >> >> fundamental designs.
>> >> >> > I don't know, but why not designing a new file system for flash
>> >> >> > storages
>> >> >> > as a counterpart?
>> >> >> >
>> >> >>
>> >> >> Yes, it is possible. But F2FS is not flash oriented filesystem as
>> >> >> JFFS2,
>> >> >> YAFFS2, UBIFS but block-
>> >> >> oriented filesystem. So, F2FS design is restricted by block-layer's
>> >> >> opportunities in the using of
>> >> >> flash storages' peculiarities. Could you point out key points of
>> >> >> F2FS
>> >> >> design that makes this design
>> >> >> fundamentally unique?
>> >> >
>> >> > As you can see the f2fs kernel document patch, I think one of the
>> >> > most
>> >> > important features is to align operating units between f2fs and ftl.
>> >> > Specifically, f2fs has section and zone, which are cleaning unit and
>> >> > basic
>> >> > allocation unit respectively.
>> >> > Through these configurable units in f2fs, I think f2fs is able to
>> >> > reduce
>> >> > the
>> >> > unnecessary operations done by FTL.
>> >> > And, in order to avoid changing IO patterns by the block-layer, f2fs
>> >> > merges
>> >> > itself some bios likewise ext4.
>> >> Hello.
>> >> The internal of eMMC and SSD is the blackbox from user side.
>> >> How does the normal user easily set operating units alignment(page
>> >> size and physical block size ?) between f2fs and ftl in storage device
>> >> ?
>> >
>> > I've known that some works have been tried to figure out the units by
>> > profiling the storage, AKA reverse engineering.
>> > In most cases, the simplest way is to measure the latencies of
>> > consecutive
>> > writes and analyze their patterns.
>> > As you mentioned, in practical, users will not want to do this, so maybe
>> > we
>> > need a tool to profile them to optimize f2fs.
>> > In the current state, I think profiling is an another issue, and
>> > mkfs.f2fs
>> > had better include this work in the future.
>> Well, Format tool evaluates optimal block size whenever formatting? As
>> you know, The size of Flash Based storage device is increasing every
>> year. It means format time can be too long on larger devices(e.g. one
>> device, one parition).
>
> Every file systems will suffer from the long format time in such a huge
> device.
> And, I don't think the profiling time would not be scaled up, since it's
> unnecessary to scan whole device.
> After getting the size, we just can stop it.
The key point is that you should estimate the correct optimal block size
of the FTL with much less I/O at format time.
I am not sure that is possible.
And you should prove that the estimated optimal block size is really
correct on several devices per vendor.
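
The profiling idea quoted above (measure the latencies of consecutive writes and analyze their pattern) could be sketched roughly as follows. This is only an illustration: the function name, the spike threshold, and the latency list are all assumptions, and the numbers are synthetic, not measured on any device.

```python
from collections import Counter
from typing import List, Optional

def estimate_unit_in_writes(latencies_us: List[int],
                            spike_factor: float = 3.0) -> Optional[int]:
    """Guess the internal unit, in number of writes, from latency spikes.

    A write counts as a spike if it is slower than spike_factor times the
    median latency. The most common distance between consecutive spikes is
    returned, or None if fewer than two spikes are seen.
    """
    median = sorted(latencies_us)[len(latencies_us) // 2]
    spikes = [i for i, lat in enumerate(latencies_us)
              if lat > spike_factor * median]
    gaps = [b - a for a, b in zip(spikes, spikes[1:])]
    if not gaps:
        return None
    return Counter(gaps).most_common(1)[0][0]

# Synthetic trace (assumption): every 8th write is slow.
trace = [900 if (i + 1) % 8 == 0 else 100 for i in range(32)]
print(estimate_unit_in_writes(trace))  # 8
```

Only a short prefix of the device has to be written for such an estimate, but whether the spikes are clean enough on real devices is exactly the open question here.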

>
>> > But, IMO, from the viewpoint of performance, default configuration is
>> > quite
>> > enough now.
>> At default(after cleanly format), Would you share performance
>> difference between other log structured filesystems in comparison to
>> f2fs instead of ext4 ?
>>
>
> Actually, we've focused on ext4, so I have no results of other file systems
> measured on embedded systems.
> I'll test sooner or later, and report them.
Okay, Thanks Jaegeuk.

> Thank you for valuable comments.
>
>> Thanks.
>> >
>> > ps) f2fs doesn't care about the flash page size, but considers garbage
>> > collection unit.
>> >
>> >>
>> >> Thanks.
>> >>
>> >> >
>> >> >>
>> >> >> With the best regards,
>> >> >> Vyacheslav Dubeyko.
>> >> >>
>> >> >>
>> >> >> >>
>> >> >> >> Marco
>> >> >> >
>> >> >> > ---
>> >> >> > Jaegeuk Kim
>> >> >> > Samsung
>> >> >> >
>> >> >
>> >> >
>> >> > ---
>> >> > Jaegeuk Kim
>> >> > Samsung
>> >> >
>> >> > --
>> >> > To unsubscribe from this list: send the line "unsubscribe
>> >> > linux-fsdevel"
>> >> > in
>> >> > the body of a message to [email protected]
>> >> > More majordomo info at http://vger.kernel.org/majordomo-info.html
>> >> >
>> >
>> >
>> > ---
>> > Jaegeuk Kim
>> > Samsung
>> >
>> >
>> >
>
>
> ---
> Jaegeuk Kim
> Samsung
>
>

2012-10-09 07:08:39

by Jaegeuk Kim

[permalink] [raw]
Subject: RE: [PATCH 00/16] f2fs: introduce flash-friendly file system

> -----Original Message-----
> From: Vyacheslav Dubeyko [mailto:[email protected]]
> Sent: Tuesday, October 09, 2012 4:23 AM
> To: Jaegeuk Kim
> Cc: 'Marco Stornelli'; 'Jaegeuk Kim'; 'Al Viro'; [email protected]; [email protected]; linux-
> [email protected]; [email protected]; [email protected]; [email protected];
> [email protected]
> Subject: Re: [PATCH 00/16] f2fs: introduce flash-friendly file system
>
> Hi,
>
> On Oct 8, 2012, at 12:25 PM, Jaegeuk Kim wrote:
>
> >> -----Original Message-----
> >> From: Vyacheslav Dubeyko [mailto:[email protected]]
> >> Sent: Sunday, October 07, 2012 9:09 PM
> >> To: Jaegeuk Kim
> >> Cc: 'Marco Stornelli'; 'Jaegeuk Kim'; 'Al Viro'; [email protected]; [email protected]; linux-
> >> [email protected]; [email protected]; [email protected]; [email protected];
> >> [email protected]
> >> Subject: Re: [PATCH 00/16] f2fs: introduce flash-friendly file system
> >>
> >> Hi,
> >>
> >> On Oct 7, 2012, at 1:31 PM, Jaegeuk Kim wrote:
> >>
> >>>> -----Original Message-----
> >>>> From: Marco Stornelli [mailto:[email protected]]
> >>>> Sent: Sunday, October 07, 2012 4:10 PM
> >>>> To: Jaegeuk Kim
> >>>> Cc: Vyacheslav Dubeyko; [email protected]; Al Viro; [email protected];
> [email protected];
> >>>> [email protected]; [email protected]; [email protected];
> >> [email protected];
> >>>> [email protected]
> >>>> Subject: Re: [PATCH 00/16] f2fs: introduce flash-friendly file system
> >>>>
> >>>> Il 06/10/2012 22:06, Jaegeuk Kim ha scritto:
> >>>>> 2012-10-06 (토), 17:54 +0400, Vyacheslav Dubeyko:
> >>>>>> Hi Jaegeuk,
> >>>>>
> >>>>> Hi.
> >>>>> We know each other, right? :)
> >>>>>
> >>>>>>
> >>>>>>> From: 김재극 <[email protected]>
> >>>>>>> To: [email protected], 'Theodore Ts'o' <[email protected]>,
> >>>> [email protected], [email protected], [email protected],
> >> [email protected],
> >>>> [email protected], [email protected]
> >>>>>>> Subject: [PATCH 00/16] f2fs: introduce flash-friendly file system
> >>>>>>> Date: Fri, 05 Oct 2012 20:55:07 +0900
> >>>>>>>
> >>>>>>> This is a new patch set for the f2fs file system.
> >>>>>>>
> >>>>>>> What is F2FS?
> >>>>>>> =============
> >>>>>>>
> >>>>>>> NAND flash memory-based storage devices, such as SSD, eMMC, and SD cards, have
> >>>>>>> been widely being used for ranging from mobile to server systems. Since they are
> >>>>>>> known to have different characteristics from the conventional rotational disks,
> >>>>>>> a file system, an upper layer to the storage device, should adapt to the changes
> >>>>>>> from the sketch.
> >>>>>>>
> >>>>>>> F2FS is a new file system carefully designed for the NAND flash memory-based storage
> >>>>>>> devices. We chose a log structure file system approach, but we tried to adapt it
> >>>>>>> to the new form of storage. Also we remedy some known issues of the very old log
> >>>>>>> structured file system, such as snowball effect of wandering tree and high cleaning
> >>>>>>> overhead.
> >>>>>>>
> >>>>>>> Because a NAND-based storage device shows different characteristics according to
> >>>>>>> its internal geometry or flash memory management scheme aka FTL, we add various
> >>>>>>> parameters not only for configuring on-disk layout, but also for selecting allocation
> >>>>>>> and cleaning algorithms.
> >>>>>>>
> >>>>>>
> >>>>>> What about F2FS performance? Could you share benchmarking results of the new file system?
> >>>>>>
> >>>>>> It is very interesting the case of aged file system. How is GC's implementation efficient?
> Could
> >>>> you share benchmarking results for the very aged file system state?
> >>>>>>
> >>>>>
> >>>>> Although I have benchmark results, currently I'd like to see the results
> >>>>> measured by community as a black-box. As you know, the results are very
> >>>>> dependent on the workloads and parameters, so I think it would be better
> >>>>> to see other results for a while.
> >>>>> Thanks,
> >>>>>
> >>>>
> >>>> 1) Actually it's a strange approach. If you have got any results you
> >>>> should share them with the community explaining how (the workload, hw
> >>>> and so on) your benchmark works and the specific condition. I really
> >>>> don't like the approach "I've got the results but I don't say anything,
> >>>> if you want a number, do it yourself".
> >>>
> >>> It's definitely right, and I meant *for a while*.
> >>> I just wanted to avoid arguing with how to age file system in this time.
> >>> Before then, I share the primitive results as follows.
> >>>
> >>> 1. iozone in Panda board
> >>> - ARM A9
> >>> - DRAM : 1GB
> >>> - Kernel: Linux 3.3
> >>> - Partition: 12GB (64GB Samsung eMMC)
> >>> - Tested on 2GB file
> >>>
> >>> seq. read, seq. write, rand. read, rand. write
> >>> - ext4: 30.753 17.066 5.06 4.15
> >>> - f2fs: 30.71 16.906 5.073 15.204
> >>>
> >>> 2. iozone in Galaxy Nexus
> >>> - DRAM : 1GB
> >>> - Android 4.0.4_r1.2
> >>> - Kernel omap 3.0.8
> >>> - Partition: /data, 12GB
> >>> - Tested on 2GB file
> >>>
> >>> seq. read, seq. write, rand. read, rand. write
> >>> - ext4: 29.88 12.83 11.43 0.56
> >>> - f2fs: 29.70 13.34 10.79 12.82
> >>>
> >>
> >>
> >> This is results for non-aged filesystem state. Am I correct?
> >>
> >
> > Yes, right.
> >
> >>
> >>> Due to the company secret, I expect to show other results after presenting f2fs at korea linux
> forum.
> >>>
> >>>> 2) For a new filesystem you should send the patches to linux-fsdevel.
> >>>
> >>> Yes, that was totally my mistake.
> >>>
> >>>> 3) It's not clear the pros/cons of your filesystem, can you share with
> >>>> us the main differences with the current fs already in mainline? Or is
> >>>> it a company secret?
> >>>
> >>> After forum, I can share the slides, and I hope they will be useful to you.
> >>>
> >>> Instead, let me summarize at a glance compared with other file systems.
> >>> Here are several log-structured file systems.
> >>> Note that, F2FS operates on top of block device with consideration on the FTL behavior.
> >>> So, JFFS2, YAFFS2, and UBIFS are out-of scope, since they are designed for raw NAND flash.
> >>> LogFS is initially designed for raw NAND flash, but expanded to block device.
> >>> But, I don't know whether it is stable or not.
> >>> NILFS2 is one of major log-structured file systems, which supports multiple snap-shots.
> >>> IMO, that feature is quite promising and important to users, but it may degrade the performance.
> >>> There is a trade-off between functionalities and performance.
> >>> F2FS chose high performance without any further fancy functionalities.
> >>>
> >>
> >> Performance is a good goal. But fault-tolerance is also very important point. Filesystems are used
> by
> >> users, so, it is very important to guarantee reliability of data keeping. Degradation of
> performance
> >> by means of snapshots is arguable point. Snapshots can solve the problem not only some
> unpredictable
> >> environmental issues but also user's erroneous behavior.
> >>
> >
> > Yes, I agree. I concerned the multiple snapshot feature.
> > Of course, fault-tolerance is very important, and file system should support it as you know as
> power-off-recovery.
> > f2fs supports the recovery mechanism by adopting checkpoint similar to snapshot.
> > But, f2fs does not support multiple snapshots for user convenience.
> > I just focused on the performance, and absolutely, the multiple snapshot feature is also a good
> alternative approach.
> > That may be a trade-off.
>
> So, maybe I misunderstand something, but I can't understand the difference. As I know, snapshot in
> NILFS2 is a checkpoint converted by user in snapshot. So, NILFS2's checkpoint is a log that adds new
> file system's state changing (user data + metadata). In other words, checkpoint is mechanism of
> writing on volume. Moreover, NILFS2 gives flexible way of checkpoint/snapshot management.
>
> As you are saying, f2fs supports checkpoints also. It means for me that checkpoints are the basic
> mechanism of writing operations on f2fs. But, about what performance gain and difference do you talk?

How about the following scenario?
1. data "a" is newly written.
2. checkpoint "A" is done.
3. data "a" is truncated.
4. checkpoint "B" is done.

If the fs exposes multiple snapshots like "A" and "B" to users, it cannot reuse the space allocated to
data "a" after checkpoint "B", even though data "a" has been safely truncated as of checkpoint "B".
This is because the fs must keep data "a" in case of a roll-back to "A".
So, even though the user sees some free space, the LFS may suffer from cleaning because the free space is actually exhausted.
If users want to avoid this, they have to remove snapshots by themselves. Or, maybe automatically?
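
The scenario can be sketched as a toy model. This is not f2fs or NILFS2 code; the block numbers and the free_blocks() helper are hypothetical and only illustrate that a block stays pinned as long as any retained snapshot references it.

```python
from typing import Dict, Set

def free_blocks(total_blocks: int,
                current: Set[int],
                snapshots: Dict[str, Set[int]]) -> Set[int]:
    """Blocks referenced neither by the current state nor by any snapshot."""
    pinned = set(current)
    for blocks in snapshots.values():
        pinned |= blocks
    return set(range(total_blocks)) - pinned

TOTAL = 4
snap_a = {0}      # checkpoint "A": data "a" occupies block 0
snap_b = set()    # checkpoint "B": data "a" has been truncated
current = set()   # nothing is live after the truncate

# Keeping both snapshots: block 0 cannot be reused.
multi = free_blocks(TOTAL, current, {"A": snap_a, "B": snap_b})
# Keeping only the latest checkpoint: block 0 is free again.
latest_only = free_blocks(TOTAL, current, {"B": snap_b})

print(len(multi), len(latest_only))  # 3 4
```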

>
> Moreover, user can't manage by f2fs checkpoints completely, as I can understand. It is not so clear
> what critical points can be a starting points of recovery actions. How is it possible to define how
> many checkpoints f2fs volume will have?

IMHO, the user does not need to know how many snapshots exist, nor to track the fs utilization all the time.
(off list: I don't know why the cleaning process should be tuned by users.)

f2fs writes two checkpoints alternately. One holds the last stable checkpoint and the other is for the next checkpoint.
So, during recovery, f2fs starts by finding the latest stable checkpoint.
The stable checkpoint must contain the whole index structures and data in a consistent state.
As you know, many of the details can be found in the following LFS paper.
http://www.cs.berkeley.edu/~brewer/cs262/LFS.pdf
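
As a rough sketch of the two-checkpoint idea: the slot layout below, with a version number written at the head and again at the tail, is an assumption made for illustration and is not meant to describe the actual f2fs on-disk format.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class CheckpointSlot:
    header_version: int
    footer_version: int  # written last; mismatches if the write was cut short

    def is_complete(self) -> bool:
        return self.header_version == self.footer_version

def pick_stable(slots: List[CheckpointSlot]) -> Optional[CheckpointSlot]:
    """Return the newest checkpoint that was written completely."""
    complete = [s for s in slots if s.is_complete()]
    if not complete:
        return None
    return max(complete, key=lambda s: s.header_version)

def next_slot_index(version: int) -> int:
    """Alternate between slot 0 and slot 1 on every new checkpoint."""
    return version % 2

# Slot 1 was being rewritten (version 7) when power was lost,
# so recovery falls back to the complete version 6 in slot 0.
slots = [CheckpointSlot(6, 6), CheckpointSlot(7, 5)]
print(pick_stable(slots).header_version)  # 6
```

Because the new checkpoint always goes to the other slot, an interrupted write can never damage the last stable one.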


>
> How many user data (metadata) can be lost in the case of sudden power off? Is it possible to estimate
> this?
>

If the user calls sync, f2fs (via the VFS) writes all the data and then writes a checkpoint.
In that case, all the data are safe.
After sync, several fsync calls can be triggered, and then a sudden power-off can occur.
In that case, f2fs first rolls back to the last stable checkpoint of the two, and then rolls forward to recover the fsync'ed data only.
So, f2fs recovers only the data written by sync or fsync.
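
A minimal sketch of this roll-back plus roll-forward sequence, using plain dictionaries. The record fields ("file", "data", "fsynced") are hypothetical names for illustration, not f2fs structures.

```python
from typing import Dict, List

def recover(checkpoint_state: Dict[str, str],
            log_after_checkpoint: List[dict]) -> Dict[str, str]:
    """Roll back to the stable checkpoint, then replay fsync'ed writes only."""
    state = dict(checkpoint_state)           # roll-back
    for entry in log_after_checkpoint:       # roll-forward
        if entry["fsynced"]:
            state[entry["file"]] = entry["data"]
    return state

checkpoint = {"a": "v1"}                      # state at the last sync
log = [
    {"file": "a", "data": "v2", "fsynced": True},   # fsync'ed, recovered
    {"file": "b", "data": "x", "fsynced": False},   # never fsync'ed, lost
]
print(recover(checkpoint, log))  # {'a': 'v2'}
```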

> >
> >> As I understand, it is not possible to have a perfect performance in all possible workloads. Could
> you
> >> point out what workloads are the best way of F2FS using?
> >
> > Basically I think the following workloads will be good for F2FS.
> > - Many random writes : it's LFS nature
> > - Small writes with frequent fsync : f2fs is optimized to reduce the fsync overhead.
> >
>
> Yes, it can be so for the case of non-aged f2fs volume. But I am afraid that for the case of aged f2fs
> volume the situation can be opposite. I think that in the case of aged state of f2fs volume the GC
> will be under hard work in above-mentioned workloads.

Yes, you're right.
The LFS paper above describes two logging schemes: threaded logging and copy-and-compaction.
In order to avoid high cleaning overhead, f2fs adopts a hybrid scheme that switches the allocation policy dynamically
between the two.
Threaded logging is similar to the traditional approach: it results in random writes but needs no cleaning operations.
Copy-and-compaction is another name for cleaning: it results in sequential writes but requires cleaning operations.
So, f2fs chooses one of them at runtime according to the file system status.
With this, we have seen random write performance comparable to ext4 even in the worst case.
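
A rough sketch of such a runtime decision is below. The trigger used here (the number of free sections
compared with a threshold) is my assumption for illustration; "file system status" above is not defined
more precisely, and choose_allocation_policy() and its parameters are hypothetical names.

```python
def choose_allocation_policy(free_sections: int, min_free_sections: int) -> str:
    """Pick a logging scheme from the current amount of free space (assumed criterion)."""
    if free_sections > min_free_sections:
        # Enough clean space: write sequentially and let cleaning reclaim space later.
        return "copy-and-compaction"
    # Free space is scarce: reuse invalid blocks in dirty segments instead,
    # accepting random writes in exchange for not triggering cleaning.
    return "threaded-logging"
```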

>
> But, as I can understand, smartphones and tablets are the most promising way of f2fs using. Because
> f2fs designs for NAND flash memory based-storage devices. So, I think that such workloads as "many
> random writes" or "small writes with frequent fsync" are not so frequent use-cases. Use-case of
> creation and deletion many small files can be more frequent use-case under smartphones and tablets.
> But, as I can understand, f2fs has slightly expensive metadata payload in the case of small files
> creation. Moreover, frequent and random deletion of small files ends in the very sophisticated and
> unpredictable GC behavior, as I can understand.
>

I'd like to share the following paper.
http://research.cs.wisc.edu/adsl/Publications/ibench-tocs12.pdf

In our experiments on Android phones, we have *also* seen many random patterns with frequent fsync calls.
We found that the main cause is the database, and I think f2fs is beneficial for this.
As you mentioned, I agree that it is important to handle many small files too.
It is true that this may cause additional cleaning overhead, and that f2fs has some metadata payload overhead.
In order to reduce the cleaning overhead, f2fs adopts static and dynamic hot and cold data separation.
The main goal is to separate the data according to their type (e.g., dir inode, file inode, dentry data, etc.) as much as possible.
Please see the document for details.
I think this approach is quite effective for achieving that goal.
BTW, the payload overhead can be resolved by embedding data in the inode, as ext4 does.
I think that is also a good idea, and I hope to adopt it in the future.
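
As a simplified picture of separation by type, the sketch below routes each block to its own log according
to a type label. The labels come from the examples above; separate_by_type() and the (type, block) input
format are hypothetical, and the real hot/cold classification is described in the document patch.

```python
from collections import defaultdict


def separate_by_type(writes):
    """Group (block_type, block) pairs so that each type is written to its own log."""
    logs = defaultdict(list)
    for block_type, block in writes:
        # Blocks of the same type tend to be invalidated together, so keeping them
        # in the same log should leave fewer valid blocks to move during cleaning.
        logs[block_type].append(block)
    return dict(logs)
```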

> >>
> >>> Maybe or obviously it is possible to optimize ext4 or btrfs to flash storages.
> >>> IMHO, however, they are originally designed for HDDs, so that it may or may not suffer from
> >>> fundamental designs.
> >>> I don't know, but why not designing a new file system for flash storages as a counterpart?
> >>>
> >>
> >> Yes, it is possible. But F2FS is not flash oriented filesystem as JFFS2, YAFFS2, UBIFS but block-
> >> oriented filesystem. So, F2FS design is restricted by block-layer's opportunities in the using of
> >> flash storages' peculiarities. Could you point out key points of F2FS design that makes this design
> >> fundamentally unique?
> >
> > As you can see the f2fs kernel document patch, I think one of the most important features is to
> > align operating units between f2fs and ftl.
> > Specifically, f2fs has section and zone, which are cleaning unit and basic allocation unit
> > respectively.
> > Through these configurable units in f2fs, I think f2fs is able to reduce the unnecessary operations
> > done by FTL.
> > And, in order to avoid changing IO patterns by the block-layer, f2fs merges itself some bios
> > likewise ext4.
> >
>
> As I can understand, it is not so easy to create partition with f2fs volume which is aligned on
> operating units (especially in the case of eMMC or SSD).

Could you explain why it is not so easy?

> Performance of unaligned volume can degrade
> significantly because of FTL activity. What mechanisms has f2fs for excluding such situation and
> achieving of the goal to reduce unnecessary FTL operations?

Could you please explain your concern more exactly?
In the kernel doc, the start address of the f2fs data structures is aligned to the segment size (i.e., 2MB).
Do you mean that, or the other operating units (e.g., section and zone)?
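
For reference, alignment to the 2MB segment size amounts to the arithmetic below. This is a generic
sketch with hypothetical helper names (align_up, is_aligned), not code taken from mkfs.f2fs.

```python
SEGMENT_SIZE = 2 * 1024 * 1024  # 2MB segment size, as stated above


def align_up(offset: int, unit: int = SEGMENT_SIZE) -> int:
    """Round a byte offset up to the next multiple of the unit."""
    return (offset + unit - 1) // unit * unit


def is_aligned(offset: int, unit: int = SEGMENT_SIZE) -> bool:
    """Check whether a byte offset starts exactly on a unit boundary."""
    return offset % unit == 0
```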

Thanks,

>
> With the best regards,
> Vyacheslav Dubeyko.
>
> >>
> >> With the best regards,
> >> Vyacheslav Dubeyko.
> >>
> >>
> >>>>
> >>>> Marco
> >>>
> >>> ---
> >>> Jaegeuk Kim
> >>> Samsung
> >>>
> >>> --
> >>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> >>> the body of a message to [email protected]
> >>> More majordomo info at http://vger.kernel.org/majordomo-info.html
> >>> Please read the FAQ at http://www.tux.org/lkml/
> >
> >
> > ---
> > Jaegeuk Kim
> > Samsung
> >


---
Jaegeuk Kim
Samsung

2012-10-09 08:01:00

by Jaegeuk Kim

Subject: RE: [PATCH 00/16] f2fs: introduce flash-friendly file system



---
Jaegeuk Kim
Samsung


> -----Original Message-----
> From: Namjae Jeon [mailto:[email protected]]
> Sent: Tuesday, October 09, 2012 12:52 PM
> To: Jaegeuk Kim
> Cc: Vyacheslav Dubeyko; Marco Stornelli; Jaegeuk Kim; Al Viro; [email protected];
> [email protected]; [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]
> Subject: Re: [PATCH 00/16] f2fs: introduce flash-friendly file system
>
> 2012/10/8, Jaegeuk Kim <[email protected]>:
> >> -----Original Message-----
> >> From: Namjae Jeon [mailto:[email protected]]
> >> Sent: Monday, October 08, 2012 8:22 PM
> >> To: Jaegeuk Kim
> >> Cc: Vyacheslav Dubeyko; Marco Stornelli; Jaegeuk Kim; Al Viro;
> >> [email protected];
> >> [email protected]; [email protected];
> >> [email protected]; [email protected];
> >> [email protected]; [email protected]
> >> Subject: Re: [PATCH 00/16] f2fs: introduce flash-friendly file system
> >>
> >> 2012/10/8, Jaegeuk Kim <[email protected]>:
> >> >> -----Original Message-----
> >> >> From: Namjae Jeon [mailto:[email protected]]
> >> >> Sent: Monday, October 08, 2012 7:00 PM
> >> >> To: Jaegeuk Kim
> >> >> Cc: Vyacheslav Dubeyko; Marco Stornelli; Jaegeuk Kim; Al Viro;
> >> >> [email protected];
> >> >> [email protected]; [email protected];
> >> >> [email protected]; [email protected];
> >> >> [email protected]; [email protected]
> >> >> Subject: Re: [PATCH 00/16] f2fs: introduce flash-friendly file system
> >> >>
> >> >> 2012/10/8, Jaegeuk Kim <[email protected]>:
> >> >> >> -----Original Message-----
> >> >> >> From: Vyacheslav Dubeyko [mailto:[email protected]]
> >> >> >> Sent: Sunday, October 07, 2012 9:09 PM
> >> >> >> To: Jaegeuk Kim
> >> >> >> Cc: 'Marco Stornelli'; 'Jaegeuk Kim'; 'Al Viro'; [email protected];
> >> >> >> [email protected]; linux-
> >> >> >> [email protected]; [email protected];
> >> >> >> [email protected];
> >> >> >> [email protected];
> >> >> >> [email protected]
> >> >> >> Subject: Re: [PATCH 00/16] f2fs: introduce flash-friendly file
> >> >> >> system
> >> >> >>
> >> >> >> Hi,
> >> >> >>
> >> >> >> On Oct 7, 2012, at 1:31 PM, Jaegeuk Kim wrote:
> >> >> >>
> >> >> >> >> -----Original Message-----
> >> >> >> >> From: Marco Stornelli [mailto:[email protected]]
> >> >> >> >> Sent: Sunday, October 07, 2012 4:10 PM
> >> >> >> >> To: Jaegeuk Kim
> >> >> >> >> Cc: Vyacheslav Dubeyko; [email protected]; Al Viro;
> >> >> >> >> [email protected]; [email protected];
> >> >> >> >> [email protected]; [email protected];
> >> >> >> >> [email protected];
> >> >> >> [email protected];
> >> >> >> >> [email protected]
> >> >> >> >> Subject: Re: [PATCH 00/16] f2fs: introduce flash-friendly file
> >> >> >> >> system
> >> >> >> >>
> >> >> >> >> Il 06/10/2012 22:06, Jaegeuk Kim ha scritto:
> >> >> >> >>> 2012-10-06 (토), 17:54 +0400, Vyacheslav Dubeyko:
> >> >> >> >>>> Hi Jaegeuk,
> >> >> >> >>>
> >> >> >> >>> Hi.
> >> >> >> >>> We know each other, right? :)
> >> >> >> >>>
> >> >> >> >>>>
> >> >> >> >>>>> From: 김재극 <[email protected]>
> >> >> >> >>>>> To: [email protected], 'Theodore Ts'o'
> >> >> >> >>>>> <[email protected]>,
> >> >> >> >> [email protected], [email protected],
> >> >> >> >> [email protected],
> >> >> >> [email protected],
> >> >> >> >> [email protected], [email protected]
> >> >> >> >>>>> Subject: [PATCH 00/16] f2fs: introduce flash-friendly file
> >> >> >> >>>>> system
> >> >> >> >>>>> Date: Fri, 05 Oct 2012 20:55:07 +0900
> >> >> >> >>>>>
> >> >> >> >>>>> This is a new patch set for the f2fs file system.
> >> >> >> >>>>>
> >> >> >> >>>>> What is F2FS?
> >> >> >> >>>>> =============
> >> >> >> >>>>>
> >> >> >> >>>>> NAND flash memory-based storage devices, such as SSD, eMMC,
> >> >> >> >>>>> and
> >> >> >> >>>>> SD
> >> >> >> >>>>> cards, have
> >> >> >> >>>>> been widely being used for ranging from mobile to server
> >> >> >> >>>>> systems.
> >> >> >> >>>>> Since they are
> >> >> >> >>>>> known to have different characteristics from the conventional
> >> >> >> >>>>> rotational disks,
> >> >> >> >>>>> a file system, an upper layer to the storage device, should
> >> >> >> >>>>> adapt
> >> >> >> >>>>> to
> >> >> >> >>>>> the changes
> >> >> >> >>>>> from the sketch.
> >> >> >> >>>>>
> >> >> >> >>>>> F2FS is a new file system carefully designed for the NAND
> >> >> >> >>>>> flash
> >> >> >> >>>>> memory-based storage
> >> >> >> >>>>> devices. We chose a log structure file system approach, but
> >> >> >> >>>>> we
> >> >> >> >>>>> tried
> >> >> >> >>>>> to adapt it
> >> >> >> >>>>> to the new form of storage. Also we remedy some known issues
> >> >> >> >>>>> of
> >> >> >> >>>>> the
> >> >> >> >>>>> very old log
> >> >> >> >>>>> structured file system, such as snowball effect of wandering
> >> >> >> >>>>> tree
> >> >> >> >>>>> and high cleaning
> >> >> >> >>>>> overhead.
> >> >> >> >>>>>
> >> >> >> >>>>> Because a NAND-based storage device shows different
> >> >> >> >>>>> characteristics
> >> >> >> >>>>> according to
> >> >> >> >>>>> its internal geometry or flash memory management scheme aka
> >> >> >> >>>>> FTL,
> >> >> >> >>>>> we
> >> >> >> >>>>> add various
> >> >> >> >>>>> parameters not only for configuring on-disk layout, but also
> >> >> >> >>>>> for
> >> >> >> >>>>> selecting allocation
> >> >> >> >>>>> and cleaning algorithms.
> >> >> >> >>>>>
> >> >> >> >>>>
> >> >> >> >>>> What about F2FS performance? Could you share benchmarking
> >> >> >> >>>> results
> >> >> >> >>>> of
> >> >> >> >>>> the new file system?
> >> >> >> >>>>
> >> >> >> >>>> It is very interesting the case of aged file system. How is
> >> >> >> >>>> GC's
> >> >> >> >>>> implementation efficient? Could
> >> >> >> >> you share benchmarking results for the very aged file system
> >> >> >> >> state?
> >> >> >> >>>>
> >> >> >> >>>
> >> >> >> >>> Although I have benchmark results, currently I'd like to see
> >> >> >> >>> the
> >> >> >> >>> results
> >> >> >> >>> measured by community as a black-box. As you know, the results
> >> >> >> >>> are
> >> >> >> >>> very
> >> >> >> >>> dependent on the workloads and parameters, so I think it would
> >> >> >> >>> be
> >> >> >> >>> better
> >> >> >> >>> to see other results for a while.
> >> >> >> >>> Thanks,
> >> >> >> >>>
> >> >> >> >>
> >> >> >> >> 1) Actually it's a strange approach. If you have got any results
> >> >> >> >> you
> >> >> >> >> should share them with the community explaining how (the
> >> >> >> >> workload,
> >> >> >> >> hw
> >> >> >> >> and so on) your benchmark works and the specific condition. I
> >> >> >> >> really
> >> >> >> >> don't like the approach "I've got the results but I don't say
> >> >> >> >> anything,
> >> >> >> >> if you want a number, do it yourself".
> >> >> >> >
> >> >> >> > It's definitely right, and I meant *for a while*.
> >> >> >> > I just wanted to avoid arguing with how to age file system in
> >> >> >> > this
> >> >> >> > time.
> >> >> >> > Before then, I share the primitive results as follows.
> >> >> >> >
> >> >> >> > 1. iozone in Panda board
> >> >> >> > - ARM A9
> >> >> >> > - DRAM : 1GB
> >> >> >> > - Kernel: Linux 3.3
> >> >> >> > - Partition: 12GB (64GB Samsung eMMC)
> >> >> >> > - Tested on 2GB file
> >> >> >> >
> >> >> >> > seq. read, seq. write, rand. read, rand. write
> >> >> >> > - ext4: 30.753 17.066 5.06 4.15
> >> >> >> > - f2fs: 30.71 16.906 5.073 15.204
> >> >> >> >
> >> >> >> > 2. iozone in Galaxy Nexus
> >> >> >> > - DRAM : 1GB
> >> >> >> > - Android 4.0.4_r1.2
> >> >> >> > - Kernel omap 3.0.8
> >> >> >> > - Partition: /data, 12GB
> >> >> >> > - Tested on 2GB file
> >> >> >> >
> >> >> >> > seq. read, seq. write, rand. read, rand. write
> >> >> >> > - ext4: 29.88 12.83 11.43 0.56
> >> >> >> > - f2fs: 29.70 13.34 10.79 12.82
> >> >> >> >
> >> >> >>
> >> >> >>
> >> >> >> This is results for non-aged filesystem state. Am I correct?
> >> >> >>
> >> >> >
> >> >> > Yes, right.
> >> >> >
> >> >> >>
> >> >> >> > Due to the company secret, I expect to show other results after
> >> >> >> > presenting f2fs at korea linux forum.
> >> >> >> >
> >> >> >> >> 2) For a new filesystem you should send the patches to
> >> >> >> >> linux-fsdevel.
> >> >> >> >
> >> >> >> > Yes, that was totally my mistake.
> >> >> >> >
> >> >> >> >> 3) It's not clear the pros/cons of your filesystem, can you
> >> >> >> >> share
> >> >> >> >> with
> >> >> >> >> us the main differences with the current fs already in mainline?
> >> >> >> >> Or
> >> >> >> >> is
> >> >> >> >> it a company secret?
> >> >> >> >
> >> >> >> > After forum, I can share the slides, and I hope they will be
> >> >> >> > useful
> >> >> >> > to
> >> >> >> > you.
> >> >> >> >
> >> >> >> > Instead, let me summarize at a glance compared with other file
> >> >> >> > systems.
> >> >> >> > Here are several log-structured file systems.
> >> >> >> > Note that, F2FS operates on top of block device with
> >> >> >> > consideration
> >> >> >> > on
> >> >> >> > the FTL behavior.
> >> >> >> > So, JFFS2, YAFFS2, and UBIFS are out-of scope, since they are
> >> >> >> > designed
> >> >> >> > for raw NAND flash.
> >> >> >> > LogFS is initially designed for raw NAND flash, but expanded to
> >> >> >> > block
> >> >> >> > device.
> >> >> >> > But, I don't know whether it is stable or not.
> >> >> >> > NILFS2 is one of major log-structured file systems, which
> >> >> >> > supports
> >> >> >> > multiple snap-shots.
> >> >> >> > IMO, that feature is quite promising and important to users, but
> >> >> >> > it
> >> >> >> > may
> >> >> >> > degrade the performance.
> >> >> >> > There is a trade-off between functionalities and performance.
> >> >> >> > F2FS chose high performance without any further fancy
> >> >> >> > functionalities.
> >> >> >> >
> >> >> >>
> >> >> >> Performance is a good goal. But fault-tolerance is also very
> >> >> >> important
> >> >> >> point. Filesystems are used by
> >> >> >> users, so, it is very important to guarantee reliability of data
> >> >> >> keeping.
> >> >> >> Degradation of performance
> >> >> >> by means of snapshots is arguable point. Snapshots can solve the
> >> >> >> problem
> >> >> >> not only some unpredictable
> >> >> >> environmental issues but also user's erroneous behavior.
> >> >> >>
> >> >> >
> >> >> > Yes, I agree. I concerned the multiple snapshot feature.
> >> >> > Of course, fault-tolerance is very important, and file system should
> >> >> > support
> >> >> > it as you know as power-off-recovery.
> >> >> > f2fs supports the recovery mechanism by adopting checkpoint similar
> >> >> > to
> >> >> > snapshot.
> >> >> > But, f2fs does not support multiple snapshots for user convenience.
> >> >> > I just focused on the performance, and absolutely, the multiple
> >> >> > snapshot
> >> >> > feature is also a good alternative approach.
> >> >> > That may be a trade-off.
> >> >> >
> >> >> >> As I understand, it is not possible to have a perfect performance
> >> >> >> in
> >> >> >> all
> >> >> >> possible workloads. Could you
> >> >> >> point out what workloads are the best way of F2FS using?
> >> >> >
> >> >> > Basically I think the following workloads will be good for F2FS.
> >> >> > - Many random writes : it's LFS nature
> >> >> > - Small writes with frequent fsync : f2fs is optimized to reduce the
> >> >> > fsync
> >> >> > overhead.
> >> >> >
> >> >> >>
> >> >> >> > Maybe or obviously it is possible to optimize ext4 or btrfs to
> >> >> >> > flash
> >> >> >> > storages.
> >> >> >> > IMHO, however, they are originally designed for HDDs, so that it
> >> >> >> > may
> >> >> >> > or
> >> >> >> > may not suffer from
> >> >> >> fundamental designs.
> >> >> >> > I don't know, but why not designing a new file system for flash
> >> >> >> > storages
> >> >> >> > as a counterpart?
> >> >> >> >
> >> >> >>
> >> >> >> Yes, it is possible. But F2FS is not flash oriented filesystem as
> >> >> >> JFFS2,
> >> >> >> YAFFS2, UBIFS but block-
> >> >> >> oriented filesystem. So, F2FS design is restricted by block-layer's
> >> >> >> opportunities in the using of
> >> >> >> flash storages' peculiarities. Could you point out key points of
> >> >> >> F2FS
> >> >> >> design that makes this design
> >> >> >> fundamentally unique?
> >> >> >
> >> >> > As you can see the f2fs kernel document patch, I think one of the
> >> >> > most
> >> >> > important features is to align operating units between f2fs and ftl.
> >> >> > Specifically, f2fs has section and zone, which are cleaning unit and
> >> >> > basic
> >> >> > allocation unit respectively.
> >> >> > Through these configurable units in f2fs, I think f2fs is able to
> >> >> > reduce
> >> >> > the
> >> >> > unnecessary operations done by FTL.
> >> >> > And, in order to avoid changing IO patterns by the block-layer, f2fs
> >> >> > merges
> >> >> > itself some bios likewise ext4.
> >> >> Hello.
> >> >> The internal of eMMC and SSD is the blackbox from user side.
> >> >> How does the normal user easily set operating units alignment(page
> >> >> size and physical block size ?) between f2fs and ftl in storage device
> >> >> ?
> >> >
> >> > I've known that some works have been tried to figure out the units by
> >> > profiling the storage, AKA reverse engineering.
> >> > In most cases, the simplest way is to measure the latencies of
> >> > consecutive
> >> > writes and analyze their patterns.
> >> > As you mentioned, in practical, users will not want to do this, so maybe
> >> > we
> >> > need a tool to profile them to optimize f2fs.
> >> > In the current state, I think profiling is an another issue, and
> >> > mkfs.f2fs
> >> > had better include this work in the future.
> >> Well, Format tool evaluates optimal block size whenever formatting? As
> >> you know, The size of Flash Based storage device is increasing every
> >> year. It means format time can be too long on larger devices(e.g. one
> >> device, one parition).
> >
> > Every file systems will suffer from the long format time in such a huge
> > device.
> > And, I don't think the profiling time would not be scaled up, since it's
> > unnecessary to scan whole device.
> > After getting the size, we just can stop it.
> The key point is that you should estimate correct optimal block size
> of ftl with much less I/O at format time.

Yes, exactly.

> I am not sure it is possible.

Why do you think like that?
When I tested this before, I could see a kind of pattern after writing just several tens of MB to eMMC.
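
To show what I mean by a pattern, here is a sketch that looks for a regular spacing between slow writes
in a list of per-write latencies. It works on numbers that are already measured and does no device I/O;
spike_interval() and the fixed threshold are hypothetical, and a real profiler would need much more care.

```python
from collections import Counter


def spike_interval(latencies, threshold):
    """Return the most common distance (in writes) between latency spikes, or None."""
    # Indices of writes that took longer than the threshold.
    spikes = [i for i, latency in enumerate(latencies) if latency > threshold]
    # Distances between consecutive spikes.
    gaps = [later - earlier for earlier, later in zip(spikes, spikes[1:])]
    if not gaps:
        return None
    return Counter(gaps).most_common(1)[0][0]
```

If every fourth write of a fixed size is slow, the interval suggests that some internal unit is about
four times the write size.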

> And you should prove optimal block size is really correct on several
> device per vendor device.

Yes, that is correct, but unfortunately I cannot prove it for all devices.
You're arguing about heuristic vs. optimal approaches.
IMHO, most file systems are based on a heuristic approach.
f2fs also adopts a heuristic approach, which means it tries to help the FTL as much as possible
rather than cooperating with the FTL directly.
Furthermore, even if the default unit size is not optimal, I believe it will work well in most cases.
(Most SSDs have an erase block size of 512KB, so 2MB can cover 4-way SSDs.)
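
The arithmetic behind that remark, as a small sketch (the 512KB erase block size is the assumption stated
above, and ways_covered() is a hypothetical helper):

```python
ERASE_BLOCK_SIZE = 512 * 1024  # assumed erase block size from the remark above


def ways_covered(unit_size: int, erase_block_size: int = ERASE_BLOCK_SIZE) -> int:
    """Number of whole erase blocks (one per way) that fit in one f2fs unit."""
    return unit_size // erase_block_size
```

With the 2MB default, ways_covered(2 * 1024 * 1024) gives 4.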

Thanks,

>
> >
> >> > But, IMO, from the viewpoint of performance, default configuration is
> >> > quite
> >> > enough now.
> >> At default(after cleanly format), Would you share performance
> >> difference between other log structured filesystems in comparison to
> >> f2fs instead of ext4 ?
> >>
> >
> > Actually, we've focused on ext4, so I have no results of other file systems
> > measured on embedded systems.
> > I'll test sooner or later, and report them.
> Okay, Thanks Jaegeuk.
>
> > Thank you for valuable comments.
> >
> >> Thanks.
> >> >
> >> > ps) f2fs doesn't care about the flash page size, but considers garbage
> >> > collection unit.
> >> >
> >> >>
> >> >> Thanks.
> >> >>
> >> >> >
> >> >> >>
> >> >> >> With the best regards,
> >> >> >> Vyacheslav Dubeyko.
> >> >> >>
> >> >> >>
> >> >> >> >>
> >> >> >> >> Marco
> >> >> >> >
> >> >> >> > ---
> >> >> >> > Jaegeuk Kim
> >> >> >> > Samsung
> >> >> >> >
> >> >> >> > --
> >> >> >> > To unsubscribe from this list: send the line "unsubscribe
> >> >> >> > linux-kernel"
> >> >> >> > in
> >> >> >> > the body of a message to [email protected]
> >> >> >> > More majordomo info at
> >> >> >> > http://vger.kernel.org/majordomo-info.html
> >> >> >> > Please read the FAQ at http://www.tux.org/lkml/
> >> >> >
> >> >> >
> >> >> > ---
> >> >> > Jaegeuk Kim
> >> >> > Samsung
> >> >> >
> >> >> > --
> >> >> > To unsubscribe from this list: send the line "unsubscribe
> >> >> > linux-fsdevel"
> >> >> > in
> >> >> > the body of a message to [email protected]
> >> >> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> >> >> >
> >> >
> >> >
> >> > ---
> >> > Jaegeuk Kim
> >> > Samsung
> >> >
> >> >
> >> >
> >
> >
> > ---
> > Jaegeuk Kim
> > Samsung
> >
> >

2012-10-09 08:31:56

by Lukas Czerner

Subject: RE: [PATCH 00/16] f2fs: introduce flash-friendly file system

On Mon, 8 Oct 2012, Jaegeuk Kim wrote:

> Date: Mon, 08 Oct 2012 19:52:03 +0900
> From: Jaegeuk Kim <[email protected]>
> To: 'Namjae Jeon' <[email protected]>
> Cc: 'Vyacheslav Dubeyko' <[email protected]>,
> 'Marco Stornelli' <[email protected]>,
> 'Jaegeuk Kim' <[email protected]>,
> 'Al Viro' <[email protected]>, [email protected],
> [email protected], [email protected],
> [email protected], [email protected], [email protected],
> [email protected]
> Subject: RE: [PATCH 00/16] f2fs: introduce flash-friendly file system
>
> > -----Original Message-----
> > From: Namjae Jeon [mailto:[email protected]]
> > Sent: Monday, October 08, 2012 7:00 PM
> > To: Jaegeuk Kim
> > Cc: Vyacheslav Dubeyko; Marco Stornelli; Jaegeuk Kim; Al Viro; [email protected];
> > [email protected]; [email protected]; [email protected]; [email protected];
> > [email protected]; [email protected]
> > Subject: Re: [PATCH 00/16] f2fs: introduce flash-friendly file system
> >
> > 2012/10/8, Jaegeuk Kim <[email protected]>:
> > >> -----Original Message-----
> > >> From: Vyacheslav Dubeyko [mailto:[email protected]]
> > >> Sent: Sunday, October 07, 2012 9:09 PM
> > >> To: Jaegeuk Kim
> > >> Cc: 'Marco Stornelli'; 'Jaegeuk Kim'; 'Al Viro'; [email protected];
> > >> [email protected]; linux-
> > >> [email protected]; [email protected]; [email protected];
> > >> [email protected];
> > >> [email protected]
> > >> Subject: Re: [PATCH 00/16] f2fs: introduce flash-friendly file system
> > >>
> > >> Hi,
> > >>
> > >> On Oct 7, 2012, at 1:31 PM, Jaegeuk Kim wrote:
> > >>
> > >> >> -----Original Message-----
> > >> >> From: Marco Stornelli [mailto:[email protected]]
> > >> >> Sent: Sunday, October 07, 2012 4:10 PM
> > >> >> To: Jaegeuk Kim
> > >> >> Cc: Vyacheslav Dubeyko; [email protected]; Al Viro;
> > >> >> [email protected]; [email protected];
> > >> >> [email protected]; [email protected];
> > >> >> [email protected];
> > >> [email protected];
> > >> >> [email protected]
> > >> >> Subject: Re: [PATCH 00/16] f2fs: introduce flash-friendly file system
> > >> >>
> > >> >> Il 06/10/2012 22:06, Jaegeuk Kim ha scritto:
> > >> >>> 2012-10-06 (토), 17:54 +0400, Vyacheslav Dubeyko:
> > >> >>>> Hi Jaegeuk,
> > >> >>>
> > >> >>> Hi.
> > >> >>> We know each other, right? :)
> > >> >>>
> > >> >>>>
> > >> >>>>> From: 김재극 <[email protected]>
> > >> >>>>> To: [email protected], 'Theodore Ts'o' <[email protected]>,
> > >> >> [email protected], [email protected],
> > >> >> [email protected],
> > >> [email protected],
> > >> >> [email protected], [email protected]
> > >> >>>>> Subject: [PATCH 00/16] f2fs: introduce flash-friendly file system
> > >> >>>>> Date: Fri, 05 Oct 2012 20:55:07 +0900
> > >> >>>>>
> > >> >>>>> This is a new patch set for the f2fs file system.
> > >> >>>>>
> > >> >>>>> What is F2FS?
> > >> >>>>> =============
> > >> >>>>>
> > >> >>>>> NAND flash memory-based storage devices, such as SSD, eMMC, and SD
> > >> >>>>> cards, have
> > >> >>>>> been widely being used for ranging from mobile to server systems.
> > >> >>>>> Since they are
> > >> >>>>> known to have different characteristics from the conventional
> > >> >>>>> rotational disks,
> > >> >>>>> a file system, an upper layer to the storage device, should adapt to
> > >> >>>>> the changes
> > >> >>>>> from the sketch.
> > >> >>>>>
> > >> >>>>> F2FS is a new file system carefully designed for the NAND flash
> > >> >>>>> memory-based storage
> > >> >>>>> devices. We chose a log structure file system approach, but we tried
> > >> >>>>> to adapt it
> > >> >>>>> to the new form of storage. Also we remedy some known issues of the
> > >> >>>>> very old log
> > >> >>>>> structured file system, such as snowball effect of wandering tree
> > >> >>>>> and high cleaning
> > >> >>>>> overhead.
> > >> >>>>>
> > >> >>>>> Because a NAND-based storage device shows different characteristics
> > >> >>>>> according to
> > >> >>>>> its internal geometry or flash memory management scheme aka FTL, we
> > >> >>>>> add various
> > >> >>>>> parameters not only for configuring on-disk layout, but also for
> > >> >>>>> selecting allocation
> > >> >>>>> and cleaning algorithms.
> > >> >>>>>
> > >> >>>>
> > >> >>>> What about F2FS performance? Could you share benchmarking results of
> > >> >>>> the new file system?
> > >> >>>>
> > >> >>>> It is very interesting the case of aged file system. How is GC's
> > >> >>>> implementation efficient? Could
> > >> >> you share benchmarking results for the very aged file system state?
> > >> >>>>
> > >> >>>
> > >> >>> Although I have benchmark results, currently I'd like to see the
> > >> >>> results
> > >> >>> measured by community as a black-box. As you know, the results are
> > >> >>> very
> > >> >>> dependent on the workloads and parameters, so I think it would be
> > >> >>> better
> > >> >>> to see other results for a while.
> > >> >>> Thanks,
> > >> >>>
> > >> >>
> > >> >> 1) Actually it's a strange approach. If you have got any results you
> > >> >> should share them with the community explaining how (the workload, hw
> > >> >> and so on) your benchmark works and the specific condition. I really
> > >> >> don't like the approach "I've got the results but I don't say
> > >> >> anything,
> > >> >> if you want a number, do it yourself".
> > >> >
> > >> > It's definitely right, and I meant *for a while*.
> > >> > I just wanted to avoid arguing with how to age file system in this
> > >> > time.
> > >> > Before then, I share the primitive results as follows.
> > >> >
> > >> > 1. iozone in Panda board
> > >> > - ARM A9
> > >> > - DRAM : 1GB
> > >> > - Kernel: Linux 3.3
> > >> > - Partition: 12GB (64GB Samsung eMMC)
> > >> > - Tested on 2GB file
> > >> >
> > >> > seq. read, seq. write, rand. read, rand. write
> > >> > - ext4: 30.753 17.066 5.06 4.15
> > >> > - f2fs: 30.71 16.906 5.073 15.204
> > >> >
> > >> > 2. iozone in Galaxy Nexus
> > >> > - DRAM : 1GB
> > >> > - Android 4.0.4_r1.2
> > >> > - Kernel omap 3.0.8
> > >> > - Partition: /data, 12GB
> > >> > - Tested on 2GB file
> > >> >
> > >> > seq. read, seq. write, rand. read, rand. write
> > >> > - ext4: 29.88 12.83 11.43 0.56
> > >> > - f2fs: 29.70 13.34 10.79 12.82
> > >> >
> > >>
> > >>
> > >> This is results for non-aged filesystem state. Am I correct?
> > >>
> > >
> > > Yes, right.
> > >
> > >>
> > >> > Due to the company secret, I expect to show other results after
> > >> > presenting f2fs at korea linux forum.
> > >> >
> > >> >> 2) For a new filesystem you should send the patches to linux-fsdevel.
> > >> >
> > >> > Yes, that was totally my mistake.
> > >> >
> > >> >> 3) It's not clear the pros/cons of your filesystem, can you share with
> > >> >> us the main differences with the current fs already in mainline? Or is
> > >> >> it a company secret?
> > >> >
> > >> > After forum, I can share the slides, and I hope they will be useful to
> > >> > you.
> > >> >
> > >> > Instead, let me summarize at a glance compared with other file systems.
> > >> > Here are several log-structured file systems.
> > >> > Note that, F2FS operates on top of block device with consideration on
> > >> > the FTL behavior.
> > >> > So, JFFS2, YAFFS2, and UBIFS are out-of scope, since they are designed
> > >> > for raw NAND flash.
> > >> > LogFS is initially designed for raw NAND flash, but expanded to block
> > >> > device.
> > >> > But, I don't know whether it is stable or not.
> > >> > NILFS2 is one of major log-structured file systems, which supports
> > >> > multiple snap-shots.
> > >> > IMO, that feature is quite promising and important to users, but it may
> > >> > degrade the performance.
> > >> > There is a trade-off between functionalities and performance.
> > >> > F2FS chose high performance without any further fancy functionalities.
> > >> >
> > >>
> > >> Performance is a good goal. But fault-tolerance is also very important
> > >> point. Filesystems are used by
> > >> users, so, it is very important to guarantee reliability of data keeping.
> > >> Degradation of performance
> > >> by means of snapshots is arguable point. Snapshots can solve the problem
> > >> not only some unpredictable
> > >> environmental issues but also user's erroneous behavior.
> > >>
> > >
> > > Yes, I agree. I concerned the multiple snapshot feature.
> > > Of course, fault-tolerance is very important, and file system should support
> > > it as you know as power-off-recovery.
> > > f2fs supports the recovery mechanism by adopting checkpoint similar to
> > > snapshot.
> > > But, f2fs does not support multiple snapshots for user convenience.
> > > I just focused on the performance, and absolutely, the multiple snapshot
> > > feature is also a good alternative approach.
> > > That may be a trade-off.
> > >
> > >> As I understand, it is not possible to have a perfect performance in all
> > >> possible workloads. Could you
> > >> point out what workloads are the best way of F2FS using?
> > >
> > > Basically I think the following workloads will be good for F2FS.
> > > - Many random writes : it's LFS nature
> > > - Small writes with frequent fsync : f2fs is optimized to reduce the fsync
> > > overhead.
> > >
> > >>
> > >> > Maybe or obviously it is possible to optimize ext4 or btrfs to flash
> > >> > storages.
> > >> > IMHO, however, they are originally designed for HDDs, so that it may or
> > >> > may not suffer from
> > >> fundamental designs.
> > >> > I don't know, but why not designing a new file system for flash storages
> > >> > as a counterpart?
> > >> >
> > >>
> > >> Yes, it is possible. But F2FS is not flash oriented filesystem as JFFS2,
> > >> YAFFS2, UBIFS but block-
> > >> oriented filesystem. So, F2FS design is restricted by block-layer's
> > >> opportunities in the using of
> > >> flash storages' peculiarities. Could you point out key points of F2FS
> > >> design that makes this design
> > >> fundamentally unique?
> > >
> > > As you can see the f2fs kernel document patch, I think one of the most
> > > important features is to align operating units between f2fs and ftl.
> > > Specifically, f2fs has section and zone, which are cleaning unit and basic
> > > allocation unit respectively.
> > > Through these configurable units in f2fs, I think f2fs is able to reduce the
> > > unnecessary operations done by FTL.
> > > And, in order to avoid changing IO patterns by the block-layer, f2fs merges
> > > itself some bios likewise ext4.
> > Hello.
> > The internal of eMMC and SSD is the blackbox from user side.
> > How does the normal user easily set operating units alignment(page
> > size and physical block size ?) between f2fs and ftl in storage device
> > ?
>
> I've known that some works have been tried to figure out the units by profiling the storage, AKA reverse engineering.
> In most cases, the simplest way is to measure the latencies of consecutive writes and analyze their patterns.
> As you mentioned, in practical, users will not want to do this, so maybe we need a tool to profile them to optimize f2fs.
> In the current state, I think profiling is an another issue, and mkfs.f2fs had better include this work in the future.
> But, IMO, from the viewpoint of performance, default configuration is quite enough now.
>
> ps) f2fs doesn't care about the flash page size, but considers garbage collection unit.

I am sorry, but this reply makes me smile. How can you design a fs
that relies on time attack heuristics to figure out what the proper
layout should be? Or even endorse using such heuristics in mkfs?
What we should be focusing on is pushing vendors to actually give
us this information so we can properly propagate it throughout the
kernel - that's something everyone will benefit from. After that,
the optimization can be done in every file system.

Promoting time attack heuristics instead of pushing vendors to tell
us how their hardware should be used is a journey to hell, and we've
been talking about this for a looong time now. And I imagine that
you especially have quite some persuasive power.

Thanks!
-Lukas


2012-10-09 10:46:20

by Jaegeuk Kim

[permalink] [raw]
Subject: RE: [PATCH 00/16] f2fs: introduce flash-friendly file system

> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of
> Lukáš Czerner
> Sent: Tuesday, October 09, 2012 5:32 PM
> To: Jaegeuk Kim
> Cc: 'Namjae Jeon'; 'Vyacheslav Dubeyko'; 'Marco Stornelli'; 'Jaegeuk Kim'; 'Al Viro'; [email protected];
> [email protected]; [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]
> Subject: RE: [PATCH 00/16] f2fs: introduce flash-friendly file system
>
> On Mon, 8 Oct 2012, Jaegeuk Kim wrote:
>
> > Date: Mon, 08 Oct 2012 19:52:03 +0900
> > From: Jaegeuk Kim <[email protected]>
> > To: 'Namjae Jeon' <[email protected]>
> > Cc: 'Vyacheslav Dubeyko' <[email protected]>,
> > 'Marco Stornelli' <[email protected]>,
> > 'Jaegeuk Kim' <[email protected]>,
> > 'Al Viro' <[email protected]>, [email protected],
> > [email protected], [email protected],
> > [email protected], [email protected], [email protected],
> > [email protected]
> > Subject: RE: [PATCH 00/16] f2fs: introduce flash-friendly file system
> >
> > I've known that some works have been tried to figure out the units by profiling the storage, AKA reverse engineering.
> > In most cases, the simplest way is to measure the latencies of consecutive writes and analyze their patterns.
> > As you mentioned, in practical, users will not want to do this, so maybe we need a tool to profile them to optimize f2fs.
> > In the current state, I think profiling is an another issue, and mkfs.f2fs had better include this work in the future.
> > But, IMO, from the viewpoint of performance, default configuration is quite enough now.
> >
> > ps) f2fs doesn't care about the flash page size, but considers garbage collection unit.
>
> I am sorry but this reply makes me smile. How can you design a fs
> relying on time attack heuristics to figure out what the proper
> layout should be ? Or even endorse such heuristics to be used in
> mkfs ? What we should be focusing on is to push vendors to actually
> give us such information so we can properly propagate that
> throughout the kernel - that's something everyone will benefit from.
> After that the optimization can be done in every file system.
>

Frankly speaking, I agree that this would eventually be the right direction.
But, as you know, it is very difficult to get all flash vendors to promote and standardize it,
because each vendor has a different strategy for opening up its internal information and also
tries to protect its secrets, whatever they are.

IMO, we don't need to wait for them now.
Instead, I am proposing f2fs, which takes that kind of information into account in the file system design from the start.
In addition, I suggest using heuristics for now on a best-effort basis.
Maybe in the future, if vendors provide something, f2fs would become more practical.
In the meantime, I strongly hope to validate and stabilize f2fs with the community.
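
The heuristic discussed in this thread (measure the latencies of consecutive writes and analyze their patterns) could be sketched roughly as follows. This is only an illustration and not code from f2fs or mkfs.f2fs: the function names, the spike threshold `factor`, and the synthetic latency list are all assumptions, and measurements from a real device would be far noisier.

```python
from collections import Counter
from statistics import median


def spike_period(latencies_us, factor=3.0):
    """Return the most common gap (counted in writes) between latency spikes.

    Returns None when fewer than two spikes are found.
    """
    if not latencies_us:
        return None
    baseline = median(latencies_us)
    # A write counts as a spike when it is much slower than the typical write.
    spikes = [i for i, lat in enumerate(latencies_us) if lat > factor * baseline]
    gaps = [b - a for a, b in zip(spikes, spikes[1:])]
    if not gaps:
        return None
    # Assumption: the most frequent gap reflects the device's internal unit.
    return Counter(gaps).most_common(1)[0][0]


def estimated_unit_bytes(latencies_us, write_size_bytes, factor=3.0):
    """Convert the spike period into a size estimate in bytes."""
    period = spike_period(latencies_us, factor)
    return None if period is None else period * write_size_bytes


# Synthetic data: every 8th write of 512 KiB is slow.
sample = [2000 if i % 8 == 7 else 100 for i in range(32)]
print(estimated_unit_bytes(sample, 512 * 1024))  # 4194304
```

With this made-up input the estimate comes out at 4 MiB, which would then be a candidate for the garbage collection unit that f2fs considers.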

> Promoting time attack heuristics instead of pushing vendors to tell
> us how their hardware should be used is a journey to hell and we've
> been talking about this for a looong time now. And I imagine that
> you especially have quite some persuasion power.

I know. :)
If the chance comes, I want to try.
Thanks,




---
Jaegeuk Kim
Samsung

2012-10-09 11:01:45

by Lukas Czerner

[permalink] [raw]
Subject: RE: [PATCH 00/16] f2fs: introduce flash-friendly file system

On Tue, 9 Oct 2012, Jaegeuk Kim wrote:

> Date: Tue, 09 Oct 2012 19:45:57 +0900
> From: Jaegeuk Kim <[email protected]>
> To: 'Lukáš Czerner' <[email protected]>
> Cc: 'Namjae Jeon' <[email protected]>,
> 'Vyacheslav Dubeyko' <[email protected]>,
> 'Marco Stornelli' <[email protected]>,
> 'Jaegeuk Kim' <[email protected]>,
> 'Al Viro' <[email protected]>, [email protected],
> [email protected], [email protected],
> [email protected], [email protected], [email protected],
> [email protected]
> Subject: RE: [PATCH 00/16] f2fs: introduce flash-friendly file system
>
> > -----Original Message-----
> > From: [email protected] [mailto:[email protected]] On Behalf Of
> > Lukáš Czerner
> > Sent: Tuesday, October 09, 2012 5:32 PM
> > To: Jaegeuk Kim
> > Cc: 'Namjae Jeon'; 'Vyacheslav Dubeyko'; 'Marco Stornelli'; 'Jaegeuk Kim'; 'Al Viro'; [email protected];
> > [email protected]; [email protected]; [email protected]; [email protected];
> > [email protected]; [email protected]
> > Subject: RE: [PATCH 00/16] f2fs: introduce flash-friendly file system
> >
> > On Mon, 8 Oct 2012, Jaegeuk Kim wrote:
> >
> > > Date: Mon, 08 Oct 2012 19:52:03 +0900
> > > From: Jaegeuk Kim <[email protected]>
> > > To: 'Namjae Jeon' <[email protected]>
> > > Cc: 'Vyacheslav Dubeyko' <[email protected]>,
> > > 'Marco Stornelli' <[email protected]>,
> > > 'Jaegeuk Kim' <[email protected]>,
> > > 'Al Viro' <[email protected]>, [email protected],
> > > [email protected], [email protected],
> > > [email protected], [email protected], [email protected],
> > > [email protected]
> > > Subject: RE: [PATCH 00/16] f2fs: introduce flash-friendly file system
> > >
> > > I've known that some works have been tried to figure out the units by profiling the storage, AKA reverse engineering.
> > > In most cases, the simplest way is to measure the latencies of consecutive writes and analyze their patterns.
> > > As you mentioned, in practical, users will not want to do this, so maybe we need a tool to profile them to optimize f2fs.
> > > In the current state, I think profiling is an another issue, and mkfs.f2fs had better include this work in the future.
> > > But, IMO, from the viewpoint of performance, default configuration is quite enough now.
> > >
> > > ps) f2fs doesn't care about the flash page size, but considers garbage collection unit.
> >
> > I am sorry but this reply makes me smile. How can you design a fs
> > relying on time attack heuristics to figure out what the proper
> > layout should be ? Or even endorse such heuristics to be used in
> > mkfs ? What we should be focusing on is to push vendors to actually
> > give us such information so we can properly propagate that
> > throughout the kernel - that's something everyone will benefit from.
> > After that the optimization can be done in every file system.
> >
>
> Frankly speaking, I agree that it would be the right direction eventually.
> But, as you know, it's very difficult for all flash vendors to promote and standardize that.
> Because each vendors have different strategies to open their internal information and also try
> to protect their secrets whatever they are.
>
> IMO, we don't need to wait them now.
> Instead, from the start, I suggest f2fs that uses those information to the file system design.
> In addition, I suggest using heuristics right now as best efforts.
> Maybe in future, if vendors give something, f2fs would be more feasible.
> In the mean time, I strongly hope to validate and stabilize f2fs with community.

Do not get me wrong, I do not think it is worth waiting for vendors
to come to their senses, but it is worth constantly reminding them that
we *need* this kind of information and that those heuristics are not
feasible in the long run anyway.

I believe this conversation has happened several times already, but
what about having an independent public database of all the internal
information about hw from different vendors, where users can add
information gathered by the time attack heuristic so others do not
have to run this again and again? I am not sure if Linaro or someone
else has something like that; maybe someone can post a link to it.

Eventually we can show this to the vendors so they see that their
"secrets" are already public anyway and that everyone's lives would be
easier if they just agreed to provide it from the beginning.
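
For readers unfamiliar with the heuristic under discussion, here is a minimal sketch of the idea of measuring the latencies of consecutive writes and analyzing their pattern. It is not taken from f2fs or mkfs.f2fs. The function name, the `spike_factor` threshold, and the assumption that the device shows one slow write each time an internal unit fills up are all hypothetical, and real devices may behave quite differently.

```python
from collections import Counter
from statistics import median


def guess_unit_bytes(latencies_ms, write_size, spike_factor=3.0):
    """Guess a device's internal unit from periodic latency spikes.

    latencies_ms: latency of each consecutive write, in issue order.
    write_size:   bytes written per request.
    spike_factor: hypothetical threshold; a write counts as a spike when
                  it is this many times slower than the median write.
    Returns the estimated unit in bytes, or None if no pattern is seen.
    """
    baseline = median(latencies_ms)
    # Indices of writes that were much slower than a typical write.
    spikes = [i for i, t in enumerate(latencies_ms)
              if t > spike_factor * baseline]
    if len(spikes) < 3:
        return None
    # Distance between neighbouring spikes, counted in writes.
    gaps = [b - a for a, b in zip(spikes, spikes[1:])]
    # Take the most frequent distance as the period.
    period, _count = Counter(gaps).most_common(1)[0]
    return period * write_size


# Synthetic trace: every 8th 512 KiB write is slow.
trace = [10.0 if i % 8 == 7 else 1.0 for i in range(64)]
print(guess_unit_bytes(trace, 512 * 1024))
```

The fragility of such a guess (it depends on the threshold, on caching, and on the state of the device) is exactly why the thread argues for vendor-provided information or a shared database of measured results.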

>
> > Promoting time attack heuristics instead of pushing vendors to tell
> > us how their hardware should be used is a journey to hell and we've
> > been talking about this for a looong time now. And I imagine that
> > you especially have quite some persuasion power.
>
> I know. :)
> If there comes a chance, I want to try.
> Thanks,

That's very good to hear, thank you.

-Lukas


2012-10-09 12:01:24

by Jaegeuk Kim

Subject: RE: [PATCH 00/16] f2fs: introduce flash-friendly file system

> On Tue, 9 Oct 2012, Jaegeuk Kim wrote:
>
> > Date: Tue, 09 Oct 2012 19:45:57 +0900
> > From: Jaegeuk Kim <[email protected]>
> > To: 'Lukáš Czerner' <[email protected]>
> > Cc: 'Namjae Jeon' <[email protected]>,
> > 'Vyacheslav Dubeyko' <[email protected]>,
> > 'Marco Stornelli' <[email protected]>,
> > 'Jaegeuk Kim' <[email protected]>,
> > 'Al Viro' <[email protected]>, [email protected],
> > [email protected], [email protected],
> > [email protected], [email protected], [email protected],
> > [email protected]
> > Subject: RE: [PATCH 00/16] f2fs: introduce flash-friendly file system
> >
> > > -----Original Message-----
> > > From: [email protected] [mailto:[email protected]] On Behalf
> Of
> > > Lukáš Czerner
> > > Sent: Tuesday, October 09, 2012 5:32 PM
> > > To: Jaegeuk Kim
> > > Cc: 'Namjae Jeon'; 'Vyacheslav Dubeyko'; 'Marco Stornelli'; 'Jaegeuk Kim'; 'Al Viro';
> [email protected];
> > > [email protected]; [email protected]; [email protected];
> [email protected];
> > > [email protected]; [email protected]
> > > Subject: RE: [PATCH 00/16] f2fs: introduce flash-friendly file system
> > >
> > > On Mon, 8 Oct 2012, Jaegeuk Kim wrote:
> > >
> > > > Date: Mon, 08 Oct 2012 19:52:03 +0900
> > > > From: Jaegeuk Kim <[email protected]>
> > > > To: 'Namjae Jeon' <[email protected]>
> > > > Cc: 'Vyacheslav Dubeyko' <[email protected]>,
> > > > 'Marco Stornelli' <[email protected]>,
> > > > 'Jaegeuk Kim' <[email protected]>,
> > > > 'Al Viro' <[email protected]>, [email protected],
> > > > [email protected], [email protected],
> > > > [email protected], [email protected], [email protected],
> > > > [email protected]
> > > > Subject: RE: [PATCH 00/16] f2fs: introduce flash-friendly file system
> > > >
> > > > > -----Original Message-----
> > > > > From: Namjae Jeon [mailto:[email protected]]
> > > > > Sent: Monday, October 08, 2012 7:00 PM
> > > > > To: Jaegeuk Kim
> > > > > Cc: Vyacheslav Dubeyko; Marco Stornelli; Jaegeuk Kim; Al Viro; [email protected];
> > > > > [email protected]; [email protected]; [email protected];
> > > [email protected];
> > > > > [email protected]; [email protected]
> > > > > Subject: Re: [PATCH 00/16] f2fs: introduce flash-friendly file system
> > > > >
> > > > > 2012/10/8, Jaegeuk Kim <[email protected]>:
> > > > > >> -----Original Message-----
> > > > > >> From: Vyacheslav Dubeyko [mailto:[email protected]]
> > > > > >> Sent: Sunday, October 07, 2012 9:09 PM
> > > > > >> To: Jaegeuk Kim
> > > > > >> Cc: 'Marco Stornelli'; 'Jaegeuk Kim'; 'Al Viro'; [email protected];
> > > > > >> [email protected]; linux-
> > > > > >> [email protected]; [email protected]; [email protected];
> > > > > >> [email protected];
> > > > > >> [email protected]
> > > > > >> Subject: Re: [PATCH 00/16] f2fs: introduce flash-friendly file system
> > > > > >>
> > > > > >> Hi,
> > > > > >>
> > > > > >> On Oct 7, 2012, at 1:31 PM, Jaegeuk Kim wrote:
> > > > > >>
> > > > > >> >> -----Original Message-----
> > > > > >> >> From: Marco Stornelli [mailto:[email protected]]
> > > > > >> >> Sent: Sunday, October 07, 2012 4:10 PM
> > > > > >> >> To: Jaegeuk Kim
> > > > > >> >> Cc: Vyacheslav Dubeyko; [email protected]; Al Viro;
> > > > > >> >> [email protected]; [email protected];
> > > > > >> >> [email protected]; [email protected];
> > > > > >> >> [email protected];
> > > > > >> [email protected];
> > > > > >> >> [email protected]
> > > > > >> >> Subject: Re: [PATCH 00/16] f2fs: introduce flash-friendly file system
> > > > > >> >>
> > > > > >> >> Il 06/10/2012 22:06, Jaegeuk Kim ha scritto:
> > > > > >> >>> 2012-10-06 (토), 17:54 +0400, Vyacheslav Dubeyko:
> > > > > >> >>>> Hi Jaegeuk,
> > > > > >> >>>
> > > > > >> >>> Hi.
> > > > > >> >>> We know each other, right? :)
> > > > > >> >>>
> > > > > >> >>>>
> > > > > >> >>>>> From: 김재극 <[email protected]>
> > > > > >> >>>>> To: [email protected], 'Theodore Ts'o' <[email protected]>,
> > > > > >> >> [email protected], [email protected],
> > > > > >> >> [email protected],
> > > > > >> [email protected],
> > > > > >> >> [email protected], [email protected]
> > > > > >> >>>>> Subject: [PATCH 00/16] f2fs: introduce flash-friendly file system
> > > > > >> >>>>> Date: Fri, 05 Oct 2012 20:55:07 +0900
> > > > > >> >>>>>
> > > > > >> >>>>> This is a new patch set for the f2fs file system.
> > > > > >> >>>>>
> > > > > >> >>>>> What is F2FS?
> > > > > >> >>>>> =============
> > > > > >> >>>>>
> > > > > >> >>>>> NAND flash memory-based storage devices, such as SSD, eMMC, and SD
> > > > > >> >>>>> cards, have
> > > > > >> >>>>> been widely being used for ranging from mobile to server systems.
> > > > > >> >>>>> Since they are
> > > > > >> >>>>> known to have different characteristics from the conventional
> > > > > >> >>>>> rotational disks,
> > > > > >> >>>>> a file system, an upper layer to the storage device, should adapt to
> > > > > >> >>>>> the changes
> > > > > >> >>>>> from the sketch.
> > > > > >> >>>>>
> > > > > >> >>>>> F2FS is a new file system carefully designed for the NAND flash
> > > > > >> >>>>> memory-based storage
> > > > > >> >>>>> devices. We chose a log structure file system approach, but we tried
> > > > > >> >>>>> to adapt it
> > > > > >> >>>>> to the new form of storage. Also we remedy some known issues of the
> > > > > >> >>>>> very old log
> > > > > >> >>>>> structured file system, such as snowball effect of wandering tree
> > > > > >> >>>>> and high cleaning
> > > > > >> >>>>> overhead.
> > > > > >> >>>>>
> > > > > >> >>>>> Because a NAND-based storage device shows different characteristics
> > > > > >> >>>>> according to
> > > > > >> >>>>> its internal geometry or flash memory management scheme aka FTL, we
> > > > > >> >>>>> add various
> > > > > >> >>>>> parameters not only for configuring on-disk layout, but also for
> > > > > >> >>>>> selecting allocation
> > > > > >> >>>>> and cleaning algorithms.
> > > > > >> >>>>>
> > > > > >> >>>>
> > > > > >> >>>> What about F2FS performance? Could you share benchmarking results of
> > > > > >> >>>> the new file system?
> > > > > >> >>>>
> > > > > >> >>>> It is very interesting the case of aged file system. How is GC's
> > > > > >> >>>> implementation efficient? Could
> > > > > >> >> you share benchmarking results for the very aged file system state?
> > > > > >> >>>>
> > > > > >> >>>
> > > > > >> >>> Although I have benchmark results, currently I'd like to see the
> > > > > >> >>> results
> > > > > >> >>> measured by community as a black-box. As you know, the results are
> > > > > >> >>> very
> > > > > >> >>> dependent on the workloads and parameters, so I think it would be
> > > > > >> >>> better
> > > > > >> >>> to see other results for a while.
> > > > > >> >>> Thanks,
> > > > > >> >>>
> > > > > >> >>
> > > > > >> >> 1) Actually it's a strange approach. If you have got any results you
> > > > > >> >> should share them with the community explaining how (the workload, hw
> > > > > >> >> and so on) your benchmark works and the specific condition. I really
> > > > > >> >> don't like the approach "I've got the results but I don't say
> > > > > >> >> anything,
> > > > > >> >> if you want a number, do it yourself".
> > > > > >> >
> > > > > >> > It's definitely right, and I meant *for a while*.
> > > > > >> > I just wanted to avoid arguing with how to age file system in this
> > > > > >> > time.
> > > > > >> > Before then, I share the primitive results as follows.
> > > > > >> >
> > > > > >> > 1. iozone in Panda board
> > > > > >> > - ARM A9
> > > > > >> > - DRAM : 1GB
> > > > > >> > - Kernel: Linux 3.3
> > > > > >> > - Partition: 12GB (64GB Samsung eMMC)
> > > > > >> > - Tested on 2GB file
> > > > > >> >
> > > > > >> > seq. read, seq. write, rand. read, rand. write
> > > > > >> > - ext4: 30.753 17.066 5.06 4.15
> > > > > >> > - f2fs: 30.71 16.906 5.073 15.204
> > > > > >> >
> > > > > >> > 2. iozone in Galaxy Nexus
> > > > > >> > - DRAM : 1GB
> > > > > >> > - Android 4.0.4_r1.2
> > > > > >> > - Kernel omap 3.0.8
> > > > > >> > - Partition: /data, 12GB
> > > > > >> > - Tested on 2GB file
> > > > > >> >
> > > > > >> > seq. read, seq. write, rand. read, rand. write
> > > > > >> > - ext4: 29.88 12.83 11.43 0.56
> > > > > >> > - f2fs: 29.70 13.34 10.79 12.82
> > > > > >> >
> > > > > >>
> > > > > >>
> > > > > >> This is results for non-aged filesystem state. Am I correct?
> > > > > >>
> > > > > >
> > > > > > Yes, right.
> > > > > >
> > > > > >>
> > > > > >> > Due to the company secret, I expect to show other results after
> > > > > >> > presenting f2fs at korea linux forum.
> > > > > >> >
> > > > > >> >> 2) For a new filesystem you should send the patches to linux-fsdevel.
> > > > > >> >
> > > > > >> > Yes, that was totally my mistake.
> > > > > >> >
> > > > > >> >> 3) It's not clear the pros/cons of your filesystem, can you share with
> > > > > >> >> us the main differences with the current fs already in mainline? Or is
> > > > > >> >> it a company secret?
> > > > > >> >
> > > > > >> > After forum, I can share the slides, and I hope they will be useful to
> > > > > >> > you.
> > > > > >> >
> > > > > >> > Instead, let me summarize at a glance compared with other file systems.
> > > > > >> > Here are several log-structured file systems.
> > > > > >> > Note that, F2FS operates on top of block device with consideration on
> > > > > >> > the FTL behavior.
> > > > > >> > So, JFFS2, YAFFS2, and UBIFS are out-of scope, since they are designed
> > > > > >> > for raw NAND flash.
> > > > > >> > LogFS is initially designed for raw NAND flash, but expanded to block
> > > > > >> > device.
> > > > > >> > But, I don't know whether it is stable or not.
> > > > > >> > NILFS2 is one of major log-structured file systems, which supports
> > > > > >> > multiple snap-shots.
> > > > > >> > IMO, that feature is quite promising and important to users, but it may
> > > > > >> > degrade the performance.
> > > > > >> > There is a trade-off between functionalities and performance.
> > > > > >> > F2FS chose high performance without any further fancy functionalities.
> > > > > >> >
> > > > > >>
> > > > > >> Performance is a good goal. But fault-tolerance is also very important
> > > > > >> point. Filesystems are used by
> > > > > >> users, so, it is very important to guarantee reliability of data keeping.
> > > > > >> Degradation of performance
> > > > > >> by means of snapshots is arguable point. Snapshots can solve the problem
> > > > > >> not only some unpredictable
> > > > > >> environmental issues but also user's erroneous behavior.
> > > > > >>
> > > > > >
> > > > > > Yes, I agree. I concerned the multiple snapshot feature.
> > > > > > Of course, fault-tolerance is very important, and file system should support
> > > > > > it as you know as power-off-recovery.
> > > > > > f2fs supports the recovery mechanism by adopting checkpoint similar to
> > > > > > snapshot.
> > > > > > But, f2fs does not support multiple snapshots for user convenience.
> > > > > > I just focused on the performance, and absolutely, the multiple snapshot
> > > > > > feature is also a good alternative approach.
> > > > > > That may be a trade-off.
> > > > > >
> > > > > >> As I understand, it is not possible to have a perfect performance in all
> > > > > >> possible workloads. Could you
> > > > > >> point out what workloads are the best way of F2FS using?
> > > > > >
> > > > > > Basically I think the following workloads will be good for F2FS.
> > > > > > - Many random writes : it's LFS nature
> > > > > > - Small writes with frequent fsync : f2fs is optimized to reduce the fsync
> > > > > > overhead.
> > > > > >
> > > > > >>
> > > > > >> > Maybe or obviously it is possible to optimize ext4 or btrfs to flash
> > > > > >> > storages.
> > > > > >> > IMHO, however, they are originally designed for HDDs, so that it may or
> > > > > >> > may not suffer from
> > > > > >> fundamental designs.
> > > > > >> > I don't know, but why not designing a new file system for flash storages
> > > > > >> > as a counterpart?
> > > > > >> >
> > > > > >>
> > > > > >> Yes, it is possible. But F2FS is not flash oriented filesystem as JFFS2,
> > > > > >> YAFFS2, UBIFS but block-
> > > > > >> oriented filesystem. So, F2FS design is restricted by block-layer's
> > > > > >> opportunities in the using of
> > > > > >> flash storages' peculiarities. Could you point out key points of F2FS
> > > > > >> design that makes this design
> > > > > >> fundamentally unique?
> > > > > >
> > > > > > As you can see the f2fs kernel document patch, I think one of the most
> > > > > > important features is to align operating units between f2fs and ftl.
> > > > > > Specifically, f2fs has section and zone, which are cleaning unit and basic
> > > > > > allocation unit respectively.
> > > > > > Through these configurable units in f2fs, I think f2fs is able to reduce the
> > > > > > unnecessary operations done by FTL.
> > > > > > And, in order to avoid changing IO patterns by the block-layer, f2fs merges
> > > > > > itself some bios likewise ext4.
> > > > > Hello.
> > > > > The internal of eMMC and SSD is the blackbox from user side.
> > > > > How does the normal user easily set operating units alignment(page
> > > > > size and physical block size ?) between f2fs and ftl in storage device
> > > > > ?
> > > >
> > > > I've known that some works have been tried to figure out the units by profiling the storage, AKA
> > > reverse engineering.
> > > > In most cases, the simplest way is to measure the latencies of consecutive writes and analyze
> their
> > > patterns.
> > > > As you mentioned, in practical, users will not want to do this, so maybe we need a tool to
> profile
> > > them to optimize f2fs.
> > > > In the current state, I think profiling is an another issue, and mkfs.f2fs had better include
> this
> > > work in the future.
> > > > But, IMO, from the viewpoint of performance, default configuration is quite enough now.
> > > >
> > > > ps) f2fs doesn't care about the flash page size, but considers garbage collection unit.
> > >
> > > I am sorry but this reply makes me smile. How can you design a fs
> > > relying on time attack heuristics to figure out what the proper
> > > layout should be ? Or even endorse such heuristics to be used in
> > > mkfs ? What we should be focusing on is to push vendors to actually
> > > give us such information so we can properly propagate that
> > > throughout the kernel - that's something everyone will benefit from.
> > > After that the optimization can be done in every file system.
> > >
> >
> > Frankly speaking, I agree that it would be the right direction eventually.
> > But, as you know, it's very difficult for all flash vendors to promote and standardize that.
> > Because each vendors have different strategies to open their internal information and also try
> > to protect their secrets whatever they are.
> >
> > IMO, we don't need to wait them now.
> > Instead, from the start, I suggest f2fs that uses those information to the file system design.
> > In addition, I suggest using heuristics right now as best efforts.
> > Maybe in future, if vendors give something, f2fs would be more feasible.
> > In the mean time, I strongly hope to validate and stabilize f2fs with community.
>
> Do not get me wrong, I do not think it is worth to wait for vendors
> to come to their senses, but it is worth constantly reminding that
> we *need* this kind of information and those heuristics are not
> feasible in the long run anyway.
>
> I believe that this conversation happened several times already, but
> what about having independent public database of all the internal
> information about hw from different vendors where users can add
> information gathered by the time attack heuristic so other does not
> have to run this again and again. I am not sure if Linaro or someone
> else have something like that, someone can maybe post a link to that.
>

As I mentioned, I agree that we should keep pushing vendors to open up that information.
And I absolutely didn't mean that it is worth waiting for vendors.
I meant that, until vendors open up that information, efforts such as
proposing f2fs or gathering heuristics are also needed in parallel.

Anyway, building a database that gathers products' information is a very interesting idea.
May I access the database?

Thanks,

> Eventually we can show this to the vendors to see that their
> "secrets" are already public anyway and that everyones lives would be
> easier if they just agree to provide it from the beginning.
>
> >
> > > Promoting time attack heuristics instead of pushing vendors to tell
> > > us how their hardware should be used is a journey to hell and we've
> > > been talking about this for a looong time now. And I imagine that
> > > you especially have quite some persuasion power.
> >
> > I know. :)
> > If there comes a chance, I want to try.
> > Thanks,
>
> That's very good to hear, thank you.
>
> -Lukas

2012-10-09 12:39:44

by Lukas Czerner

Subject: RE: [PATCH 00/16] f2fs: introduce flash-friendly file system

On Tue, 9 Oct 2012, Jaegeuk Kim wrote:

> > > > > > >
> > > > > > > As you can see the f2fs kernel document patch, I think one of the most
> > > > > > > important features is to align operating units between f2fs and ftl.
> > > > > > > Specifically, f2fs has section and zone, which are cleaning unit and basic
> > > > > > > allocation unit respectively.
> > > > > > > Through these configurable units in f2fs, I think f2fs is able to reduce the
> > > > > > > unnecessary operations done by FTL.
> > > > > > > And, in order to avoid changing IO patterns by the block-layer, f2fs merges
> > > > > > > itself some bios likewise ext4.
> > > > > > Hello.
> > > > > > The internal of eMMC and SSD is the blackbox from user side.
> > > > > > How does the normal user easily set operating units alignment(page
> > > > > > size and physical block size ?) between f2fs and ftl in storage device
> > > > > > ?
> > > > >
> > > > > I've known that some works have been tried to figure out the units by profiling the storage, AKA
> > > > reverse engineering.
> > > > > In most cases, the simplest way is to measure the latencies of consecutive writes and analyze
> > their
> > > > patterns.
> > > > > As you mentioned, in practical, users will not want to do this, so maybe we need a tool to
> > profile
> > > > them to optimize f2fs.
> > > > > In the current state, I think profiling is an another issue, and mkfs.f2fs had better include
> > this
> > > > work in the future.
> > > > > But, IMO, from the viewpoint of performance, default configuration is quite enough now.
> > > > >
> > > > > ps) f2fs doesn't care about the flash page size, but considers garbage collection unit.
> > > >
> > > > I am sorry but this reply makes me smile. How can you design a fs
> > > > relying on time attack heuristics to figure out what the proper
> > > > layout should be ? Or even endorse such heuristics to be used in
> > > > mkfs ? What we should be focusing on is to push vendors to actually
> > > > give us such information so we can properly propagate that
> > > > throughout the kernel - that's something everyone will benefit from.
> > > > After that the optimization can be done in every file system.
> > > >
> > >
> > > Frankly speaking, I agree that it would be the right direction eventually.
> > > But, as you know, it's very difficult for all flash vendors to promote and standardize that.
> > > Because each vendors have different strategies to open their internal information and also try
> > > to protect their secrets whatever they are.
> > >
> > > IMO, we don't need to wait them now.
> > > Instead, from the start, I suggest f2fs that uses those information to the file system design.
> > > In addition, I suggest using heuristics right now as best efforts.
> > > Maybe in future, if vendors give something, f2fs would be more feasible.
> > > In the mean time, I strongly hope to validate and stabilize f2fs with community.
> >
> > Do not get me wrong, I do not think it is worth to wait for vendors
> > to come to their senses, but it is worth constantly reminding that
> > we *need* this kind of information and those heuristics are not
> > feasible in the long run anyway.
> >
> > I believe that this conversation happened several times already, but
> > what about having independent public database of all the internal
> > information about hw from different vendors where users can add
> > information gathered by the time attack heuristic so other does not
> > have to run this again and again. I am not sure if Linaro or someone
> > else have something like that, someone can maybe post a link to that.
> >
>
> As I mentioned, I agree to push vendors to open those information all the time.
> And, I absolutely didn't mean that it is worth to wait vendors.
> I meant, until opening those information by vendors, something like
> proposing f2fs or gathering heuristics are also needed simultaneously.
>
> Anyway, it's very interesting to build a database gathering products' information.
> May I access the database?

That's what I found:

https://wiki.linaro.org/WorkingGroups/Kernel/Projects/FlashCardSurvey

-Lukas

>
> Thanks,
>
> > Eventually we can show this to the vendors to see that their
> > "secrets" are already public anyway and that everyones lives would be
> > easier if they just agree to provide it from the beginning.
> >
> > >
> > > > Promoting time attack heuristics instead of pushing vendors to tell
> > > > us how their hardware should be used is a journey to hell and we've
> > > > been talking about this for a looong time now. And I imagine that
> > > > you especially have quite some persuasion power.
> > >
> > > I know. :)
> > > If there comes a chance, I want to try.
> > > Thanks,
> >
> > That's very good to hear, thank you.
> >
> > -Lukas

2012-10-09 13:10:23

by Jaegeuk Kim

Subject: RE: [PATCH 00/16] f2fs: introduce flash-friendly file system

2012-10-09 (화), 14:39 +0200, Lukáš Czerner:
> On Tue, 9 Oct 2012, Jaegeuk Kim wrote:
>
> > > > > > > >
> > > > > > > > As you can see the f2fs kernel document patch, I think one of the most
> > > > > > > > important features is to align operating units between f2fs and ftl.
> > > > > > > > Specifically, f2fs has section and zone, which are cleaning unit and basic
> > > > > > > > allocation unit respectively.
> > > > > > > > Through these configurable units in f2fs, I think f2fs is able to reduce the
> > > > > > > > unnecessary operations done by FTL.
> > > > > > > > And, in order to avoid changing IO patterns by the block-layer, f2fs merges
> > > > > > > > itself some bios likewise ext4.
> > > > > > > Hello.
> > > > > > > The internal of eMMC and SSD is the blackbox from user side.
> > > > > > > How does the normal user easily set operating units alignment(page
> > > > > > > size and physical block size ?) between f2fs and ftl in storage device
> > > > > > > ?
> > > > > >
> > > > > > I've known that some works have been tried to figure out the units by profiling the storage, AKA
> > > > > reverse engineering.
> > > > > > In most cases, the simplest way is to measure the latencies of consecutive writes and analyze
> > > their
> > > > > patterns.
> > > > > > As you mentioned, in practical, users will not want to do this, so maybe we need a tool to
> > > profile
> > > > > them to optimize f2fs.
> > > > > > In the current state, I think profiling is an another issue, and mkfs.f2fs had better include
> > > this
> > > > > work in the future.
> > > > > > But, IMO, from the viewpoint of performance, default configuration is quite enough now.
> > > > > >
> > > > > > ps) f2fs doesn't care about the flash page size, but considers garbage collection unit.
> > > > >
> > > > > I am sorry but this reply makes me smile. How can you design a fs
> > > > > relying on time attack heuristics to figure out what the proper
> > > > > layout should be ? Or even endorse such heuristics to be used in
> > > > > mkfs ? What we should be focusing on is to push vendors to actually
> > > > > give us such information so we can properly propagate that
> > > > > throughout the kernel - that's something everyone will benefit from.
> > > > > After that the optimization can be done in every file system.
> > > > >
> > > >
> > > > Frankly speaking, I agree that it would be the right direction eventually.
> > > > But, as you know, it's very difficult for all flash vendors to promote and standardize that.
> > > > Because each vendors have different strategies to open their internal information and also try
> > > > to protect their secrets whatever they are.
> > > >
> > > > IMO, we don't need to wait them now.
> > > > Instead, from the start, I suggest f2fs that uses those information to the file system design.
> > > > In addition, I suggest using heuristics right now as best efforts.
> > > > Maybe in future, if vendors give something, f2fs would be more feasible.
> > > > In the mean time, I strongly hope to validate and stabilize f2fs with community.
> > >
> > > Do not get me wrong, I do not think it is worth to wait for vendors
> > > to come to their senses, but it is worth constantly reminding that
> > > we *need* this kind of information and those heuristics are not
> > > feasible in the long run anyway.
> > >
> > > I believe that this conversation happened several times already, but
> > > what about having independent public database of all the internal
> > > information about hw from different vendors where users can add
> > > information gathered by the time attack heuristic so other does not
> > > have to run this again and again. I am not sure if Linaro or someone
> > > else have something like that, someone can maybe post a link to that.
> > >
> >
> > As I mentioned, I agree to push vendors to open those information all the time.
> > And, I absolutely didn't mean that it is worth to wait vendors.
> > I meant, until opening those information by vendors, something like
> > proposing f2fs or gathering heuristics are also needed simultaneously.
> >
> > Anyway, it's very interesting to build a database gathering products' information.
> > May I access the database?
>
> That's what I found:
>
> https://wiki.linaro.org/WorkingGroups/Kernel/Projects/FlashCardSurvey
>

This is very useful information for users who configure f2fs according to their
storage.
Thank you.

-Jaegeuk Kim
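
As an illustration of configuring f2fs according to the storage, the sketch below turns a garbage collection unit, such as an erase block size looked up in a survey like the one linked above, into a number of segments per section. The 2 MiB segment size and the function name are assumptions made for this example, and the thread itself does not state how mkfs.f2fs would consume the value.

```python
SEGMENT_BYTES = 2 * 1024 * 1024  # assumed f2fs segment size (2 MiB)


def segments_per_section(gc_unit_bytes, segment_bytes=SEGMENT_BYTES):
    """Smallest whole number of segments that covers one device GC unit."""
    if gc_unit_bytes <= 0:
        raise ValueError("gc_unit_bytes must be positive")
    # Ceiling division, so a section is never smaller than the GC unit.
    return -(-gc_unit_bytes // segment_bytes)


# Example: a card reported to use 4 MiB erase blocks.
print(segments_per_section(4 * 1024 * 1024))
```

Rounding up rather than down keeps the cleaning unit of the file system at least as large as the unit the FTL reclaims, which is the alignment the earlier messages describe.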

> -Lukas
>
> >
> > Thanks,
> >
> > > Eventually we can show this to the vendors to see that their
> > > "secrets" are already public anyway and that everyones lives would be
> > > easier if they just agree to provide it from the beginning.
> > >
> > > >
> > > > > Promoting time attack heuristics instead of pushing vendors to tell
> > > > > us how their hardware should be used is a journey to hell and we've
> > > > > been talking about this for a looong time now. And I imagine that
> > > > > you especially have quite some persuasion power.
> > > >
> > > > I know. :)
> > > > If there comes a chance, I want to try.
> > > > Thanks,
> > >
> > > That's very good to hear, thank you.
> > >
> > > -Lukas

--
Jaegeuk Kim
Samsung

2012-10-09 19:53:33

by Jooyoung Hwang

Subject: RE: [PATCH 00/16] f2fs: introduce flash-friendly file system

On Tue, 2012-10-09 at 16:08 +0900, Jaegeuk Kim wrote:
> > -----Original Message-----
> > From: Vyacheslav Dubeyko [mailto:[email protected]]
> > Sent: Tuesday, October 09, 2012 4:23 AM
> > To: Jaegeuk Kim
> > Cc: 'Marco Stornelli'; 'Jaegeuk Kim'; 'Al Viro'; [email protected]; [email protected]; linux-
> > [email protected]; [email protected]; [email protected]; [email protected];
> > [email protected]
> > Subject: Re: [PATCH 00/16] f2fs: introduce flash-friendly file system
> >
> > Hi,
> >
> > On Oct 8, 2012, at 12:25 PM, Jaegeuk Kim wrote:
> >
> > >> -----Original Message-----
> > >> From: Vyacheslav Dubeyko [mailto:[email protected]]
> > >> Sent: Sunday, October 07, 2012 9:09 PM
> > >> To: Jaegeuk Kim
> > >> Cc: 'Marco Stornelli'; 'Jaegeuk Kim'; 'Al Viro'; [email protected]; [email protected]; linux-
> > >> [email protected]; [email protected]; [email protected]; [email protected];
> > >> [email protected]
> > >> Subject: Re: [PATCH 00/16] f2fs: introduce flash-friendly file system
> > >>
> > >> Hi,
> > >>
> > >> On Oct 7, 2012, at 1:31 PM, Jaegeuk Kim wrote:
> > >>
> > >>>> -----Original Message-----
> > >>>> From: Marco Stornelli [mailto:[email protected]]
> > >>>> Sent: Sunday, October 07, 2012 4:10 PM
> > >>>> To: Jaegeuk Kim
> > >>>> Cc: Vyacheslav Dubeyko; [email protected]; Al Viro; [email protected];
> > [email protected];
> > >>>> [email protected]; [email protected]; [email protected];
> > >> [email protected];
> > >>>> [email protected]
> > >>>> Subject: Re: [PATCH 00/16] f2fs: introduce flash-friendly file system
> > >>>>
> > >>>> On 06/10/2012 22:06, Jaegeuk Kim wrote:
> > >>>>> 2012-10-06 (Sat), 17:54 +0400, Vyacheslav Dubeyko:
> > >>>>>> Hi Jaegeuk,
> > >>>>>
> > >>>>> Hi.
> > >>>>> We know each other, right? :)
> > >>>>>
> > >>>>>>
> > >>>>>>> From: 김재극 <[email protected]>
> > >>>>>>> To: [email protected], 'Theodore Ts'o' <[email protected]>,
> > >>>> [email protected], [email protected], [email protected],
> > >> [email protected],
> > >>>> [email protected], [email protected]
> > >>>>>>> Subject: [PATCH 00/16] f2fs: introduce flash-friendly file system
> > >>>>>>> Date: Fri, 05 Oct 2012 20:55:07 +0900
> > >>>>>>>
> > >>>>>>> This is a new patch set for the f2fs file system.
> > >>>>>>>
> > >>>>>>> What is F2FS?
> > >>>>>>> =============
> > >>>>>>>
> > >>>>>>> NAND flash memory-based storage devices, such as SSD, eMMC, and SD cards, have
> > >>>>>>> been widely being used for ranging from mobile to server systems. Since they are
> > >>>>>>> known to have different characteristics from the conventional rotational disks,
> > >>>>>>> a file system, an upper layer to the storage device, should adapt to the changes
> > >>>>>>> from the sketch.
> > >>>>>>>
> > >>>>>>> F2FS is a new file system carefully designed for the NAND flash memory-based storage
> > >>>>>>> devices. We chose a log structure file system approach, but we tried to adapt it
> > >>>>>>> to the new form of storage. Also we remedy some known issues of the very old log
> > >>>>>>> structured file system, such as snowball effect of wandering tree and high cleaning
> > >>>>>>> overhead.
> > >>>>>>>
> > >>>>>>> Because a NAND-based storage device shows different characteristics according to
> > >>>>>>> its internal geometry or flash memory management scheme aka FTL, we add various
> > >>>>>>> parameters not only for configuring on-disk layout, but also for selecting allocation
> > >>>>>>> and cleaning algorithms.
> > >>>>>>>
> > >>>>>>
> > >>>>>> What about F2FS performance? Could you share benchmarking results of the new file system?
> > >>>>>>
> > >>>>>> It is very interesting the case of aged file system. How is GC's implementation efficient?
> > Could
> > >>>> you share benchmarking results for the very aged file system state?
> > >>>>>>
> > >>>>>
> > >>>>> Although I have benchmark results, currently I'd like to see the results
> > >>>>> measured by community as a black-box. As you know, the results are very
> > >>>>> dependent on the workloads and parameters, so I think it would be better
> > >>>>> to see other results for a while.
> > >>>>> Thanks,
> > >>>>>
> > >>>>
> > >>>> 1) Actually it's a strange approach. If you have got any results you
> > >>>> should share them with the community explaining how (the workload, hw
> > >>>> and so on) your benchmark works and the specific condition. I really
> > >>>> don't like the approach "I've got the results but I don't say anything,
> > >>>> if you want a number, do it yourself".
> > >>>
> > >>> It's definitely right, and I meant *for a while*.
> > >>> I just wanted to avoid arguing with how to age file system in this time.
> > >>> Before then, I share the primitive results as follows.
> > >>>
> > >>> 1. iozone in Panda board
> > >>> - ARM A9
> > >>> - DRAM : 1GB
> > >>> - Kernel: Linux 3.3
> > >>> - Partition: 12GB (64GB Samsung eMMC)
> > >>> - Tested on 2GB file
> > >>>
> > >>> seq. read, seq. write, rand. read, rand. write
> > >>> - ext4: 30.753 17.066 5.06 4.15
> > >>> - f2fs: 30.71 16.906 5.073 15.204
> > >>>
> > >>> 2. iozone in Galaxy Nexus
> > >>> - DRAM : 1GB
> > >>> - Android 4.0.4_r1.2
> > >>> - Kernel omap 3.0.8
> > >>> - Partition: /data, 12GB
> > >>> - Tested on 2GB file
> > >>>
> > >>> seq. read, seq. write, rand. read, rand. write
> > >>> - ext4: 29.88 12.83 11.43 0.56
> > >>> - f2fs: 29.70 13.34 10.79 12.82
> > >>>
> > >>
> > >>
> > >> This is results for non-aged filesystem state. Am I correct?
> > >>
> > >
> > > Yes, right.
> > >
> > >>
> > >>> Due to the company secret, I expect to show other results after presenting f2fs at korea linux
> > forum.
> > >>>
> > >>>> 2) For a new filesystem you should send the patches to linux-fsdevel.
> > >>>
> > >>> Yes, that was totally my mistake.
> > >>>
> > >>>> 3) It's not clear the pros/cons of your filesystem, can you share with
> > >>>> us the main differences with the current fs already in mainline? Or is
> > >>>> it a company secret?
> > >>>
> > >>> After forum, I can share the slides, and I hope they will be useful to you.
> > >>>
> > >>> Instead, let me summarize at a glance compared with other file systems.
> > >>> Here are several log-structured file systems.
> > >>> Note that, F2FS operates on top of block device with consideration on the FTL behavior.
> > >>> So, JFFS2, YAFFS2, and UBIFS are out-of scope, since they are designed for raw NAND flash.
> > >>> LogFS is initially designed for raw NAND flash, but expanded to block device.
> > >>> But, I don't know whether it is stable or not.
> > >>> NILFS2 is one of major log-structured file systems, which supports multiple snap-shots.
> > >>> IMO, that feature is quite promising and important to users, but it may degrade the performance.
> > >>> There is a trade-off between functionalities and performance.
> > >>> F2FS chose high performance without any further fancy functionalities.
> > >>>
> > >>
> > >> Performance is a good goal. But fault-tolerance is also very important point. Filesystems are used
> > by
> > >> users, so, it is very important to guarantee reliability of data keeping. Degradation of
> > performance
> > >> by means of snapshots is arguable point. Snapshots can solve the problem not only some
> > unpredictable
> > >> environmental issues but also user's erroneous behavior.
> > >>
> > >
> > > Yes, I agree. I concerned the multiple snapshot feature.
> > > Of course, fault-tolerance is very important, and file system should support it as you know as
> > power-off-recovery.
> > > f2fs supports the recovery mechanism by adopting checkpoint similar to snapshot.
> > > But, f2fs does not support multiple snapshots for user convenience.
> > > I just focused on the performance, and absolutely, the multiple snapshot feature is also a good
> > alternative approach.
> > > That may be a trade-off.
> >
> > So, maybe I misunderstand something, but I can't understand the difference. As I know, snapshot in
> > NILFS2 is a checkpoint converted by user in snapshot. So, NILFS2's checkpoint is a log that adds new
> > file system's state changing (user data + metadata). In other words, checkpoint is mechanism of
> > writing on volume. Moreover, NILFS2 gives flexible way of checkpoint/snapshot management.
> >
> > As you are saying, f2fs supports checkpoints also. It means for me that checkpoints are the basic
> > mechanism of writing operations on f2fs. But, about what performance gain and difference do you talk?
>
> How about the following scenario?
> 1. data "a" is newly written.
> 2. checkpoint "A" is done.
> 3. data "a" is truncated.
> 4. checkpoint "B" is done.
>
> If fs supports multiple snapshots like "A" and "B" to users, it cannot reuse the space allocated by
> data "a" after checkpoint "B" even though data "a" is safely truncated by checkpoint "B".
> This is because fs should keep data "a" to prepare a roll-back to "A".
> So, even though user sees some free space, LFS may suffer from cleaning due to the exhausted free space.
> If users want to avoid this, they have to remove snapshots by themselves. Or, maybe automatically?
>
> >
> > Moreover, user can't manage by f2fs checkpoints completely, as I can understand. It is not so clear
> > what critical points can be a starting points of recovery actions. How is it possible to define how
> > many checkpoints f2fs volume will have?
>
> IMHO, user does not need to know how many snapshots there exist and track the fs utilization all the time.
> (off list: I don't know why cleaning process should be tuned by users.)
>
> f2fs writes two checkpoints alternatively. One is for the last stable checkpoint and another is for next checkpoint.
> So, during the recovery, f2fs starts to find one of the latest stable checkpoint.
> The stable checkpoint must have whole index structures and data consistently.
> As you knew, many things can be found in the following LFS paper.
> http://www.cs.berkeley.edu/~brewer/cs262/LFS.pdf
>
>
> >
> > How many user data (metadata) can be lost in the case of sudden power off? Is it possible to estimate
> > this?
> >
>
> If user calls sync, f2fs via vfs writes all the data, and it writes a checkpoint.
> In that case, all the data are safe.
> After sync, several fsync can be triggered, and it occurs sudden power off.
> In that case, f2fs first performs roll-back to the last stable checkpoint among two, and then roll-forward to recover fsync'ed data only.
> So, f2fs recovers data triggered by sync or fsync only.
>
> > >
> > >> As I understand, it is not possible to have a perfect performance in all possible workloads. Could
> > you
> > >> point out what workloads are the best way of F2FS using?
> > >
> > > Basically I think the following workloads will be good for F2FS.
> > > - Many random writes : it's LFS nature
> > > - Small writes with frequent fsync : f2fs is optimized to reduce the fsync overhead.
> > >
> >
> > Yes, it can be so for the case of non-aged f2fs volume. But I am afraid that for the case of aged f2fs
> > volume the situation can be opposite. I think that in the case of aged state of f2fs volume the GC
> > will be under hard work in above-mentioned workloads.
>
> Yes, you're right.
> In the LFS paper above, there are two logging schemes: threaded logging and copy-and-compaction.
> In order to avoid high cleaning overhead, f2fs adopts a hybrid one which changes the allocation policy dynamically
> between two schemes.
> Threaded logging is similar to the traditional approach, resulting in random writes without cleaning operations.
> Copy-and-compaction is another name of cleaning, resulting in sequential writes with cleaning operations.
> So, f2fs adopts one of them in runtime according to the file system status.
> Through this, we could see the random write performance comparable to ext4 even in the worst case.
>
> >
> > But, as I can understand, smartphones and tablets are the most promising way of f2fs using. Because
> > f2fs designs for NAND flash memory based-storage devices. So, I think that such workloads as "many
> > random writes" or "small writes with frequent fsync" are not so frequent use-cases. Use-case of
> > creation and deletion many small files can be more frequent use-case under smartphones and tablets.
> > But, as I can understand, f2fs has slightly expensive metadata payload in the case of small files
> > creation. Moreover, frequent and random deletion of small files ends in the very sophisticated and
> > unpredictable GC behavior, as I can understand.
> >
>
> I'd like to share the following paper.
> http://research.cs.wisc.edu/adsl/Publications/ibench-tocs12.pdf
>
> In our experiments *also* on android phones, we've seen many random patterns with frequent fsync calls.
> We found that the main problem is database, and I think f2fs is beneficial to this.
> As you mentioned, I agree that it is important to handle many small files too.
> It is right that this may cause additional cleaning overhead, and f2fs has some metadata payload overhead.
> In order to reduce the cleaning overhead, f2fs adopts static and dynamic hot and cold data separation.
> The main goal is to split the data according to their type (e.g., dir inode, file inode, dentry data, etc) as much as possible.
> Please see the document in detail.
> I think this approach is quite effective to achieve the goal.
> BTW, the payload overhead can be resolved by adopting embedding data in the inode likewise ext4.
> I think it is also good idea, and I hope to adopt it in future.
>

I'd also like to point you to the following report on mobile workload
patterns.
http://www.cs.cmu.edu/~fuyaoz/courses/15712/report.pdf
It reports that Android issues fsync frequently, and that most of those
calls cover only a small amount of data.

To provide an efficient fsync, F2FS minimizes the amount of metadata
written to serve it. An fsync in F2FS completes by writing the user
data blocks and the direct node blocks that point to them, rather than
by creating a new checkpoint, which would incur more I/O load.
If a sudden power failure happens, the F2FS recovery routine rolls back
to the latest checkpoint and then recovers the file system state to
reflect all completed fsync operations. We call this roll-forward
recovery.
You may want to look at the roll-forward code in recover_fsync_data().
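
As a rough illustration of the recovery order described above, here is
a minimal Python sketch. It is a toy model, not the f2fs code: LogEntry
and recover() are hypothetical names, and the "log" is simply a list of
writes that happened after the last checkpoint.

```python
from dataclasses import dataclass


@dataclass
class LogEntry:
    """One write found in the log after the last checkpoint (hypothetical)."""
    name: str
    data: str
    fsynced: bool  # True if the write was completed by an fsync


def recover(checkpoint, log):
    """Roll back to the checkpoint, then roll forward fsync'ed writes only."""
    state = dict(checkpoint)      # roll-back: start from the stable state
    for entry in log:             # roll-forward: scan the log in write order
        if entry.fsynced:
            state[entry.name] = entry.data
    return state


checkpoint = {"a.db": "v1"}
log = [
    LogEntry("a.db", "v2", fsynced=True),    # kept: the fsync completed
    LogEntry("b.tmp", "x", fsynced=False),   # dropped: never fsync'ed
]
recovered = recover(checkpoint, log)
```

In this model only "a.db" is brought forward to "v2"; the write to
"b.tmp" is not recovered because no fsync covered it.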

> > >>
> > >>> Maybe or obviously it is possible to optimize ext4 or btrfs to flash storages.
> > >>> IMHO, however, they are originally designed for HDDs, so that it may or may not suffer from
> > >> fundamental designs.
> > >>> I don't know, but why not designing a new file system for flash storages as a counterpart?
> > >>>
> > >>
> > >> Yes, it is possible. But F2FS is not flash oriented filesystem as JFFS2, YAFFS2, UBIFS but block-
> > >> oriented filesystem. So, F2FS design is restricted by block-layer's opportunities in the using of
> > >> flash storages' peculiarities. Could you point out key points of F2FS design that makes this design
> > >> fundamentally unique?
> > >
> > > As you can see the f2fs kernel document patch, I think one of the most important features is to
> > align operating units between f2fs and ftl.
> > > Specifically, f2fs has section and zone, which are cleaning unit and basic allocation unit
> > respectively.
> > > Through these configurable units in f2fs, I think f2fs is able to reduce the unnecessary operations
> > done by FTL.
> > > And, in order to avoid changing IO patterns by the block-layer, f2fs merges itself some bios
> > likewise ext4.
> > >
> >
> > As I can understand, it is not so easy to create partition with f2fs volume which is aligned on
> > operating units (especially in the case of eMMC or SSD).
>
> Could you explain why it is not so easy?
>
> > Performance of unaligned volume can degrade
> > significantly because of FTL activity. What mechanisms has f2fs for excluding such situation and
> > achieving of the goal to reduce unnecessary FTL operations?
>
> Could you please explain your concern more exactly?
> In the kernel doc, the start address of f2fs data structure is aligned to the segment size (i.e., 2MB).
> Do you mean that or another operating units (e.g., section and zone)?
>
> Thanks,
>
> >
> > With the best regards,
> > Vyacheslav Dubeyko.
> >
> > >>
> > >> With the best regards,
> > >> Vyacheslav Dubeyko.
> > >>
> > >>
> > >>>>
> > >>>> Marco
> > >>>
> > >>> ---
> > >>> Jaegeuk Kim
> > >>> Samsung
> > >>>
> > >>> --
> > >>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > >>> the body of a message to [email protected]
> > >>> More majordomo info at http://vger.kernel.org/majordomo-info.html
> > >>> Please read the FAQ at http://www.tux.org/lkml/
> > >
> > >
> > > ---
> > > Jaegeuk Kim
> > > Samsung
> > >
>
>
> ---
> Jaegeuk Kim
> Samsung
>

--
Jooyoung Hwang
Samsung Electronics

2012-10-09 21:20:18

by Dave Chinner

Subject: Re: [PATCH 00/16] f2fs: introduce flash-friendly file system

[ Folks, can you trim your responses down to just quote the part you
are responding to? Having to repeatedly scroll through 500 lines of
irrelevant text just to find the 5 lines that is being commented on
is exceedingly painful. ]

On Tue, Oct 09, 2012 at 09:01:18PM +0900, Jaegeuk Kim wrote:
> > From: Lukáš Czerner [mailto:[email protected]]
> > > > I am sorry but this reply makes me smile. How can you design a fs
> > > > relying on time attack heuristics to figure out what the proper
> > > > layout should be ? Or even endorse such heuristics to be used in
> > > > mkfs ? What we should be focusing on is to push vendors to actually
> > > > give us such information so we can properly propagate that
> > > > throughout the kernel - that's something everyone will benefit from.
> > > > After that the optimization can be done in every file system.
> > > >
> > >
> > > Frankly speaking, I agree that it would be the right direction eventually.
> > > But, as you know, it's very difficult for all flash vendors to promote and standardize that.
> > > Because each vendors have different strategies to open their internal information and also try
> > > to protect their secrets whatever they are.
> > >
> > > IMO, we don't need to wait them now.
> > > Instead, from the start, I suggest f2fs that uses those information to the file system design.
> > > In addition, I suggest using heuristics right now as best efforts.

And in response, other people are "suggesting" that this is the
wrong approach.

> > > Maybe in future, if vendors give something, f2fs would be more feasible.
> > > In the mean time, I strongly hope to validate and stabilize f2fs with community.
> >
> > Do not get me wrong, I do not think it is worth to wait for vendors
> > to come to their senses, but it is worth constantly reminding that
> > we *need* this kind of information and those heuristics are not
> > feasible in the long run anyway.
> >
> > I believe that this conversation happened several times already, but
> > what about having independent public database of all the internal
> > information about hw from different vendors where users can add
> > information gathered by the time attack heuristic so other does not
> > have to run this again and again. I am not sure if Linaro or someone
> > else have something like that, someone can maybe post a link to that.

Linaro already have one, which is another reason why using
heuristics is the wrong approach:

https://wiki.linaro.org/WorkingGroups/Kernel/Projects/FlashCardSurvey?action=show&redirect=WorkingGroups%2FKernelConsolidation%2FProjects%2FFlashCardSurvey

> As I mentioned, I agree to push vendors to open those information all the time.
> And, I absolutely didn't mean that it is worth to wait vendors.
> I meant, until opening those information by vendors, something like
> proposing f2fs or gathering heuristics are also needed simultaneously.
>
> Anyway, it's very interesting to build a database gathering products' information.
> May I access the database?

It's public information.

If you want to support different types of flash, then either add
your timing attack derived information on specific hardware to the
above table, or force vendors to update it themselves if they want
their flash memory supported by this filesystem.

Cheers,

Dave.
--
Dave Chinner
[email protected]

2012-10-10 02:32:57

by Jaegeuk Kim

Subject: RE: [PATCH 00/16] f2fs: introduce flash-friendly file system

> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of
> Dave Chinner
> Sent: Wednesday, October 10, 2012 6:20 AM
> To: Jaegeuk Kim
> Cc: 'Lukáš Czerner'; 'Namjae Jeon'; 'Vyacheslav Dubeyko'; 'Marco Stornelli'; 'Jaegeuk Kim'; 'Al Viro';
> [email protected]; [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]
> Subject: Re: [PATCH 00/16] f2fs: introduce flash-friendly file system
>
> [ Folks, can you trim your responses down to just quote the part you
> are responding to? Having to repeatedly scroll through 500 lines of
> irrelevant text just to find the 5 lines that is being commented on
> is exceedingly painful. ]

OK, I'll keep that in mind.
Thanks.

>
> On Tue, Oct 09, 2012 at 09:01:18PM +0900, Jaegeuk Kim wrote:
> > > From: Lukáš Czerner [mailto:[email protected]]
> > > > > I am sorry but this reply makes me smile. How can you design a fs
> > > > > relying on time attack heuristics to figure out what the proper
> > > > > layout should be ? Or even endorse such heuristics to be used in
> > > > > mkfs ? What we should be focusing on is to push vendors to actually
> > > > > give us such information so we can properly propagate that
> > > > > throughout the kernel - that's something everyone will benefit from.
> > > > > After that the optimization can be done in every file system.
> > > > >
> > > >
> > > > Frankly speaking, I agree that it would be the right direction eventually.
> > > > But, as you know, it's very difficult for all flash vendors to promote and standardize that.
> > > > Because each vendors have different strategies to open their internal information and also try
> > > > to protect their secrets whatever they are.
> > > >
> > > > IMO, we don't need to wait them now.
> > > > Instead, from the start, I suggest f2fs that uses those information to the file system design.
> > > > In addition, I suggest using heuristics right now as best efforts.
>
> And in response, other people are "suggesting" that this is the
> wrong approach.

OK, that makes sense.
I agree that the Linaro survey has progressed well, and that no further heuristics are needed.

>
> > > > Maybe in future, if vendors give something, f2fs would be more feasible.
> > > > In the mean time, I strongly hope to validate and stabilize f2fs with community.
> > >
> > > Do not get me wrong, I do not think it is worth to wait for vendors
> > > to come to their senses, but it is worth constantly reminding that
> > > we *need* this kind of information and those heuristics are not
> > > feasible in the long run anyway.
> > >
> > > I believe that this conversation happened several times already, but
> > > what about having independent public database of all the internal
> > > information about hw from different vendors where users can add
> > > information gathered by the time attack heuristic so other does not
> > > have to run this again and again. I am not sure if Linaro or someone
> > > else have something like that, someone can maybe post a link to that.
>
> Linaro already have one, which is another reason why using
> heuristics is the wrong approach:
>
> https://wiki.linaro.org/WorkingGroups/Kernel/Projects/FlashCardSurvey?action=show&redirect=WorkingGroups%2FKernelConsolidation%2FProjects%2FFlashCardSurvey
>
> > As I mentioned, I agree to push vendors to open those information all the time.
> > And, I absolutely didn't mean that it is worth to wait vendors.
> > I meant, until opening those information by vendors, something like
> > proposing f2fs or gathering heuristics are also needed simultaneously.
> >
> > Anyway, it's very interesting to build a database gathering products' information.
> > May I access the database?
>
> It's public information.
>
> If you want to support different types of flash, then either add
> your timing attack derived information on specific hardware to the
> above table, or force vendors to update it themselves if they want
> their flash memory supported by this filesystem.

Sounds good.
If I gather any such information myself, I'll try to add it.
Thank you.
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> [email protected]
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2012-10-10 07:58:04

by Viacheslav Dubeyko

Subject: RE: [PATCH 00/16] f2fs: introduce flash-friendly file system

On Tue, 2012-10-09 at 16:08 +0900, Jaegeuk Kim wrote:
> > -----Original Message-----
> > From: Vyacheslav Dubeyko [mailto:[email protected]]
> > Sent: Tuesday, October 09, 2012 4:23 AM
> > To: Jaegeuk Kim
> > Cc: 'Marco Stornelli'; 'Jaegeuk Kim'; 'Al Viro'; [email protected]; [email protected]; linux-
> > [email protected]; [email protected]; [email protected]; [email protected];
> > [email protected]
> > Subject: Re: [PATCH 00/16] f2fs: introduce flash-friendly file system

> > >>> NILFS2 is one of major log-structured file systems, which supports multiple snap-shots.
> > >>> IMO, that feature is quite promising and important to users, but it may degrade the performance.
> > >>> There is a trade-off between functionalities and performance.
> > >>> F2FS chose high performance without any further fancy functionalities.
> > >>>
> > >>
> > >> Performance is a good goal. But fault-tolerance is also very important point. Filesystems are used
> > by
> > >> users, so, it is very important to guarantee reliability of data keeping. Degradation of
> > performance
> > >> by means of snapshots is arguable point. Snapshots can solve the problem not only some
> > unpredictable
> > >> environmental issues but also user's erroneous behavior.
> > >>
> > >
> > > Yes, I agree. I concerned the multiple snapshot feature.
> > > Of course, fault-tolerance is very important, and file system should support it as you know as
> > power-off-recovery.
> > > f2fs supports the recovery mechanism by adopting checkpoint similar to snapshot.
> > > But, f2fs does not support multiple snapshots for user convenience.
> > > I just focused on the performance, and absolutely, the multiple snapshot feature is also a good
> > alternative approach.
> > > That may be a trade-off.
> >
> > So, maybe I misunderstand something, but I can't understand the difference. As I know, snapshot in
> > NILFS2 is a checkpoint converted by user in snapshot. So, NILFS2's checkpoint is a log that adds new
> > file system's state changing (user data + metadata). In other words, checkpoint is mechanism of
> > writing on volume. Moreover, NILFS2 gives flexible way of checkpoint/snapshot management.
> >
> > As you are saying, f2fs supports checkpoints also. It means for me that checkpoints are the basic
> > mechanism of writing operations on f2fs. But, about what performance gain and difference do you talk?
>
> How about the following scenario?
> 1. data "a" is newly written.
> 2. checkpoint "A" is done.
> 3. data "a" is truncated.
> 4. checkpoint "B" is done.
>
> If fs supports multiple snapshots like "A" and "B" to users, it cannot reuse the space allocated by
> data "a" after checkpoint "B" even though data "a" is safely truncated by checkpoint "B".
> This is because fs should keep data "a" to prepare a roll-back to "A".
> So, even though user sees some free space, LFS may suffer from cleaning due to the exhausted free space.
> If users want to avoid this, they have to remove snapshots by themselves. Or, maybe automatically?
>

I think there is some misunderstanding here about checkpoint/snapshot terminology, especially in the NILFS2 case. A NILFS2 volume can contain only checkpoints, if the user has not created any snapshot. You are right that a snapshot cannot be reclaimed, because the user has marked that file system state as an important point. But checkpoints can be reclaimed easily. I can't see any problem with reclaiming free space from checkpoints in the above scenario on NILFS2. But if a user decides to make a snapshot, then that decision is binding.

So, from my point of view, an f2fs volume contains only checkpoints, with no possibility of freezing any of them as a snapshot. The checkpoints are there, but the user cannot manipulate them in any way.

As far as I know, NILFS2 has a garbage collector that removes checkpoints automatically in the background. It is also possible to remove both checkpoints and snapshots by hand with a special utility. As far as I understand, f2fs also has a garbage collector that reclaims free space from dirty checkpoints. So, what is the difference? In my opinion, the difference is that f2fs lacks an easy way to manipulate checkpoints.
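
To make the distinction concrete, here is a minimal Python sketch of the "a"/"A"/"B" scenario quoted above. It is a toy model, not NILFS2 or f2fs code: reclaimable() and its arguments are hypothetical, and a "block" is just a label.

```python
def reclaimable(all_blocks, live_blocks, snapshots):
    """Blocks cleaning may reuse: not live now and not pinned by a snapshot."""
    pinned = set()
    for blocks in snapshots.values():
        pinned |= blocks
    return all_blocks - live_blocks - pinned


all_blocks = {"a"}
live_after_B = set()   # data "a" was truncated before checkpoint "B"

# Case 1: "A" and "B" are plain checkpoints, so nothing pins block "a".
only_checkpoints = reclaimable(all_blocks, live_after_B, snapshots={})

# Case 2: the user converted checkpoint "A" into a snapshot.
with_snapshot = reclaimable(all_blocks, live_after_B, snapshots={"A": {"a"}})
```

In case 1 block "a" can be reused after checkpoint "B"; in case 2 it stays pinned until the user removes snapshot "A".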

> >
> > Moreover, user can't manage by f2fs checkpoints completely, as I can understand. It is not so clear
> > what critical points can be a starting points of recovery actions. How is it possible to define how
> > many checkpoints f2fs volume will have?
>
> IMHO, user does not need to know how many snapshots there exist and track the fs utilization all the time.
> (off list: I don't know why cleaning process should be tuned by users.)
>

What do you plan to do when users complain about problems with free space reclaiming? If the user doesn't know about checkpoints and has no tools for accessing them, how can such problems be investigated on the user's side?

> f2fs writes two checkpoints alternatively. One is for the last stable checkpoint and another is for next checkpoint.
> So, during the recovery, f2fs starts to find one of the latest stable checkpoint.
> The stable checkpoint must have whole index structures and data consistently.
> As you knew, many things can be found in the following LFS paper.
> http://www.cs.berkeley.edu/~brewer/cs262/LFS.pdf
>
>
> >
> > How many user data (metadata) can be lost in the case of sudden power off? Is it possible to estimate
> > this?
> >
>
> If user calls sync, f2fs via vfs writes all the data, and it writes a checkpoint.
> In that case, all the data are safe.
> After sync, several fsync can be triggered, and it occurs sudden power off.
> In that case, f2fs first performs roll-back to the last stable checkpoint among two, and then roll-forward to recover fsync'ed data only.
> So, f2fs recovers data triggered by sync or fsync only.
>

So, as far as I understand, the driver can recover f2fs as long as one of the two checkpoints is valid. Sudden power-off can occur at any time. How likely is it that a sudden power-off leaves f2fs in a state the driver cannot recover? Is it possible to recover f2fs in such a case with fsck, for example?
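
To make the question concrete, here is a minimal Python sketch of how I read the selection between the two checkpoints. It is only an assumption about the behaviour: CheckpointPack, its "version" and "valid" fields, and pick_checkpoint() are hypothetical names, not f2fs structures.

```python
from typing import NamedTuple, Optional


class CheckpointPack(NamedTuple):
    version: int   # assumed to increase with every checkpoint written
    valid: bool    # assumed result of some integrity check on the pack


def pick_checkpoint(cp0: CheckpointPack,
                    cp1: CheckpointPack) -> Optional[CheckpointPack]:
    """Return the newest valid checkpoint, or None if neither is usable."""
    candidates = [cp for cp in (cp0, cp1) if cp.valid]
    if not candidates:
        return None   # the case asked about: would fsck be needed here?
    return max(candidates, key=lambda cp: cp.version)


# Power was lost while checkpoint 8 was being written, so it is torn.
chosen = pick_checkpoint(CheckpointPack(7, True), CheckpointPack(8, False))
```

Here the older but intact checkpoint 7 is chosen. The None branch is the situation my question is about.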

> > >
> > >> As I understand, it is not possible to have a perfect performance in all possible workloads. Could
> > you
> > >> point out what workloads are the best way of F2FS using?
> > >
> > > Basically I think the following workloads will be good for F2FS.
> > > - Many random writes : it's LFS nature
> > > - Small writes with frequent fsync : f2fs is optimized to reduce the fsync overhead.
> > >
> >
> > Yes, it can be so for the case of non-aged f2fs volume. But I am afraid that for the case of aged f2fs
> > volume the situation can be opposite. I think that in the case of aged state of f2fs volume the GC
> > will be under hard work in above-mentioned workloads.
>
> Yes, you're right.
> In the LFS paper above, there are two logging schemes: threaded logging and copy-and-compaction.
> In order to avoid high cleaning overhead, f2fs adopts a hybrid one which changes the allocation policy dynamically
> between two schemes.
> Threaded logging is similar to the traditional approach, resulting in random writes without cleaning operations.
> Copy-and-compaction is another name of cleaning, resulting in sequential writes with cleaning operations.
> So, f2fs adopts one of them in runtime according to the file system status.
> Through this, we could see the random write performance comparable to ext4 even in the worst case.
>

As I understand it, the goal of f2fs is to be a flash-friendly file system by reducing unnecessary FTL operations. From my understanding, this goal is achieved by aligning to the operation unit and by a copy-on-write policy. So, I think that write operations without cleaning can result in additional FTL operations.
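
The runtime switch between the two logging schemes quoted above could be pictured like this. It is a sketch under assumptions: the thread only says that the policy changes "according to the file system status", so the free-segment ratio, the `low_watermark` value and all names here are hypothetical.

```python
from enum import Enum


class Policy(Enum):
    # Sequential writes into clean segments; the cleaner must run later.
    COPY_AND_COMPACTION = "copy-and-compaction"
    # Reuse invalid blocks inside dirty segments; random writes, no cleaning.
    THREADED_LOGGING = "threaded logging"


def choose_policy(free_segments: int, total_segments: int,
                  low_watermark: float = 0.05) -> Policy:
    """Pick an allocation policy from the current amount of free space.

    low_watermark is an illustrative threshold, not a real f2fs tunable.
    """
    if total_segments <= 0:
        raise ValueError("total_segments must be positive")
    if free_segments / total_segments < low_watermark:
        return Policy.THREADED_LOGGING
    return Policy.COPY_AND_COMPACTION
```

The trade-off raised in the paragraph above sits in the `THREADED_LOGGING` branch: it avoids cleaning inside the file system, but the resulting random writes may cost extra work inside the FTL.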

> >
> > But, as I can understand, smartphones and tablets are the most promising way of f2fs using. Because
> > f2fs designs for NAND flash memory based-storage devices. So, I think that such workloads as "many
> > random writes" or "small writes with frequent fsync" are not so frequent use-cases. Use-case of
> > creation and deletion many small files can be more frequent use-case under smartphones and tablets.
> > But, as I can understand, f2fs has slightly expensive metadata payload in the case of small files
> > creation. Moreover, frequent and random deletion of small files ends in the very sophisticated and
> > unpredictable GC behavior, as I can understand.
> >
>
> I'd like to share the following paper.
> http://research.cs.wisc.edu/adsl/Publications/ibench-tocs12.pdf
>

Excellent paper. Thank you.

> In our experiments *also* on android phones, we've seen many random patterns with frequent fsync calls.
> We found that the main problem is database, and I think f2fs is beneficial to this.

I think that databases are not the main use case on Android phones. From my point of view, the dominant use cases are likely to be handling multimedia data and operations on small files.

So, the following key points can be extracted from the shared paper: (1) a file has a complex structure; (2) sequential access is not sequential; (3) auxiliary files dominate; (4) multiple threads perform I/O.

I am afraid that random modification of different parts of files, together with I/O from multiple threads, can lead to significant fragmentation of both file data and directory metadata because of garbage collection.

I think that Iozone may not be a fully suitable benchmarking suite for estimating file system performance in such a case. A special synthetic benchmarking tool may be needed.

> As you mentioned, I agree that it is important to handle many small files too.
> It is right that this may cause additional cleaning overhead, and f2fs has some metadata payload overhead.
> In order to reduce the cleaning overhead, f2fs adopts static and dynamic hot and cold data separation.
> The main goal is to split the data according to their type (e.g., dir inode, file inode, dentry data, etc) as much as possible.
> Please see the document in detail.
> I think this approach is quite effective to achieve the goal.
> BTW, the payload overhead can be resolved by adopting embedding data in the inode likewise ext4.
> I think it is also good idea, and I hope to adopt it in future.
>
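
For readers without the document patch at hand, the static hot/cold separation mentioned above might be sketched as follows. The six log names follow the classification given in the f2fs kernel document; the function, its parameters and the string labels are hypothetical and only illustrate the idea of routing blocks by type.

```python
def pick_log(block_type: str, *, is_directory: bool = False,
             is_multimedia: bool = False,
             migrated_by_cleaner: bool = False) -> str:
    """Route a block to one of six logs by its type (illustrative only)."""
    if block_type == "indirect_node":
        return "cold node"
    if block_type == "direct_node":
        # Direct node blocks of directories are updated most often.
        return "hot node" if is_directory else "warm node"
    if block_type == "dentry":
        return "hot data"
    if block_type == "data":
        if is_multimedia or migrated_by_cleaner:
            return "cold data"
        return "warm data"
    raise ValueError(f"unknown block type: {block_type}")
```

Keeping blocks with similar lifetimes in the same log is what is expected to leave segments either mostly valid or mostly invalid, which is cheaper for the cleaner.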

As I understand it, f2fs uses an old-fashioned (ext2/ext3-like) block-mapping scheme. This approach has significant metadata and performance overhead. An extent-based approach could be more promising. But I am afraid that an extent-based approach conflicts with f2fs internal techniques (the garbage collector technique). So, from my point of view, it will be very hard to adopt extents in f2fs.


> > >
> > > As you can see the f2fs kernel document patch, I think one of the most important features is to
> > align operating units between f2fs and ftl.
> > > Specifically, f2fs has section and zone, which are cleaning unit and basic allocation unit
> > respectively.
> > > Through these configurable units in f2fs, I think f2fs is able to reduce the unnecessary operations
> > done by FTL.
> > > And, in order to avoid changing IO patterns by the block-layer, f2fs merges itself some bios
> > likewise ext4.
> > >
> >
> > As I can understand, it is not so easy to create partition with f2fs volume which is aligned on
> > operating units (especially in the case of eMMC or SSD).
>
> Could you explain why it is not so easy?
>
> > Performance of unaligned volume can degrade
> > significantly because of FTL activity. What mechanisms has f2fs for excluding such situation and
> > achieving of the goal to reduce unnecessary FTL operations?
>
> Could you please explain your concern more exactly?
> In the kernel doc, the start address of f2fs data structure is aligned to the segment size (i.e., 2MB).
> Do you mean that or another operating units (e.g., section and zone)?
>

I mean that every volume is placed inside some partition (MTD or GPT). Every partition begins at some physical sector. So, as I understand it, an f2fs volume can begin at a physical sector that lies inside a physical erase block. In that case, after formatting, the f2fs operation units will be unaligned relative to the physical erase blocks. Maybe I misunderstand something, but from my point of view this can lead to additional FTL operations and performance degradation.
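
To make the alignment concern concrete, here is a small arithmetic sketch. The 512-byte sector and the 4 MiB erase block are assumptions for illustration only (the real erase block size is exactly the information vendors rarely disclose), and both function names are hypothetical.

```python
def misalignment(partition_start_sector: int, erase_block_bytes: int,
                 sector_bytes: int = 512) -> int:
    """Bytes by which the partition start lies past an erase block boundary."""
    return (partition_start_sector * sector_bytes) % erase_block_bytes


def padding_to_align(partition_start_sector: int, erase_block_bytes: int,
                     sector_bytes: int = 512) -> int:
    """Bytes to skip so the first file system unit starts on a boundary."""
    off = misalignment(partition_start_sector, erase_block_bytes, sector_bytes)
    return 0 if off == 0 else erase_block_bytes - off


ERASE_BLOCK = 4 * 1024 * 1024  # assumed 4 MiB erase block

# Old DOS-style partition starting at sector 63: 63 * 512 = 32256 bytes in.
print(misalignment(63, ERASE_BLOCK))        # 32256
# Partition starting at sector 2048 (1 MiB): still inside a 4 MiB block.
print(padding_to_align(2048, ERASE_BLOCK))  # 3145728
# Partition starting at sector 8192 (4 MiB): already aligned.
print(padding_to_align(8192, ERASE_BLOCK))  # 0
```

A formatting tool can only apply such padding if it knows both the absolute start of the partition and the erase block size; with either one unknown, the units inside the volume may be aligned to each other but not to the media.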

With the best regards,
Vyacheslav Dubeyko.

> Thanks,
>
> >
> > With the best regards,
> > Vyacheslav Dubeyko.
> >
> > >>
> > >> With the best regards,
> > >> Vyacheslav Dubeyko.
> > >>
> > >>
> > >>>>
> > >>>> Marco
> > >>>
> > >>> ---
> > >>> Jaegeuk Kim
> > >>> Samsung
> > >>>
> > >>> --
> > >>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > >>> the body of a message to [email protected]
> > >>> More majordomo info at http://vger.kernel.org/majordomo-info.html
> > >>> Please read the FAQ at http://www.tux.org/lkml/
> > >
> > >
> > > ---
> > > Jaegeuk Kim
> > > Samsung
> > >
>
>
> ---
> Jaegeuk Kim
> Samsung
>

2012-10-10 08:05:49

by Viacheslav Dubeyko

Subject: RE: [PATCH 00/16] f2fs: introduce flash-friendly file system

On Tue, 2012-10-09 at 14:53 -0500, Jooyoung Hwang wrote:
> On Tue, 2012-10-09 at 16:08 +0900, Jaegeuk Kim wrote:
> > > -----Original Message-----
> > > From: Vyacheslav Dubeyko [mailto:[email protected]]
> > > Sent: Tuesday, October 09, 2012 4:23 AM
> > > To: Jaegeuk Kim
> > > Cc: 'Marco Stornelli'; 'Jaegeuk Kim'; 'Al Viro'; [email protected]; [email protected]; linux-
> > > [email protected]; [email protected]; [email protected]; [email protected];
> > > [email protected]
> > > Subject: Re: [PATCH 00/16] f2fs: introduce flash-friendly file system

>
> I'd like you to refer to the following link as well which is about
> mobile workload pattern.
> http://www.cs.cmu.edu/~fuyaoz/courses/15712/report.pdf
> It's reported that in Android there are frequent issues of fsync and
> most of them are only for small size of data.
>
> To provide efficient fsync, F2FS minimizes the amount of metadata
> written to serve a fsync. Fsync in F2FS is completed by writing user
> data blocks and direct node blocks which point to them rather than
> creating a new checkpoint which would incur more I/O loads.
> If sudden power failure happens, then F2FS recovery routine rolls back
> to the latest checkpoint and thereafter recovers file system state to
> reflect all the completed fsync operations, which we call roll-forward
> recovery.
> You may want to look at the code about the roll-forward in recover_fsync_data().
>
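
The roll-back followed by roll-forward described above can be modelled in a few lines. This is a toy model and not what recover_fsync_data() actually does: the `LogEntry` record, the dictionary standing in for file system state, and the `fsynced` flag are all hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class LogEntry:
    name: str      # file the write belongs to
    data: bytes    # new contents written after the last checkpoint
    fsynced: bool  # True if an fsync on this file completed before power loss


def recover(checkpoint_state: Dict[str, bytes],
            log_after_checkpoint: List[LogEntry]) -> Dict[str, bytes]:
    """Roll back to the checkpoint, then replay only fsync'ed writes."""
    state = dict(checkpoint_state)      # roll-back: start from the checkpoint
    for entry in log_after_checkpoint:  # roll-forward in write order
        if entry.fsynced:
            state[entry.name] = entry.data
    return state
```

Writes that were neither covered by a checkpoint nor fsync'ed are simply dropped, which matches the statement earlier in the thread that f2fs recovers only data made durable by sync or fsync.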

Thank you.

With the best regards,
Vyacheslav Dubeyko.

> --
> Jooyoung Hwang
> Samsung Electronics
>

2012-10-10 08:16:14

by Theodore Ts'o

Subject: Re: [PATCH 00/16] f2fs: introduce flash-friendly file system

On Tue, Oct 09, 2012 at 01:01:24PM +0200, Lukáš Czerner wrote:
> Do not get me wrong, I do not think it is worth to wait for vendors
> to come to their senses, but it is worth constantly reminding that
> we *need* this kind of information and those heuristics are not
> feasible in the long run anyway.

A number of us have been telling flash vendors exactly this. The
technical people do seem to understand. It's management who seem to
be primarily clueless, even though this information can be extracted
by employing timing attacks on the media. I've pointed this out
before, and the technical people agree that trying to keep this
information as a "trade secret" is pointless, stupid, and
counterproductive. Trying to get the pointy-haired bosses to
understand may take quite a while.

That being said, in many cases, it doesn't really matter. For
example, if a manufacturer has a production run of a million Android
mobile devices, (a) all of the eMMC devices will be the same (or at
least come from a handful of suppliers in the worst case), and (b) the
manufacturers *will* be able to get this information under NDA, and so
they can just feed it straight to the mkfs program. There's no need
in many cases to have mkfs burn write cycles carrying out a timing
attack on whichever flash device it is formatting.


My concern is a different one. We shouldn't just be focusing on
sqlite performance assuming that its characteristics are fixed, to the
point where it drives file system design and benchmarking. Currently
sqlite does a lot of pointless writes at every single transaction
boundary which could be optimized if you relax the design constraint
that the database has to be in a single file --- something which is a
nice-to-have for some applications, but which really doesn't matter in
an embedded/mobile handset use case.

It may very well be that f2fs is still going to be better since it is
trying to minimize the number of erase blocks that are "open" for
writing at one time. And even if eMMC devices become more
intelligent, optimizing for erase blocks is still a good thing
(although it may not result in as spectacular wins on flash devices
with more sophisticated FTL's.).

However, it may also be that we'll be able to teach some existing file
system how to be more intelligent about optimizing for erase blocks
that could be made production stable faster. (I have some ideas of
how to do this for ext4.)

But the point I'm trying to drive home here is that we shouldn't
assume that the only thing we can do is to optimize the file system.
Given the amount of time it takes to test, performance tune, and gain
confidence that the file system is sound and stable (look at how long
btrfs has taken to mature), it is likely that both flash technology
and workload characteristics will change before f2fs is fully mature
--- and this is no slight on the good work Jaegeuk and his team have
done.

Long experience with file systems shows us that they are like fine
wine; they take time to mature. Whether you're talking about
ext2/3/4, btrfs, Sun's ZFS, Digital's ADVFS, IBM's JFS or GPFS etc.,
and whether you're talking about file systems developed using open
source or more traditional corporate development processes, it takes a
minimum of 3-5 years and 50-200 PY's of effort to create a fully
production-ready file system from scratch (and some of the people
whom I surveyed for the Next Generation File System task force, some
of whom had decades of experience creating and working with file
systems, thought the 50-75 Person-Year estimate was a lowball --- note
that Sun's ZFS took *seven* years to develop, even with a generously
staffed team.)

As an open source example, the NGFS task force decided to
claim, in its November 2007 report-out, that btrfs would be ready for
community distro's in two years, since otherwise the managers and
other folks who control corporate budgets at the companies involved
would be scared off and decide not to fund the project. And yet here
we are in 2012, five years later, and we're just starting to see btrfs
support show up in community distro's as a supported option, and I
don't think most people would claim it is ready for production use in
enterprise distro's yet.

Given that, we might as well make sure we can do what we can to
optimize performance up and down the storage stack --- not just at the
file system level, but also by optimizing sqlite for embedded/handset
use cases.

Regards,

- Ted

2012-10-10 09:02:40

by Theodore Ts'o

Subject: Re: [PATCH 00/16] f2fs: introduce flash-friendly file system

On Tue, Oct 09, 2012 at 02:53:26PM -0500, Jooyoung Hwang wrote:

> I'd like you to refer to the following link as well which is about
> mobile workload pattern.
> http://www.cs.cmu.edu/~fuyaoz/courses/15712/report.pdf
> It's reported that in Android there are frequent issues of fsync and
> most of them are only for small size of data.

What bothers me is no one is asking the question, *why* is Android
(and more specifically SQLite and the applications which call SQLite)
using fsync's so often? These aren't transaction processing systems,
after all. So there are two questions that are worth asking here.
(a) Is SQLite being as flash-friendly as possible, and (b) do the
applications really need as many transaction boundaries as they are
requesting of SQLite.

Yes, we can optimize the file system, but sometimes the best way to
optimize a write is not to do the write at all (if it is not
required for the application's functionality, of course). If the
application is requesting 4 transaction boundaries where only one is
required, we can try to make fsync's more efficient, yes --- but there
is only so much that can be done at the fs layer.

- Ted

2012-10-10 09:44:13

by Jaegeuk Kim

Subject: RE: [PATCH 00/16] f2fs: introduce flash-friendly file system

[snip]
> > How about the following scenario?
> > 1. data "a" is newly written.
> > 2. checkpoint "A" is done.
> > 3. data "a" is truncated.
> > 4. checkpoint "B" is done.
> >
> > If fs supports multiple snapshots like "A" and "B" to users, it cannot reuse the space allocated by
> > data "a" after checkpoint "B" even though data "a" is safely truncated by checkpoint "B".
> > This is because fs should keep data "a" to prepare a roll-back to "A".
> > So, even though user sees some free space, LFS may suffer from cleaning due to the exhausted free
> space.
> > If users want to avoid this, they have to remove snapshots by themselves. Or, maybe automatically?
> >
>
> I feel that here it exists some misunderstanding in checkpoint/snapshot terminology (especially, for
> the NILFS2 case). It is possible that NILFS2 volume can contain only checkpoints (if user doesn't
> created any snapshot). You are right, snapshot cannot be deleted because, in other word, user marked
> this file system state as important point. But checkpoints can be reclaimed easily. I can't see any
> problem to reclaim free space from checkpoints in above-mentioned scenario in the case of NILFS2. But

I meant that snapshot does checkpoint.
And, the problem is related to real file system utilization managed by NILFS2.
                   [fs utilization to users]   [fs utilization managed by NILFS2]
                            X - 1                           X - 1
1. new data "a"               X                               X
2. snapshot "A"               X                               X
3. truncate "a"             X - 1                             X
4. snapshot "B"             X - 1                             X

After this, user can see X-1, but the performance will be affected by X.
Until the snapshot "A" is removed, user will experience the performance determined by X.
Do I misunderstand?
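
The gap between the two columns above can be reproduced with a toy model. This sketch is not NILFS2 or f2fs code: blocks are plain labels, a snapshot just pins the set of blocks that were live when it was taken, and the class and method names are hypothetical.

```python
class SnapshottingLog:
    """Toy log-structured volume whose snapshots pin the blocks they saw."""

    def __init__(self, initial_blocks):
        self.live = set(initial_blocks)  # blocks visible to the user
        self.snapshots = {}              # snapshot name -> pinned blocks

    def write(self, block):
        self.live.add(block)

    def truncate(self, block):
        self.live.discard(block)

    def snapshot(self, name):
        self.snapshots[name] = frozenset(self.live)

    def delete_snapshot(self, name):
        del self.snapshots[name]

    def user_utilization(self):
        return len(self.live)

    def managed_utilization(self):
        # Space the cleaner cannot reclaim: live blocks plus pinned blocks.
        pinned = set().union(*self.snapshots.values())
        return len(self.live | pinned)


vol = SnapshottingLog({"b1", "b2", "b3"})  # utilization X - 1 = 3
vol.write("a")                             # step 1: both columns show X = 4
vol.snapshot("A")                          # step 2
vol.truncate("a")                          # step 3: user sees X - 1 again
vol.snapshot("B")                          # step 4
print(vol.user_utilization(), vol.managed_utilization())  # 3 4
vol.delete_snapshot("A")
print(vol.user_utilization(), vol.managed_utilization())  # 3 3
```

In this model the managed utilization drops back to X - 1 only once snapshot "A" is deleted, which is the effect the table is meant to show.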

> if a user decides to make a snapshot then it is a law.
>

I don't believe users can do all the things perfectly.

> So, from my point of view, f2fs volume contains only checkpoints without possibility freeze some of it
> as snapshot. The f2fs volume contains checkpoints also but user can't touch it in some way.
>

Right.

> As I know, NILFS2 has Garbage Collector that removes checkpoints automatically in background. But it
> is possible also to force removing as checkpoints as snapshots by hands with special utility using. As

If users do not want snapshots to be removed automatically, do they also have to configure the system not to do so?

> I can understand, f2fs has Garbage Collector also that reclaims free space of dirty checkpoints. So,
> what is the difference? I have such opinion that difference is in lack of easy manipulation by
> checkpoints in the case of f2fs.

The problem I was concerned about was performance degradation due to the real utilization available to the file system.

>
> > >
> > > Moreover, user can't manage by f2fs checkpoints completely, as I can understand. It is not so
> clear
> > > what critical points can be a starting points of recovery actions. How is it possible to define
> how
> > > many checkpoints f2fs volume will have?
> >
> > IMHO, user does not need to know how many snapshots there exist and track the fs utilization all the
> time.
> > (off list: I don't know why cleaning process should be tuned by users.)
> >
>
> What do you plan to do in the case of users' complains about issues with free space reclaiming? If
> user doesn't know about checkpoints and haven't any tools for accessing to checkpoints then how is it
> possible to investigate issues with free space reclaiming on an user side?

Could you explain why reclaiming free space is an issue?
IMHO, that issue is caused by adopting multiple snapshots.

[snip]

>
> So, as I can understand, f2fs can be recovered by driver in the case of validity of one from two
> checkpoints. Sudden power-off can occur anytime. How high probability to achieve unrecoverable by
> driver state of f2fs during sudden power-off? Is it possible to recover f2fs in such case by fsck, for

In order to avoid that case, f2fs minimizes data writes and carefully overwrites some of them during roll-forward.

> example?
>
> > > >
> > > >> As I understand, it is not possible to have a perfect performance in all possible workloads.
> Could
> > > you
> > > >> point out what workloads are the best way of F2FS using?
> > > >
> > > > Basically I think the following workloads will be good for F2FS.
> > > > - Many random writes : it's LFS nature
> > > > - Small writes with frequent fsync : f2fs is optimized to reduce the fsync overhead.
> > > >
> > >
> > > Yes, it can be so for the case of non-aged f2fs volume. But I am afraid that for the case of aged
> f2fs
> > > volume the situation can be opposite. I think that in the case of aged state of f2fs volume the GC
> > > will be under hard work in above-mentioned workloads.
> >
> > Yes, you're right.
> > In the LFS paper above, there are two logging schemes: threaded logging and copy-and-compaction.
> > In order to avoid high cleaning overhead, f2fs adopts a hybrid one which changes the allocation
> policy dynamically
> > between two schemes.
> > Threaded logging is similar to the traditional approach, resulting in random writes without cleaning
> operations.
> > Copy-and-compaction is another name of cleaning, resulting in sequential writes with cleaning
> operations.
> > So, f2fs adopts one of them in runtime according to the file system status.
> > Through this, we could see the random write performance comparable to ext4 even in the worst case.
> >
>
> As I can understand, the goal of f2fs is to be a flash-friendly file system by means of reducing
> unnecessary FTL operations. This goal is achieving by means of alignment on operation unit and copy-
> on-write policy, from my understanding. So, I think that write operations without cleaning can be
> resulted in additional FTL operations.

Yes, but f2fs tries to minimize them.

[snip]

> > In our experiments *also* on android phones, we've seen many random patterns with frequent fsync
> calls.
> > We found that the main problem is database, and I think f2fs is beneficial to this.
>
> I think that database is not main use-case on Android phones. The dominating use-case can be operation
> by multimedia information and operations with small files, from my point of view.
>
> So, it is possible to extract such key points from the shared paper: (1) file has complex structure;
> (2) sequential access is not sequential; (3) auxiliary files dominate; (4) multiple threads perform
> I/O.
>
> I am afraid that random modification of different part of files and I/O operations from multiple
> threads can lead to significant fragmentation as file fragments as directory meta-information because
> of garbage collection.

Could you explain in more detail?

>
> I think that Iozone can be not fully proper benchmarking suite for file system performance estimation
> in such case. Maybe it needs to use special synthetic benchmarking tool.
>

Yes, that is needed.

> > As you mentioned, I agree that it is important to handle many small files too.
> > It is right that this may cause additional cleaning overhead, and f2fs has some metadata payload
> overhead.
> > In order to reduce the cleaning overhead, f2fs adopts static and dynamic hot and cold data
> separation.
> > The main goal is to split the data according to their type (e.g., dir inode, file inode, dentry data,
> etc) as much as possible.
> > Please see the document in detail.
> > I think this approach is quite effective to achieve the goal.
> > BTW, the payload overhead can be resolved by adopting embedding data in the inode likewise ext4.
> > I think it is also good idea, and I hope to adopt it in future.
> >
>
> As I can understand, f2fs uses old-fashioned (ext2/ext3 likewise) block-mapping scheme. This approach
> have significant metadata and performance payload. Extent approach can be more promising approach. But
> I am afraid that extent approach contradicts to f2fs internal techniques (Garbage Collector technique).
> So, it will be very hard to adopt extent approach in f2fs, from my point of view.
>

Right, so f2fs adopts an extent cache for better read performance.

>
> > > >
> > > > As you can see the f2fs kernel document patch, I think one of the most important features is to
> > > align operating units between f2fs and ftl.
> > > > Specifically, f2fs has section and zone, which are cleaning unit and basic allocation unit
> > > respectively.
> > > > Through these configurable units in f2fs, I think f2fs is able to reduce the unnecessary
> operations
> > > done by FTL.
> > > > And, in order to avoid changing IO patterns by the block-layer, f2fs merges itself some bios
> > > likewise ext4.
> > > >
> > >
> > > As I can understand, it is not so easy to create partition with f2fs volume which is aligned on
> > > operating units (especially in the case of eMMC or SSD).
> >
> > Could you explain why it is not so easy?
> >
> > > Performance of unaligned volume can degrade
> > > significantly because of FTL activity. What mechanisms has f2fs for excluding such situation and
> > > achieving of the goal to reduce unnecessary FTL operations?
> >
> > Could you please explain your concern more exactly?
> > In the kernel doc, the start address of f2fs data structure is aligned to the segment size (i.e.,
> 2MB).
> > Do you mean that or another operating units (e.g., section and zone)?
> >
>
> I mean that every volume is placed inside any partition (MTD or GPT). Every partition begins from any
> physical sector. So, as I can understand, f2fs volume can begin from physical sector that is laid
> inside physical erase block. Thereby, in such case of formating the f2fs's operation units will be
> unaligned in relation of physical erase blocks, from my point of view. Maybe, I misunderstand
> something but it can lead to additional FTL operations and performance degradation, from my point of
> view.

I think mkfs already calculates the offset to align that.

Thanks,

2012-10-10 11:53:05

by Clemens Ladisch

Subject: Re: SQLite on flash (was: [PATCH 00/16] f2fs: introduce flash-friendly file system)

(CC'd sqlite-users ML)
Theodore Ts'o wrote:
> On Tue, Oct 09, 2012 at 02:53:26PM -0500, Jooyoung Hwang wrote:
>> I'd like you to refer to the following link as well which is about
>> mobile workload pattern.
>> http://www.cs.cmu.edu/~fuyaoz/courses/15712/report.pdf
>> It's reported that in Android there are frequent issues of fsync and
>> most of them are only for small size of data.
>
> What bothers me is no one is asking the question, *why* is Android
> (and more specifically SQLite and the applications which call SQLite)
> using fsync's so often? These aren't transaction processing systems,
> after all.

Neither were Firefox's bookmarks and history. That one got fixed,
but it was a single application.

> So there are two questions that are worth asking here.
> (a) Is SQLite being as flash-friendly as possible,

It would be possible to use the write-ahead log instead of the default
rollback journal, but that is unfortunately not entirely compatible --
HTC once enabled WAL by default on some phones, and all apps that tried
to open a database in read-only mode broke. If apps are aware of this,
they can enable WAL for their own DBs without problems.
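
For an app that wants to opt in for its own database, switching to WAL is a single pragma. The sketch below uses Python's standard sqlite3 module; the helper name and the database path are hypothetical, while `PRAGMA journal_mode=WAL` is the documented SQLite pragma and returns the journal mode actually in effect.

```python
import os
import sqlite3
import tempfile


def enable_wal(db_path: str) -> str:
    """Ask SQLite to use write-ahead logging; return the mode now in effect."""
    conn = sqlite3.connect(db_path)
    try:
        (mode,) = conn.execute("PRAGMA journal_mode=WAL").fetchone()
        return mode
    finally:
        conn.close()


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical per-app database file.
    print(enable_wal(os.path.join(tmp, "app.db")))
```

The setting is persistent in the database file, so every later opener of that file has to be able to cope with WAL, which is the compatibility issue described above.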

There are some other configuration options, but they, too, have side
effects and thus cannot be enabled by default.

SQLite 4 (currently being developed) will use a log-structured merge
database.

> and (b) do the applications really need as many transaction boundaries
> as they are requesting of SQLite.

Most apps get the default of one transaction per statement because they
do not bother to manage transactions explicitly.
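
The difference between the default and explicit transaction management might look like this in an application. The sketch uses Python's standard sqlite3 module with `isolation_level=None`, so that the module does not open transactions on its own; the table and function names are hypothetical.

```python
import sqlite3


def insert_one_by_one(conn: sqlite3.Connection, messages) -> None:
    # Autocommit: every INSERT is its own transaction, so a file-backed
    # database pays the journal and sync cost once per statement.
    for msg in messages:
        conn.execute("INSERT INTO log(msg) VALUES (?)", (msg,))


def insert_batched(conn: sqlite3.Connection, messages) -> None:
    # One explicit transaction around the whole batch: one commit in total.
    conn.execute("BEGIN")
    try:
        conn.executemany("INSERT INTO log(msg) VALUES (?)",
                         ((msg,) for msg in messages))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise


conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE log(msg TEXT)")
insert_one_by_one(conn, ["a", "b"])
insert_batched(conn, ["c", "d", "e"])
print(conn.execute("SELECT COUNT(*) FROM log").fetchone()[0])  # 5
```

Both functions store the same rows; the batched variant simply asks for fewer transaction boundaries, which is the point of question (b) above.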


Regards,
Clemens

2012-10-10 10:36:40

by David Woodhouse

Subject: Re: [PATCH 00/16] f2fs: introduce flash-friendly file system

On Tue, 2012-10-09 at 10:31 +0200, Lukáš Czerner wrote:
> I am sorry but this reply makes me smile. How can you design a fs
> relying on time attack heuristics to figure out what the proper
> layout should be ? Or even endorse such heuristics to be used in
> mkfs ? What we should be focusing on is to push vendors to actually
> give us such information so we can properly propagate that
> throughout the kernel - that's something everyone will benefit from.
> After that the optimization can be done in every file system.
>
> Promoting time attack heuristics instead of pushing vendors to tell
> us how their hardware should be used is a journey to hell and we've
> been talking about this for a looong time now. And I imagine that
> you especially have quite some persuasion power.

The whole thing is silly. What we actually want on an embedded system is
to ditch the FTL altogether and have direct access to the NAND. Then we
can *know* our file system is behaving optimally. And we don't need
hacks like TRIM to try to make things a little less broken.

--
dwmw2



2012-10-11 03:15:02

by Namjae Jeon

Subject: Re: [PATCH 00/16] f2fs: introduce flash-friendly file system

2012/10/10 Jaegeuk Kim <[email protected]>:

>>
>> I mean that every volume is placed inside any partition (MTD or GPT). Every partition begins from any
>> physical sector. So, as I can understand, f2fs volume can begin from physical sector that is laid
>> inside physical erase block. Thereby, in such case of formating the f2fs's operation units will be
>> unaligned in relation of physical erase blocks, from my point of view. Maybe, I misunderstand
>> something but it can lead to additional FTL operations and performance degradation, from my point of
>> view.
>
> I think mkfs already calculates the offset to align that.
I think this answer is not what he wants.
If you don't use a partition table such as a DOS partition table or GPT, I
think that it is possible to align using mkfs.
But if we have to account for the partition table space on the storage, I don't
understand how it could be aligned using mkfs.

Thanks.
> Thanks,
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2012-10-12 12:30:16

by Viacheslav Dubeyko

Subject: RE: [PATCH 00/16] f2fs: introduce flash-friendly file system

On Wed, 2012-10-10 at 18:43 +0900, Jaegeuk Kim wrote:
> [snip]
> > > How about the following scenario?
> > > 1. data "a" is newly written.
> > > 2. checkpoint "A" is done.
> > > 3. data "a" is truncated.
> > > 4. checkpoint "B" is done.
> > >
> > > If fs supports multiple snapshots like "A" and "B" to users, it cannot reuse the space allocated by
> > > data "a" after checkpoint "B" even though data "a" is safely truncated by checkpoint "B".
> > > This is because fs should keep data "a" to prepare a roll-back to "A".
> > > So, even though user sees some free space, LFS may suffer from cleaning due to the exhausted free
> > space.
> > > If users want to avoid this, they have to remove snapshots by themselves. Or, maybe automatically?
> > >
> >
> > I feel that here it exists some misunderstanding in checkpoint/snapshot terminology (especially, for
> > the NILFS2 case). It is possible that NILFS2 volume can contain only checkpoints (if user doesn't
> > created any snapshot). You are right, snapshot cannot be deleted because, in other word, user marked
> > this file system state as important point. But checkpoints can be reclaimed easily. I can't see any
> > problem to reclaim free space from checkpoints in above-mentioned scenario in the case of NILFS2. But
>
> I meant that snapshot does checkpoint.
> And, the problem is related to real file system utilization managed by NILFS2.
>                    [fs utilization to users]   [fs utilization managed by NILFS2]
>                             X - 1                           X - 1
> 1. new data "a"               X                               X
> 2. snapshot "A"               X                               X
> 3. truncate "a"             X - 1                             X
> 4. snapshot "B"             X - 1                             X
>
> After this, user can see X-1, but the performance will be affected by X.
> Until the snapshot "A" is removed, user will experience the performance determined by X.
> Do I misunderstand?
>

OK. Maybe I misunderstand something, but checkpoints and snapshots are different things to me (especially in the case of NILFS2). :-)

The most important point is that, in your view, f2fs has a more efficient scheme for working with checkpoints. If you are right, that is very good. And I need to become more familiar with the f2fs code.

[snip]
> > As I know, NILFS2 has Garbage Collector that removes checkpoints automatically in background. But it
> > is possible also to force removing as checkpoints as snapshots by hands with special utility using. As
>
> If users may not want to remove the snapshots automatically, should they configure not to do this too?
>

As far as I know, NILFS2 doesn't delete snapshots automatically, but it does delete checkpoints. Moreover, there is a nilfs_cleanerd.conf configuration file that makes it possible to manage the behavior of the NILFS cleanerd daemon (min/max number of clean segments, selection policy, check/clean intervals and so on).

[snip]
> > > IMHO, user does not need to know how many snapshots there exist and track the fs utilization all the
> > time.
> > > (off list: I don't know why cleaning process should be tuned by users.)
> > >
> >
> > What do you plan to do in the case of users' complains about issues with free space reclaiming? If
> > user doesn't know about checkpoints and haven't any tools for accessing to checkpoints then how is it
> > possible to investigate issues with free space reclaiming on an user side?
>
> Could you explain why reclaiming free space is an issue?
> IMHO, that issue is caused by adopting multiple snapshots.
>

I didn't mean that reclaiming free space is an issue. I hope that f2fs is stable, but unfortunately no software is completely free of bugs. So f2fs users may run into issues. One possible issue is free space unexpectedly not being reclaimed. My question was about whether such a bug can be investigated on the user's side. From my point of view, NILFS2 has very good utilities for such investigation.

[snip]
> > > In our experiments *also* on android phones, we've seen many random patterns with frequent fsync calls.
> > > We found that the main problem is database, and I think f2fs is beneficial to this.
> >
> > I think that database is not main use-case on Android phones. The dominating use-case can be operation
> > by multimedia information and operations with small files, from my point of view.
> >
> > So, it is possible to extract such key points from the shared paper: (1) file has complex structure;
> > (2) sequential access is not sequential; (3) auxiliary files dominate; (4) multiple threads perform
> > I/O.
> >
> > I am afraid that random modification of different part of files and I/O operations from multiple
> > threads can lead to significant fragmentation as file fragments as directory meta-information because
> > of garbage collection.
>
> Could you explain in more detail?
>

I mean that the complex structure of modern files can lead to random modification of small parts of a file. Moreover, such modifications can come from multiple threads. So, to me, this means that the Copy-On-Write policy can lead to fragmentation of the file's content. GC can then cause additional fragmentation.

But maybe I have misunderstood some of f2fs's internal techniques.

With the best regards,
Vyacheslav Dubeyko.

2012-10-12 14:25:50

by Jaegeuk Kim

Subject: RE: [PATCH 00/16] f2fs: introduce flash-friendly file system

2012-10-12 (Fri), 16:30 +0400, Vyacheslav Dubeyko:
> On Wed, 2012-10-10 at 18:43 +0900, Jaegeuk Kim wrote:
> > [snip]
> > > > How about the following scenario?
> > > > 1. data "a" is newly written.
> > > > 2. checkpoint "A" is done.
> > > > 3. data "a" is truncated.
> > > > 4. checkpoint "B" is done.
> > > >
> > > > If fs supports multiple snapshots like "A" and "B" to users, it cannot reuse the space allocated by
> > > > data "a" after checkpoint "B" even though data "a" is safely truncated by checkpoint "B".
> > > > This is because fs should keep data "a" to prepare a roll-back to "A".
> > > > So, even though user sees some free space, LFS may suffer from cleaning due to the exhausted free space.
> > > > If users want to avoid this, they have to remove snapshots by themselves. Or, maybe automatically?
> > > >
> > >
> > > I feel that here it exists some misunderstanding in checkpoint/snapshot terminology (especially, for
> > > the NILFS2 case). It is possible that NILFS2 volume can contain only checkpoints (if user doesn't
> > > created any snapshot). You are right, snapshot cannot be deleted because, in other word, user marked
> > > this file system state as important point. But checkpoints can be reclaimed easily. I can't see any
> > > problem to reclaim free space from checkpoints in above-mentioned scenario in the case of NILFS2. But
> >
> > I meant that snapshot does checkpoint.
> > And, the problem is related to real file system utilization managed by NILFS2.
> >                   [fs utilization to users]   [fs utilization managed by NILFS2]
> >                             X - 1                           X - 1
> > 1. new data "a"             X                               X
> > 2. snapshot "A"             X                               X
> > 3. truncate "a"             X - 1                           X
> > 4. snapshot "B"             X - 1                           X
> >
> > After this, user can see X-1, but the performance will be affected by X.
> > Until the snapshot "A" is removed, user will experience the performance determined by X.
> > Do I misunderstand?
> >
>
> OK. Maybe I have misunderstood something, but for me checkpoints and snapshots are different things (especially in the case of NILFS2). :-)
>
> The most important point is that, from your point of view, f2fs has a more efficient scheme for working with checkpoints. If you are right, then that is very good. And I need to become more familiar with the f2fs code.
>

Ok, thanks.

> [snip]
> > > As I know, NILFS2 has Garbage Collector that removes checkpoints automatically in background. But it
> > > is possible also to force removing as checkpoints as snapshots by hands with special utility using. As
> >
> > If users may not want to remove the snapshots automatically, should they configure not to do this too?
> >
>
> As far as I know, NILFS2 doesn't delete snapshots automatically, but it does delete checkpoints. Moreover, there is a nilfs_cleanerd.conf configuration file that makes it possible to manage the behavior of the NILFS cleanerd daemon (min/max number of clean segments, selection policy, check/clean intervals and so on).
>

Ok.

> [snip]
> > > > IMHO, user does not need to know how many snapshots there exist and track the fs utilization all the time.
> > > > (off list: I don't know why cleaning process should be tuned by users.)
> > > >
> > >
> > > What do you plan to do in the case of users' complains about issues with free space reclaiming? If
> > > user doesn't know about checkpoints and haven't any tools for accessing to checkpoints then how is it
> > > possible to investigate issues with free space reclaiming on an user side?
> >
> > Could you explain why reclaiming free space is an issue?
> > IMHO, that issue is caused by adopting multiple snapshots.
> >
>
> I didn't mean that reclaiming free space is an issue. I hope that f2fs
> is stable, but unfortunately no software can be completely free of
> bugs. So f2fs users may still run into issues in use. One possible
> issue is an unexpected situation in which free space is not
> reclaimed. So my question was about the possibility of investigating
> such a bug on the user's side. From my point of view, NILFS2 has very
> good utilities for such an investigation.

You mean fsck?
Of course, we've implemented an fsck tool as well.
But the reason I haven't released it is that the code is a mess.
Another reason is that the current fsck tool only checks
the consistency of f2fs.
We're still working on it so that we can release it.

>
> [snip]
> > > > In our experiments *also* on android phones, we've seen many random patterns with frequent fsync calls.
> > > > We found that the main problem is database, and I think f2fs is beneficial to this.
> > >
> > > I think that database is not main use-case on Android phones. The dominating use-case can be operation
> > > by multimedia information and operations with small files, from my point of view.
> > >
> > > So, it is possible to extract such key points from the shared paper: (1) file has complex structure;
> > > (2) sequential access is not sequential; (3) auxiliary files dominate; (4) multiple threads perform
> > > I/O.
> > >
> > > I am afraid that random modification of different part of files and I/O operations from multiple
> > > threads can lead to significant fragmentation as file fragments as directory meta-information because
> > > of garbage collection.
> >
> > Could you explain in more detail?
> >
>
> I mean that the complex structure of modern files can lead to random modification of small parts of a file.
> Moreover, such modifications can come from multiple threads.
> So, to me, this means that the Copy-On-Write policy can lead to fragmentation of the file's content.
> GC can then cause additional fragmentation.
> But maybe I have misunderstood some of f2fs's internal techniques.
>

Right. Random modification may cause data fragmentation due to COW in LFS.
But this is only the host-side view.
If we consider an FTL underneath a file system that adopts the in-place-update scheme,
the FTL eventually has to handle the fragmentation issue instead of
the file system.
So I think fragmentation is not an issue particular to LFS.

> With the best regards,
> Vyacheslav Dubeyko.
>
>

--
Jaegeuk Kim
Samsung

2012-10-12 20:55:12

by Arnd Bergmann

Subject: Re: [PATCH 00/16] f2fs: introduce flash-friendly file system

On Wednesday 10 October 2012 00:53:51 Theodore Ts'o wrote:
> On Tue, Oct 09, 2012 at 01:01:24PM +0200, Lukáš Czerner wrote:
> > Do not get me wrong, I do not think it is worth to wait for vendors
> > to come to their senses, but it is worth constantly reminding that
> > we *need* this kind of information and those heuristics are not
> > feasible in the long run anyway.
>
> A number of us has been telling flash vendors exactly this. The
> technical people do seem to understand. It's management who seem to
> be primarily clueless, even though this information can be extracted
> by employing timing attacks on the media. I've pointed this out
> before, and the technical people agree that trying to keep this
> information as a "trade secret" is pointless, stupid, and
> counterproductive. Trying to get the pointy-haired bosses to
> understand may take quite a while.

For eMMC, I think we should start out defaulting to the characteristics
that are reported by the device, because they are usually correct
and those vendors for which that is not true can hopefully
come to their senses when they see how f2fs performs by default.

For USB media, the protocol does not allow you to specify the
erase block size, so we have to guess.

For SD cards, there is a field in the card's registers, but I've
never seen any value in there other than 4 MB, and in most cases
where that is not true, the standard does not allow encoding
the correct amount: it only allows power-of-two numbers up to
4 MB, and typical numbers these days are 3 MB, 6 MB or 8 MB.
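
As a rough illustration of that encoding limit, here is a minimal sketch. It assumes only the rule stated above (power-of-two sizes up to 4 MB can be encoded); the helper name and the list of candidate sizes are hypothetical, and the actual register layout is not modeled.

```python
MB = 1024 * 1024
MAX_ENCODABLE = 4 * MB  # upper bound as stated in this mail (assumption)

def is_encodable(erase_block_bytes: int) -> bool:
    """True if the size is a power of two and no larger than 4 MB."""
    is_power_of_two = (
        erase_block_bytes > 0
        and (erase_block_bytes & (erase_block_bytes - 1)) == 0
    )
    return is_power_of_two and erase_block_bytes <= MAX_ENCODABLE

# Sizes mentioned above, plus 2 MB for comparison.
for size_mb in (2, 3, 4, 6, 8):
    print(f"{size_mb} MB encodable: {is_encodable(size_mb * MB)}")
```

Under this rule, 3 MB and 6 MB fail because they are not powers of two, and 8 MB fails because it exceeds the upper bound.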

Arnd

2012-10-12 20:58:11

by Arnd Bergmann

Subject: Re: [PATCH 00/16] f2fs: introduce flash-friendly file system

On Wednesday 10 October 2012 11:36:14 David Woodhouse wrote:
> The whole thing is silly. What we actually want on an embedded system is
> to ditch the FTL altogether and have direct access to the NAND. Then we
> can know our file system is behaving optimally. And we don't need
> hacks like TRIM to try to make things a little less broken.

I think it's safe to say that the times for raw flash in consumer devices
are over, whether we like it or not. Even if we could go back to MTD
for internal storage, we'd still need something better than what we
have for removable flash storage such as USB and SD.

(and I know that xD cards are basically raw flash, but have you tried
to buy one recently?)

Arnd

2012-10-13 04:26:31

by Namjae Jeon

Subject: Re: [PATCH 00/16] f2fs: introduce flash-friendly file system

Is there a high possibility that the storage device can be rapidly
worn out by the cleaning process? E.g., in a severe fragmentation situation caused by
creating and removing small files.

And you told us only the advantages of f2fs. Would you tell us the disadvantages?

Thanks.

2012-10-13 12:37:34

by Jaegeuk Kim

Subject: Re: [PATCH 00/16] f2fs: introduce flash-friendly file system

2012-10-13 (Sat), 13:26 +0900, Namjae Jeon:
> Is there a high possibility that the storage device can be rapidly
> worn out by the cleaning process? E.g., in a severe fragmentation situation caused by
> creating and removing small files.
>

Yes, the cleaning process in F2FS induces additional writes, so
flash storage can be worn out quickly.
However, how about in traditional file systems?
As we all know, the FTL has a wear-leveling issue too, due to the
garbage collection overhead, which is fundamentally similar to the
cleaning overhead in LFS or F2FS.

So, what's the difference between them?
IMHO, the major factor in reducing the cleaning or garbage collection
overhead is how efficiently hot and cold data are separated.
So, which layer, the FTL or the file system, is better placed to achieve that?
I think the answer is the file system, since the file system has much
more information about the hotness of all the data, whereas the FTL doesn't know
that kind of information or finds it hard to figure out.

Therefore, I think the LFS approach is more beneficial for extending the
lifetime of the storage than the traditional one.
And, in order to do this perfectly, one criterion is the
alignment between the FTL and F2FS.
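
The hot/cold separation argument above can be sketched with a toy model. This is only an illustration under simplifying assumptions, not the victim selection or cleaning code of F2FS: every "hot" block is assumed to have been overwritten (so its old copy is invalid), every "cold" block is assumed to be still valid, and the function name is hypothetical.

```python
def cleaning_cost(segments):
    """Count valid (cold) blocks that must be copied out in order to
    reclaim every segment that contains at least one invalid (hot) block."""
    return sum(seg.count("cold") for seg in segments if "hot" in seg)

# Four segments of four blocks each, eight hot and eight cold blocks in total.
mixed = [["hot", "cold", "hot", "cold"] for _ in range(4)]
separated = [["hot"] * 4, ["hot"] * 4, ["cold"] * 4, ["cold"] * 4]

print("mixed layout, blocks to copy:    ", cleaning_cost(mixed))
print("separated layout, blocks to copy:", cleaning_cost(separated))
```

In the mixed layout every segment holds live cold data that has to be moved before the segment can be reused, while in the separated layout the hot segments become entirely invalid and can be reclaimed without copying anything.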

> And you told us only the advantages of f2fs. Would you tell us the disadvantages?

I think there is a scenario like this.
1) One big file is created and its data is written sequentially.
2) Many random writes are done across the whole file range.
3) The user discards cached data by doing "drop_caches" or "reboot".

At this point, I worry about the sequential read performance due to the
fragmentation.
I don't know how frequently this use case happens, but it is one of the cons
of the LFS approach.
Nevertheless, I think the performance could be enhanced by
cooperating with the readahead mechanism in VFS.
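
The scenario above can be sketched with a small simulation. It is a simplified model under stated assumptions, not the F2FS allocator: the log is a single append-only address space, every overwrite is placed at the log head, and step 3 is modeled simply by reading everything back from storage. All names are hypothetical.

```python
import random

def simulate(num_blocks, num_random_writes, seed=0):
    """Return the logical-to-physical block map after the scenario."""
    rng = random.Random(seed)
    block_map = list(range(num_blocks))  # step 1: sequential write
    log_head = num_blocks                # next free position in the log
    for _ in range(num_random_writes):   # step 2: out-of-place overwrites
        block_map[rng.randrange(num_blocks)] = log_head
        log_head += 1
    return block_map

def count_extents(block_map):
    """Count physically contiguous runs seen by a sequential read
    (assumes a non-empty file)."""
    extents = 1
    for prev, cur in zip(block_map, block_map[1:]):
        if cur != prev + 1:
            extents += 1
    return extents

print("extents before random writes:", count_extents(simulate(1024, 0)))
print("extents after random writes: ", count_extents(simulate(1024, 256)))
```

Each extra extent is a point where a sequential read of the file has to jump to a different place in the log, which is where readahead could help.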

Thanks,

>
> Thanks.

--
Jaegeuk Kim
Samsung

2012-10-17 11:12:42

by Namjae Jeon

Subject: Re: [PATCH 00/16] f2fs: introduce flash-friendly file system

2012/10/13, Jaegeuk Kim <[email protected]>:
> 2012-10-13 (Sat), 13:26 +0900, Namjae Jeon:
>> Is there a high possibility that the storage device can be rapidly
>> worn out by the cleaning process? E.g., in a severe fragmentation situation caused by
>> creating and removing small files.
>>
>
> Yes, the cleaning process in F2FS induces additional writes, so
> flash storage can be worn out quickly.
> However, how about in traditional file systems?
> As we all know, the FTL has a wear-leveling issue too, due to the
> garbage collection overhead, which is fundamentally similar to the
> cleaning overhead in LFS or F2FS.
>
> So, what's the difference between them?
> IMHO, the major factor in reducing the cleaning or garbage collection
> overhead is how efficiently hot and cold data are separated.
> So, which layer, the FTL or the file system, is better placed to achieve that?
> I think the answer is the file system, since the file system has much
> more information about the hotness of all the data, whereas the FTL doesn't know
> that kind of information or finds it hard to figure out.
>
> Therefore, I think the LFS approach is more beneficial for extending the
> lifetime of the storage than the traditional one.
> And, in order to do this perfectly, one criterion is the
> alignment between the FTL and F2FS.

As you know, users normally don't use one big partition on eMMC.
That is, they divide it into several small partitions,
and f2fs will work on each small partition,
while the eMMC's FTL works globally on the whole device.
I cannot imagine how the cleaning process of
f2fs and the FTL of the eMMC could work in sync.

And would you share the slides or a document on f2fs once the Korea Linux Forum is finished?

Thanks.
>
>> And you told us only the advantages of f2fs. Would you tell us the
>> disadvantages?
>
> I think there is a scenario like this.
> 1) One big file is created and its data is written sequentially.
> 2) Many random writes are done across the whole file range.
> 3) The user discards cached data by doing "drop_caches" or "reboot".
>
> At this point, I worry about the sequential read performance due to the
> fragmentation.
> I don't know how frequently this use case happens, but it is one of the cons
> of the LFS approach.
> Nevertheless, I think the performance could be enhanced by
> cooperating with the readahead mechanism in VFS.
>
> Thanks,
>
>>
>> Thanks.
>
> --
> Jaegeuk Kim
> Samsung
>
>

2012-10-17 11:15:20

by Namjae Jeon

Subject: Re: [PATCH 00/16] f2fs: introduce flash-friendly file system

2012/10/11, Changman Lee <[email protected]>:
> On Thursday, October 11, 2012, Namjae Jeon<[email protected]> wrote:
>> 2012/10/10 Jaegeuk Kim <[email protected]>:
>>
>>>>
>>>> I mean that every volume is placed inside any partition (MTD or GPT). Every partition begins from any
>>>> physical sector. So, as I can understand, f2fs volume can begin from physical sector that is laid
>>>> inside physical erase block. Thereby, in such case of formatting the f2fs's operation units will be
>>>> unaligned in relation of physical erase blocks, from my point of view. Maybe, I misunderstand
>>>> something but it can lead to additional FTL operations and performance degradation, from my point of
>>>> view.
>>>
>>> I think mkfs already calculates the offset to align that.
>> I think this answer is not what he wants.
>> If you don't use a partition table such as a DOS partition table or GPT, I
>> think that it is possible to align using mkfs.
>> But if we have to consider the partition table space in storage, I don't
>> understand how it could be aligned using mkfs.
>>
>> Thanks.
>
> We can get the physical starting sector address of any partition from
> the hdio geometry information obtained via ioctl.
If so, are the first block and the end block of the partition useless?

Thanks.
>
>>> Thanks,
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel"
> in
>>> the body of a message to [email protected]
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel"
>> in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at http://www.tux.org/lkml/
>>
>

2012-10-17 23:06:36

by Changman Lee

Subject: RE: [PATCH 00/16] f2fs: introduce flash-friendly file system



> -----Original Message-----
> From: Namjae Jeon [mailto:[email protected]]
> Sent: Wednesday, October 17, 2012 8:14 PM
> To: Changman Lee
> Cc: Jaegeuk Kim; Vyacheslav Dubeyko; Marco Stornelli; Jaegeuk Kim; Al Viro;
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]
> Subject: Re: [PATCH 00/16] f2fs: introduce flash-friendly file system
>
> 2012/10/11, Changman Lee <[email protected]>:
> > On Thursday, October 11, 2012, Namjae Jeon<[email protected]> wrote:
> >> 2012/10/10 Jaegeuk Kim <[email protected]>:
> >>
> >>>>
> >>>> I mean that every volume is placed inside any partition (MTD or GPT). Every partition begins from any
> >>>> physical sector. So, as I can understand, f2fs volume can begin from physical sector that is laid
> >>>> inside physical erase block. Thereby, in such case of formatting the f2fs's operation units will be
> >>>> unaligned in relation of physical erase blocks, from my point of view. Maybe, I misunderstand
> >>>> something but it can lead to additional FTL operations and performance degradation, from my point of
> >>>> view.
> >>>
> >>> I think mkfs already calculates the offset to align that.
> >> I think this answer is not what he wants.
> >> If you don't use a partition table such as a DOS partition table or GPT, I
> >> think that it is possible to align using mkfs.
> >> But if we have to consider the partition table space in storage, I don't
> >> understand how it could be aligned using mkfs.
> >>
> >> Thanks.
> >
> > We can get the physical starting sector address of any partition from
> > the hdio geometry information obtained via ioctl.
> If so, are the first block and the end block of the partition useless?
>
> Thanks.

For example:
if we try to align the start point of F2FS to 2MB but the start sector of the partition is not aligned to 2MB,
then of course F2FS will have some unused blocks. In exchange, F2FS could reduce the GC cost of the FTL.
I don't know whether my answer is what you wanted.
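
The arithmetic behind this example can be sketched as follows. This is only an illustration, not how mkfs.f2fs necessarily computes its offset: the 512-byte sector size, the function name, and the sample start sectors are assumptions.

```python
SECTOR_SIZE = 512            # bytes per sector (assumed)
ALIGNMENT = 2 * 1024 * 1024  # 2MB, as in the example above

def aligned_start(partition_start_sector):
    """Return (first 2MB-aligned sector, number of sectors left unused)."""
    sectors_per_unit = ALIGNMENT // SECTOR_SIZE  # 4096 sectors per 2MB
    remainder = partition_start_sector % sectors_per_unit
    skipped = 0 if remainder == 0 else sectors_per_unit - remainder
    return partition_start_sector + skipped, skipped

# Hypothetical partition start sectors, for illustration only.
for start in (63, 2048, 4096):
    first, skipped = aligned_start(start)
    print(f"partition starts at sector {start}: "
          f"file system starts at {first}, {skipped} sectors unused")
```

The unused sectors at the head of the partition are the price paid for starting on a 2MB boundary.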

> >
> >>> Thanks,
> >>>
> >

2012-10-18 13:39:20

by Viacheslav Dubeyko

Subject: RE: [PATCH 00/16] f2fs: introduce flash-friendly file system

[snip]
> >
> > And would you share the slides or a document on f2fs once the Korea Linux Forum is finished?
> >
>
> Here I attached the slides, and LF will also share the slides.
> Thanks,
>

I had hoped that the slides would have a more detailed description. Maybe they are
good for the Linux Forum. But do you plan to publish a more detailed
description of the F2FS architecture and its advantages/disadvantages in the form
of an article? It would make sense, from my point of view.

With the best regards,
Vyacheslav Dubeyko.

2012-10-18 22:14:24

by Jaegeuk Kim

Subject: RE: [PATCH 00/16] f2fs: introduce flash-friendly file system

> [snip]
> > >
> > > And would you share the slides or a document on f2fs once the Korea Linux Forum is finished?
> > >
> >
> > Here I attached the slides, and LF will also share the slides.
> > Thanks,
> >
>
> I had hoped that the slides would have a more detailed description. Maybe they are
> good for the Linux Forum. But do you plan to publish a more detailed
> description of the F2FS architecture and its advantages/disadvantages in the form
> of an article? It would make sense, from my point of view.

Of course.
Jooyoung has started writing a paper on f2fs.
I don't know when it will be published, as we have a lot of work right now. :)
Thanks,

>
> With the best regards,
> Vyacheslav Dubeyko.

2012-10-19 09:20:33

by NeilBrown

Subject: Re: [PATCH 00/16] f2fs: introduce flash-friendly file system

On Thu, 18 Oct 2012 17:39:11 +0400 Vyacheslav Dubeyko <[email protected]>
wrote:

> [snip]
> > >
> > > And would you share the slides or a document on f2fs once the Korea Linux Forum is finished?
> > >
> >
> > Here I attached the slides, and LF will also share the slides.
> > Thanks,
> >
>
> I had hoped that the slides would have a more detailed description. Maybe they are
> good for the Linux Forum. But do you plan to publish a more detailed
> description of the F2FS architecture and its advantages/disadvantages in the form
> of an article? It would make sense, from my point of view.

<plug>
https://lwn.net/Articles/518988/
</plug>

:-)

NeilBrown

>
> With the best regards,
> Vyacheslav Dubeyko.
>
>
