2020-12-05 05:01:42

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH RFC 0/5] Add threading support to e2fsprogs

This patch set adds the infrastructure to support threading to
libext2fs. It makes the unix_io I/O Manager thread-aware. Wang's
parallel bitmap code has been adapted to use the new threading
infrastructure.

The code has been tested with TSAN and ASAN built into gcc 10.2:

configure 'CFLAGS=-g -fsanitize=thread' 'LDFLAGS=-fsanitize=thread'
make clean ; make -j16 ; make -j16 check
configure 'CFLAGS=-g -fsanitize=address' 'LDFLAGS=-fsanitize=address'
make clean ; make -j16 ; make -j16 check

As I needed to excerpt out some of the changes to generated patches in
"Add configure and build support for the pthreads", the full patch
series can be found in git:

git fetch https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git pthreads


Theodore Ts'o (4):
Add configure and build support for the pthreads library
libext2fs: add threading support to the I/O manager abstraction
libext2fs: allow the unix_io manager's cache to be disabled and
re-enabled
Enable threaded support for e2fsprogs' applications.

Wang Shilong (1):
ext2fs: parallel bitmap loading

MCONFIG.in | 12 +-
aclocal.m4 | 486 ++++++++++++++++++++++++
configure | 814 ++++++++++++++++++++++++++++++++++++----
configure.ac | 24 ++
debugfs/debugfs.c | 6 +-
e2fsck/unix.c | 2 +-
lib/config.h.in | 351 +----------------
lib/ext2fs/ext2_io.h | 3 +
lib/ext2fs/ext2fs.h | 9 +
lib/ext2fs/openfs.c | 2 +
lib/ext2fs/rw_bitmaps.c | 323 +++++++++++++---
lib/ext2fs/test_io.c | 6 +-
lib/ext2fs/undo_io.c | 2 +
lib/ext2fs/unix_io.c | 156 +++++++-
misc/dumpe2fs.c | 2 +-
misc/e2freefrag.c | 2 +-
misc/e2fuzz.c | 4 +-
misc/e2image.c | 3 +-
misc/fuse2fs.c | 3 +-
misc/tune2fs.c | 3 +-
resize/main.c | 2 +-
21 files changed, 1719 insertions(+), 496 deletions(-)

--
2.28.0


2020-12-05 05:02:02

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH RFC 1/5] Add configure and build support for the pthreads library

Support for pthreads can be forcibly disabled by passing
"--without-pthread" to the configure script.

The actual changes in this commit are in configure.ac and MCONFIG.in;
the other files were generated as a result of running aclocal,
autoconf, and autoheader on a Debian testing system.

Note: the autoconf-archive package must now be installed before
rerunning aclocal, to supply the AX_PTHREAD macro.

Signed-off-by: Theodore Ts'o <[email protected]>
---

[ I've removed the diffs of the generated files since they would make
the diff too large for the vger mailing lists. --Ted ]

MCONFIG.in | 12 +-
aclocal.m4 | 486 +++++++++++++++++++++++++++++
configure | 814 +++++++++++++++++++++++++++++++++++++++++++-----
configure.ac | 24 ++
lib/config.h.in | 351 +--------------------
5 files changed, 1271 insertions(+), 416 deletions(-)

diff --git a/MCONFIG.in b/MCONFIG.in
index 0598f21b7..69f30674e 100644
--- a/MCONFIG.in
+++ b/MCONFIG.in
@@ -86,15 +86,17 @@ SYSTEMD_SYSTEM_UNIT_DIR = @systemd_system_unit_dir@
SANITIZER_CFLAGS = @lto_cflags@ @ubsan_cflags@ @addrsan_cflags@ @threadsan_cflags@
SANITIZER_LDFLAGS = @lto_ldflags@ @ubsan_ldflags@ @addrsan_ldflags@ @threadsan_ldflags@

-CC = @CC@
+CC = @PTHREAD_CC@
BUILD_CC = @BUILD_CC@
+PTHREAD_CFLAGS = @PTHREAD_CFLAGS@
+PTHREAD_LIBS = @PTHREAD_LIBS@
CFLAGS = @CFLAGS@
CFLAGS_SHLIB = @CFLAGS_SHLIB@
CFLAGS_STLIB = @CFLAGS_STLIB@
CPPFLAGS = @INCLUDES@
-ALL_CFLAGS = $(CPPFLAGS) $(SANITIZER_CFLAGS) $(CFLAGS) $(CFLAGS_WARN) @DEFS@ $(LOCAL_CFLAGS)
-ALL_CFLAGS_SHLIB = $(CPPFLAGS) $(SANITIZER_CFLAGS) $(CFLAGS_SHLIB) $(CFLAGS_WARN) @DEFS@ $(LOCAL_CFLAGS)
-ALL_CFLAGS_STLIB = $(CPPFLAGS) $(SANITIZER_CFLAGS) $(CFLAGS_STLIB) $(CFLAGS_WARN) @DEFS@ $(LOCAL_CFLAGS)
+ALL_CFLAGS = $(CPPFLAGS) $(SANITIZER_CFLAGS) $(CFLAGS) $(PTHREAD_CFLAGS) $(CFLAGS_WARN) @DEFS@ $(LOCAL_CFLAGS)
+ALL_CFLAGS_SHLIB = $(CPPFLAGS) $(SANITIZER_CFLAGS) $(CFLAGS_SHLIB) $(PTHREAD_CFLAGS) $(CFLAGS_WARN) @DEFS@ $(LOCAL_CFLAGS)
+ALL_CFLAGS_STLIB = $(CPPFLAGS) $(SANITIZER_CFLAGS) $(CFLAGS_STLIB) $(PTHREAD_CFLAGS) $(CFLAGS_WARN) @DEFS@ $(LOCAL_CFLAGS)
LDFLAGS = $(SANITIZER_LDFLAGS) @LDFLAGS@
LDFLAGS_SHLIB = $(SANITIZER_LDFLAGS) @LDFLAGS_SHLIB@
ALL_LDFLAGS = $(LDFLAGS) @LDFLAG_DYNAMIC@
@@ -139,7 +141,7 @@ LIBFUSE = @FUSE_LIB@
LIBSUPPORT = $(LIBINTL) $(LIB)/libsupport@STATIC_LIB_EXT@
LIBBLKID = @LIBBLKID@ @PRIVATE_LIBS_CMT@ $(LIBUUID)
LIBINTL = @LIBINTL@
-SYSLIBS = @LIBS@
+SYSLIBS = @LIBS@ @PTHREAD_LIBS@
DEPLIBSS = $(LIB)/libss@LIB_EXT@
DEPLIBCOM_ERR = $(LIB)/libcom_err@LIB_EXT@
DEPLIBUUID = @DEPLIBUUID@
diff --git a/configure.ac b/configure.ac
index 6cb335d7d..d3d9ae07a 100644
--- a/configure.ac
+++ b/configure.ac
@@ -773,6 +773,30 @@ fi
dnl
dnl
dnl
+AC_ARG_WITH([pthread],
+[ --without-pthread disable use of pthread support],
+[if test "$withval" = "no"
+then
+ try_pthread=""
+ AC_MSG_RESULT([Disabling pthread support])
+else
+ try_pthread="yes"
+ AC_MSG_RESULT([Testing for pthread support])
+fi]
+,
+try_pthread="yes"
+AC_MSG_RESULT([Try testing for pthread support by default])
+)
+if test "$try_pthread" = "yes"
+then
+AX_PTHREAD
+else
+test -n "$PTHREAD_CC" || PTHREAD_CC="$CC"
+AC_SUBST([PTHREAD_CC])
+fi
+dnl
+dnl
+dnl
AH_TEMPLATE([USE_UUIDD], [Define to 1 to build uuidd])
AC_ARG_ENABLE([uuidd],
[ --disable-uuidd disable building the uuid daemon],
--
2.28.0

2020-12-05 05:02:03

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH RFC 2/5] libext2fs: add threading support to the I/O manager abstraction

Add initial implementation support for the unix_io manager.
Applications which want to use threading should pass in
IO_FLAG_THREADS when opening the channel. Channels which support
threading (which as of this commit is unix_io and test_io if the
backing io_manager supports threading) will set the
CHANNEL_FLAGS_THREADS bit in io->flags. Library code or applications
can test if threading is enabled by checking this flag.

Applications using libext2fs can pass in EXT2_FLAG_THREADS to
ext2fs_open() or ext2fs_open2() to request threading support.

Signed-off-by: Theodore Ts'o <[email protected]>
---
lib/ext2fs/ext2_io.h | 2 +
lib/ext2fs/ext2fs.h | 1 +
lib/ext2fs/openfs.c | 2 +
lib/ext2fs/test_io.c | 6 +-
lib/ext2fs/undo_io.c | 2 +
lib/ext2fs/unix_io.c | 137 ++++++++++++++++++++++++++++++++++++++-----
6 files changed, 133 insertions(+), 17 deletions(-)

diff --git a/lib/ext2fs/ext2_io.h b/lib/ext2fs/ext2_io.h
index 5540900a5..2e0da5a53 100644
--- a/lib/ext2fs/ext2_io.h
+++ b/lib/ext2fs/ext2_io.h
@@ -33,6 +33,7 @@ typedef struct struct_io_stats *io_stats;
#define CHANNEL_FLAGS_WRITETHROUGH 0x01
#define CHANNEL_FLAGS_DISCARD_ZEROES 0x02
#define CHANNEL_FLAGS_BLOCK_DEVICE 0x04
+#define CHANNEL_FLAGS_THREADS 0x08

#define io_channel_discard_zeroes_data(i) (i->flags & CHANNEL_FLAGS_DISCARD_ZEROES)

@@ -104,6 +105,7 @@ struct struct_io_manager {
#define IO_FLAG_EXCLUSIVE 0x0002
#define IO_FLAG_DIRECT_IO 0x0004
#define IO_FLAG_FORCE_BOUNCE 0x0008
+#define IO_FLAG_THREADS 0x0010

/*
* Convenience functions....
diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
index 69c8a3ff0..5955c3ae9 100644
--- a/lib/ext2fs/ext2fs.h
+++ b/lib/ext2fs/ext2fs.h
@@ -206,6 +206,7 @@ typedef struct ext2_file *ext2_file_t;
#define EXT2_FLAG_IGNORE_SB_ERRORS 0x800000
#define EXT2_FLAG_BBITMAP_TAIL_PROBLEM 0x1000000
#define EXT2_FLAG_IBITMAP_TAIL_PROBLEM 0x2000000
+#define EXT2_FLAG_THREADS 0x4000000

/*
* Special flag in the ext2 inode i_flag field that means that this is
diff --git a/lib/ext2fs/openfs.c b/lib/ext2fs/openfs.c
index 3ed1e25cd..5ec8ed5c1 100644
--- a/lib/ext2fs/openfs.c
+++ b/lib/ext2fs/openfs.c
@@ -170,6 +170,8 @@ errcode_t ext2fs_open2(const char *name, const char *io_options,
io_flags |= IO_FLAG_EXCLUSIVE;
if (flags & EXT2_FLAG_DIRECT_IO)
io_flags |= IO_FLAG_DIRECT_IO;
+ if (flags & EXT2_FLAG_THREADS)
+ io_flags |= IO_FLAG_THREADS;
retval = manager->open(fs->device_name, io_flags, &fs->io);
if (retval)
goto cleanup;
diff --git a/lib/ext2fs/test_io.c b/lib/ext2fs/test_io.c
index ee828be7a..480e68fcc 100644
--- a/lib/ext2fs/test_io.c
+++ b/lib/ext2fs/test_io.c
@@ -197,6 +197,7 @@ static errcode_t test_open(const char *name, int flags, io_channel *channel)
io->read_error = 0;
io->write_error = 0;
io->refcount = 1;
+ io->flags = 0;

memset(data, 0, sizeof(struct test_private_data));
data->magic = EXT2_ET_MAGIC_TEST_IO_CHANNEL;
@@ -237,8 +238,11 @@ static errcode_t test_open(const char *name, int flags, io_channel *channel)
if ((value = safe_getenv("TEST_IO_WRITE_ABORT")) != NULL)
data->write_abort_count = strtoul(value, NULL, 0);

- if (data->real)
+ if (data->real) {
io->align = data->real->align;
+ if (data->real->flags & CHANNEL_FLAGS_THREADS)
+ io->flags |= CHANNEL_FLAGS_THREADS;
+ }

*channel = io;
return 0;
diff --git a/lib/ext2fs/undo_io.c b/lib/ext2fs/undo_io.c
index 198624145..eb56f53d5 100644
--- a/lib/ext2fs/undo_io.c
+++ b/lib/ext2fs/undo_io.c
@@ -698,6 +698,8 @@ static errcode_t undo_open(const char *name, int flags, io_channel *channel)
int undo_fd = -1;
errcode_t retval;

+ /* We don't support multi-threading, at least for now */
+ flags &= ~IO_FLAG_THREADS;
if (name == 0)
return EXT2_ET_BAD_DEVICE_NAME;
retval = ext2fs_get_mem(sizeof(struct struct_io_channel), &io);
diff --git a/lib/ext2fs/unix_io.c b/lib/ext2fs/unix_io.c
index 628e60c39..9385487d9 100644
--- a/lib/ext2fs/unix_io.c
+++ b/lib/ext2fs/unix_io.c
@@ -67,6 +67,9 @@
#if HAVE_LINUX_FALLOC_H
#include <linux/falloc.h>
#endif
+#ifdef HAVE_PTHREAD
+#include <pthread.h>
+#endif

#if defined(__linux__) && defined(_IO) && !defined(BLKROGET)
#define BLKROGET _IO(0x12, 94) /* Get read-only status (0 = read_write). */
@@ -107,11 +110,58 @@ struct unix_private_data {
struct unix_cache cache[CACHE_SIZE];
void *bounce;
struct struct_io_stats io_stats;
+#ifdef HAVE_PTHREAD
+ pthread_mutex_t cache_mutex;
+ pthread_mutex_t bounce_mutex;
+ pthread_mutex_t stats_mutex;
+#endif
};

#define IS_ALIGNED(n, align) ((((uintptr_t) n) & \
((uintptr_t) ((align)-1))) == 0)

+typedef enum lock_kind {
+ CACHE_MTX, BOUNCE_MTX, STATS_MTX
+} kind_t;
+
+#ifdef HAVE_PTHREAD
+static inline pthread_mutex_t *get_mutex(struct unix_private_data *data,
+ kind_t kind)
+{
+ if (data->flags & IO_FLAG_THREADS) {
+ switch (kind) {
+ case CACHE_MTX:
+ return &data->cache_mutex;
+ case BOUNCE_MTX:
+ return &data->bounce_mutex;
+ case STATS_MTX:
+ return &data->stats_mutex;
+ }
+ }
+ return NULL;
+}
+#endif
+
+static inline void mutex_lock(struct unix_private_data *data, kind_t kind)
+{
+#ifdef HAVE_PTHREAD
+ pthread_mutex_t *mtx = get_mutex(data,kind);
+
+ if (mtx)
+ pthread_mutex_lock(mtx);
+#endif
+}
+
+static inline void mutex_unlock(struct unix_private_data *data, kind_t kind)
+{
+#ifdef HAVE_PTHREAD
+ pthread_mutex_t *mtx = get_mutex(data,kind);
+
+ if (mtx)
+ pthread_mutex_unlock(mtx);
+#endif
+}
+
static errcode_t unix_get_stats(io_channel channel, io_stats *stats)
{
errcode_t retval = 0;
@@ -122,8 +172,11 @@ static errcode_t unix_get_stats(io_channel channel, io_stats *stats)
data = (struct unix_private_data *) channel->private_data;
EXT2_CHECK_MAGIC(data, EXT2_ET_MAGIC_UNIX_IO_CHANNEL);

- if (stats)
+ if (stats) {
+ mutex_lock(data, STATS_MTX);
*stats = &data->io_stats;
+ mutex_unlock(data, STATS_MTX);
+ }

return retval;
}
@@ -167,7 +220,9 @@ static errcode_t raw_read_blk(io_channel channel,
ssize_t really_read = 0;

size = (count < 0) ? -count : (ext2_loff_t) count * channel->block_size;
+ mutex_lock(data, STATS_MTX);
data->io_stats.bytes_read += size;
+ mutex_unlock(data, STATS_MTX);
location = ((ext2_loff_t) block * channel->block_size) + data->offset;

if (data->flags & IO_FLAG_FORCE_BOUNCE) {
@@ -232,8 +287,10 @@ static errcode_t raw_read_blk(io_channel channel,
*/
bounce_read:
while (size > 0) {
+ mutex_lock(data, BOUNCE_MTX);
actual = read(data->dev, data->bounce, channel->block_size);
if (actual != channel->block_size) {
+ mutex_unlock(data, BOUNCE_MTX);
actual = really_read;
buf -= really_read;
size += really_read;
@@ -246,6 +303,7 @@ bounce_read:
really_read += actual;
size -= actual;
buf += actual;
+ mutex_unlock(data, BOUNCE_MTX);
}
return 0;

@@ -277,7 +335,9 @@ static errcode_t raw_write_blk(io_channel channel,
else
size = (ext2_loff_t) count * channel->block_size;
}
+ mutex_lock(data, STATS_MTX);
data->io_stats.bytes_written += size;
+ mutex_unlock(data, STATS_MTX);

location = ((ext2_loff_t) block * channel->block_size) + data->offset;

@@ -341,11 +401,13 @@ static errcode_t raw_write_blk(io_channel channel,
*/
bounce_write:
while (size > 0) {
+ mutex_lock(data, BOUNCE_MTX);
if (size < channel->block_size) {
actual = read(data->dev, data->bounce,
channel->block_size);
if (actual != channel->block_size) {
if (actual < 0) {
+ mutex_unlock(data, BOUNCE_MTX);
retval = errno;
goto error_out;
}
@@ -362,6 +424,7 @@ bounce_write:
goto error_out;
}
actual = write(data->dev, data->bounce, channel->block_size);
+ mutex_unlock(data, BOUNCE_MTX);
if (actual < 0) {
retval = errno;
goto error_out;
@@ -481,24 +544,28 @@ static void reuse_cache(io_channel channel, struct unix_private_data *data,
cache->access_time = ++data->access_time;
}

+#define FLUSH_INVALIDATE 0x01
+#define FLUSH_NOLOCK 0x02
+
/*
* Flush all of the blocks in the cache
*/
static errcode_t flush_cached_blocks(io_channel channel,
struct unix_private_data *data,
- int invalidate)
-
+ int flags)
{
struct unix_cache *cache;
errcode_t retval, retval2;
int i;

retval2 = 0;
+ if ((flags & FLUSH_NOLOCK) == 0)
+ mutex_lock(data, CACHE_MTX);
for (i=0, cache = data->cache; i < CACHE_SIZE; i++, cache++) {
if (!cache->in_use)
continue;

- if (invalidate)
+ if (flags & FLUSH_INVALIDATE)
cache->in_use = 0;

if (!cache->dirty)
@@ -511,6 +578,8 @@ static errcode_t flush_cached_blocks(io_channel channel,
else
cache->dirty = 0;
}
+ if ((flags & FLUSH_NOLOCK) == 0)
+ mutex_unlock(data, CACHE_MTX);
return retval2;
}
#endif /* NO_IO_CACHE */
@@ -597,6 +666,7 @@ static errcode_t unix_open_channel(const char *name, int fd,
io->read_error = 0;
io->write_error = 0;
io->refcount = 1;
+ io->flags = 0;

memset(data, 0, sizeof(struct unix_private_data));
data->magic = EXT2_ET_MAGIC_UNIX_IO_CHANNEL;
@@ -703,6 +773,25 @@ static errcode_t unix_open_channel(const char *name, int fd,
setrlimit(RLIMIT_FSIZE, &rlim);
}
}
+#endif
+#ifdef HAVE_PTHREAD
+ if (flags & IO_FLAG_THREADS) {
+ io->flags |= CHANNEL_FLAGS_THREADS;
+ retval = pthread_mutex_init(&data->cache_mutex, NULL);
+ if (retval)
+ goto cleanup;
+ retval = pthread_mutex_init(&data->bounce_mutex, NULL);
+ if (retval) {
+ pthread_mutex_destroy(&data->cache_mutex);
+ goto cleanup;
+ }
+ retval = pthread_mutex_init(&data->stats_mutex, NULL);
+ if (retval) {
+ pthread_mutex_destroy(&data->cache_mutex);
+ pthread_mutex_destroy(&data->bounce_mutex);
+ goto cleanup;
+ }
+ }
#endif
*channel = io;
return 0;
@@ -796,6 +885,13 @@ static errcode_t unix_close(io_channel channel)
if (close(data->dev) < 0)
retval = errno;
free_cache(data);
+#ifdef HAVE_PTHREAD
+ if (data->flags & IO_FLAG_THREADS) {
+ pthread_mutex_destroy(&data->cache_mutex);
+ pthread_mutex_destroy(&data->bounce_mutex);
+ pthread_mutex_destroy(&data->stats_mutex);
+ }
+#endif

ext2fs_free_mem(&channel->private_data);
if (channel->name)
@@ -807,24 +903,27 @@ static errcode_t unix_close(io_channel channel)
static errcode_t unix_set_blksize(io_channel channel, int blksize)
{
struct unix_private_data *data;
- errcode_t retval;
+ errcode_t retval = 0;

EXT2_CHECK_MAGIC(channel, EXT2_ET_MAGIC_IO_CHANNEL);
data = (struct unix_private_data *) channel->private_data;
EXT2_CHECK_MAGIC(data, EXT2_ET_MAGIC_UNIX_IO_CHANNEL);

if (channel->block_size != blksize) {
+ mutex_lock(data, CACHE_MTX);
+ mutex_lock(data, BOUNCE_MTX);
#ifndef NO_IO_CACHE
- if ((retval = flush_cached_blocks(channel, data, 0)))
+ if ((retval = flush_cached_blocks(channel, data, FLUSH_NOLOCK)))
return retval;
#endif

channel->block_size = blksize;
free_cache(data);
- if ((retval = alloc_cache(channel, data)))
- return retval;
+ retval = alloc_cache(channel, data);
+ mutex_unlock(data, BOUNCE_MTX);
+ mutex_unlock(data, CACHE_MTX);
}
- return 0;
+ return retval;
}

static errcode_t unix_read_blk64(io_channel channel, unsigned long long block,
@@ -832,7 +931,7 @@ static errcode_t unix_read_blk64(io_channel channel, unsigned long long block,
{
struct unix_private_data *data;
struct unix_cache *cache, *reuse[READ_DIRECT_SIZE];
- errcode_t retval;
+ errcode_t retval = 0;
char *cp;
int i, j;

@@ -854,6 +953,7 @@ static errcode_t unix_read_blk64(io_channel channel, unsigned long long block,
}

cp = buf;
+ mutex_lock(data, CACHE_MTX);
while (count > 0) {
/* If it's in the cache, use it! */
if ((cache = find_cached_block(data, block, &reuse[0]))) {
@@ -876,10 +976,11 @@ static errcode_t unix_read_blk64(io_channel channel, unsigned long long block,
if ((retval = raw_read_blk(channel, data, block, 1,
cache->buf))) {
cache->in_use = 0;
- return retval;
+ break;
}
memcpy(cp, cache->buf, channel->block_size);
- return 0;
+ retval = 0;
+ break;
}

/*
@@ -893,7 +994,7 @@ static errcode_t unix_read_blk64(io_channel channel, unsigned long long block,
printf("Reading %d blocks starting at %lu\n", i, block);
#endif
if ((retval = raw_read_blk(channel, data, block, i, cp)))
- return retval;
+ break;

/* Save the results in the cache */
for (j=0; j < i; j++) {
@@ -904,7 +1005,8 @@ static errcode_t unix_read_blk64(io_channel channel, unsigned long long block,
cp += channel->block_size;
}
}
- return 0;
+ mutex_unlock(data, CACHE_MTX);
+ return retval;
#endif /* NO_IO_CACHE */
}

@@ -935,7 +1037,8 @@ static errcode_t unix_write_blk64(io_channel channel, unsigned long long block,
* flush out the cache completely and then do a direct write.
*/
if (count < 0 || count > WRITE_DIRECT_SIZE) {
- if ((retval = flush_cached_blocks(channel, data, 1)))
+ if ((retval = flush_cached_blocks(channel, data,
+ FLUSH_INVALIDATE)))
return retval;
return raw_write_blk(channel, data, block, count, buf);
}
@@ -950,6 +1053,7 @@ static errcode_t unix_write_blk64(io_channel channel, unsigned long long block,
retval = raw_write_blk(channel, data, block, count, buf);

cp = buf;
+ mutex_lock(data, CACHE_MTX);
while (count > 0) {
cache = find_cached_block(data, block, &reuse);
if (!cache) {
@@ -963,6 +1067,7 @@ static errcode_t unix_write_blk64(io_channel channel, unsigned long long block,
block++;
cp += channel->block_size;
}
+ mutex_unlock(data, CACHE_MTX);
return retval;
#endif /* NO_IO_CACHE */
}
@@ -1013,7 +1118,7 @@ static errcode_t unix_write_byte(io_channel channel, unsigned long offset,
/*
* Flush out the cache completely
*/
- if ((retval = flush_cached_blocks(channel, data, 1)))
+ if ((retval = flush_cached_blocks(channel, data, FLUSH_INVALIDATE)))
return retval;
#endif

--
2.28.0

2020-12-05 05:02:03

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH RFC 3/5] libext2fs: allow the unix_io manager's cache to be disabled and re-enabled

Signed-off-by: Theodore Ts'o <[email protected]>
---
lib/ext2fs/ext2_io.h | 1 +
lib/ext2fs/unix_io.c | 19 +++++++++++++++++++
2 files changed, 20 insertions(+)

diff --git a/lib/ext2fs/ext2_io.h b/lib/ext2fs/ext2_io.h
index 2e0da5a53..95c25a7b1 100644
--- a/lib/ext2fs/ext2_io.h
+++ b/lib/ext2fs/ext2_io.h
@@ -106,6 +106,7 @@ struct struct_io_manager {
#define IO_FLAG_DIRECT_IO 0x0004
#define IO_FLAG_FORCE_BOUNCE 0x0008
#define IO_FLAG_THREADS 0x0010
+#define IO_FLAG_NOCACHE 0x0020

/*
* Convenience functions....
diff --git a/lib/ext2fs/unix_io.c b/lib/ext2fs/unix_io.c
index 9385487d9..2df53e51c 100644
--- a/lib/ext2fs/unix_io.c
+++ b/lib/ext2fs/unix_io.c
@@ -942,6 +942,8 @@ static errcode_t unix_read_blk64(io_channel channel, unsigned long long block,
#ifdef NO_IO_CACHE
return raw_read_blk(channel, data, block, count, buf);
#else
+ if (data->flags & IO_FLAG_NOCACHE)
+ return raw_read_blk(channel, data, block, count, buf);
/*
* If we're doing an odd-sized read or a very large read,
* flush out the cache and then do a direct read.
@@ -1032,6 +1034,8 @@ static errcode_t unix_write_blk64(io_channel channel, unsigned long long block,
#ifdef NO_IO_CACHE
return raw_write_blk(channel, data, block, count, buf);
#else
+ if (data->flags & IO_FLAG_NOCACHE)
+ return raw_write_blk(channel, data, block, count, buf);
/*
* If we're doing an odd-sized write or a very large write,
* flush out the cache completely and then do a direct write.
@@ -1161,6 +1165,7 @@ static errcode_t unix_set_option(io_channel channel, const char *option,
{
struct unix_private_data *data;
unsigned long long tmp;
+ errcode_t retval;
char *end;

EXT2_CHECK_MAGIC(channel, EXT2_ET_MAGIC_IO_CHANNEL);
@@ -1179,6 +1184,20 @@ static errcode_t unix_set_option(io_channel channel, const char *option,
return EXT2_ET_INVALID_ARGUMENT;
return 0;
}
+ if (!strcmp(option, "cache")) {
+ if (!arg)
+ return EXT2_ET_INVALID_ARGUMENT;
+ if (!strcmp(arg, "on")) {
+ data->flags &= ~IO_FLAG_NOCACHE;
+ return 0;
+ }
+ if (!strcmp(arg, "off")) {
+ retval = flush_cached_blocks(channel, data, 0);
+ data->flags |= IO_FLAG_NOCACHE;
+ return retval;
+ }
+ return EXT2_ET_INVALID_ARGUMENT;
+ }
return EXT2_ET_INVALID_ARGUMENT;
}

--
2.28.0

2020-12-05 05:02:03

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH RFC 4/5] ext2fs: parallel bitmap loading

From: Wang Shilong <[email protected]>

In our benchmarking for PiB size filesystem, pass5 takes
10446s to finish and 99.5% of time takes on reading bitmaps.

It makes sense to reading bitmaps using multiple threads,
a quickly benchmark show 10446s to 626s with 64 threads.

[ This has all of many bug fixes for rw_bitmaps.c from the original
luster patch set collapsed into a single commit. In addition it has
the new ext2fs_rw_bitmaps() api proposed by Ted. ]

Signed-off-by: Wang Shilong <[email protected]>
Signed-off-by: Saranya Muruganandam <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
---
lib/ext2fs/ext2fs.h | 8 +
lib/ext2fs/rw_bitmaps.c | 323 +++++++++++++++++++++++++++++++++-------
2 files changed, 279 insertions(+), 52 deletions(-)

diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
index 5955c3ae9..82ce91264 100644
--- a/lib/ext2fs/ext2fs.h
+++ b/lib/ext2fs/ext2fs.h
@@ -688,6 +688,14 @@ struct ext2_xattr_handle;
#define XATTR_ABORT 1
#define XATTR_CHANGED 2

+/*
+ * flags for ext2fs_rw_bitmaps()
+ */
+#define EXT2FS_BITMAPS_WRITE 0x0001
+#define EXT2FS_BITMAPS_BLOCK 0x0002
+#define EXT2FS_BITMAPS_INODE 0x0004
+#define EXT2FS_BITMAPS_VALID_FLAGS 0x0007
+
/*
* function prototypes
*/
diff --git a/lib/ext2fs/rw_bitmaps.c b/lib/ext2fs/rw_bitmaps.c
index d80c9eb8f..9d5456578 100644
--- a/lib/ext2fs/rw_bitmaps.c
+++ b/lib/ext2fs/rw_bitmaps.c
@@ -23,6 +23,9 @@
#ifdef HAVE_SYS_TYPES_H
#include <sys/types.h>
#endif
+#ifdef HAVE_PTHREAD_H
+#include <pthread.h>
+#endif

#include "ext2_fs.h"
#include "ext2fs.h"
@@ -205,22 +208,12 @@ static int bitmap_tail_verify(unsigned char *bitmap, int first, int last)
return 1;
}

-static errcode_t read_bitmaps(ext2_filsys fs, int do_inode, int do_block)
+static errcode_t read_bitmaps_range_prepare(ext2_filsys fs, int flags)
{
- dgrp_t i;
- char *block_bitmap = 0, *inode_bitmap = 0;
- char *buf;
errcode_t retval;
int block_nbytes = EXT2_CLUSTERS_PER_GROUP(fs->super) / 8;
int inode_nbytes = EXT2_INODES_PER_GROUP(fs->super) / 8;
- int tail_flags = 0;
- int csum_flag;
- unsigned int cnt;
- blk64_t blk;
- blk64_t blk_itr = EXT2FS_B2C(fs, fs->super->s_first_data_block);
- blk64_t blk_cnt;
- ext2_ino_t ino_itr = 1;
- ext2_ino_t ino_cnt;
+ char *buf;

EXT2_CHECK_MAGIC(fs, EXT2_ET_MAGIC_EXT2FS_FILSYS);

@@ -230,12 +223,11 @@ static errcode_t read_bitmaps(ext2_filsys fs, int do_inode, int do_block)

fs->write_bitmaps = ext2fs_write_bitmaps;

- csum_flag = ext2fs_has_group_desc_csum(fs);
-
retval = ext2fs_get_mem(strlen(fs->device_name) + 80, &buf);
if (retval)
return retval;
- if (do_block) {
+
+ if (flags & EXT2FS_BITMAPS_BLOCK) {
if (fs->block_map)
ext2fs_free_block_bitmap(fs->block_map);
strcpy(buf, "block bitmap for ");
@@ -243,12 +235,9 @@ static errcode_t read_bitmaps(ext2_filsys fs, int do_inode, int do_block)
retval = ext2fs_allocate_block_bitmap(fs, buf, &fs->block_map);
if (retval)
goto cleanup;
- retval = io_channel_alloc_buf(fs->io, 0, &block_bitmap);
- if (retval)
- goto cleanup;
- } else
- block_nbytes = 0;
- if (do_inode) {
+ }
+
+ if (flags & EXT2FS_BITMAPS_INODE) {
if (fs->inode_map)
ext2fs_free_inode_bitmap(fs->inode_map);
strcpy(buf, "inode bitmap for ");
@@ -256,13 +245,62 @@ static errcode_t read_bitmaps(ext2_filsys fs, int do_inode, int do_block)
retval = ext2fs_allocate_inode_bitmap(fs, buf, &fs->inode_map);
if (retval)
goto cleanup;
+ }
+ ext2fs_free_mem(&buf);
+
+ return retval;
+
+cleanup:
+ if (flags & EXT2FS_BITMAPS_BLOCK) {
+ ext2fs_free_block_bitmap(fs->block_map);
+ fs->block_map = 0;
+ }
+ if (flags & EXT2FS_BITMAPS_INODE) {
+ ext2fs_free_inode_bitmap(fs->inode_map);
+ fs->inode_map = 0;
+ }
+ if (buf)
+ ext2fs_free_mem(&buf);
+ return retval;
+}
+
+static errcode_t read_bitmaps_range_start(ext2_filsys fs, int flags,
+ dgrp_t start, dgrp_t end,
+ pthread_mutex_t *mutex,
+ int *tail_flags)
+{
+ dgrp_t i;
+ char *block_bitmap = 0, *inode_bitmap = 0;
+ errcode_t retval = 0;
+ int block_nbytes = EXT2_CLUSTERS_PER_GROUP(fs->super) / 8;
+ int inode_nbytes = EXT2_INODES_PER_GROUP(fs->super) / 8;
+ int csum_flag;
+ unsigned int cnt;
+ blk64_t blk;
+ blk64_t blk_itr = EXT2FS_B2C(fs, fs->super->s_first_data_block);
+ blk64_t blk_cnt;
+ ext2_ino_t ino_itr = 1;
+ ext2_ino_t ino_cnt;
+
+ csum_flag = ext2fs_has_group_desc_csum(fs);
+
+ if (flags & EXT2FS_BITMAPS_BLOCK) {
+ retval = io_channel_alloc_buf(fs->io, 0, &block_bitmap);
+ if (retval)
+ goto cleanup;
+ } else {
+ block_nbytes = 0;
+ }
+
+ if (flags & EXT2FS_BITMAPS_INODE) {
retval = io_channel_alloc_buf(fs->io, 0, &inode_bitmap);
if (retval)
goto cleanup;
- } else
+ } else {
inode_nbytes = 0;
- ext2fs_free_mem(&buf);
+ }

+ /* io should be null */
if (fs->flags & EXT2_FLAG_IMAGE_FILE) {
blk = (ext2fs_le32_to_cpu(fs->image_header->offset_inodemap) / fs->blocksize);
ino_cnt = fs->super->s_inodes_count;
@@ -300,10 +338,12 @@ static errcode_t read_bitmaps(ext2_filsys fs, int do_inode, int do_block)
blk_itr += cnt;
blk_cnt -= cnt;
}
- goto success_cleanup;
+ goto cleanup;
}

- for (i = 0; i < fs->group_desc_count; i++) {
+ blk_itr += ((blk64_t)start * (block_nbytes << 3));
+ ino_itr += ((blk64_t)start * (inode_nbytes << 3));
+ for (i = start; i <= end; i++) {
if (block_bitmap) {
blk = ext2fs_block_bitmap_loc(fs, i);
if ((csum_flag &&
@@ -329,12 +369,20 @@ static errcode_t read_bitmaps(ext2_filsys fs, int do_inode, int do_block)
}
if (!bitmap_tail_verify((unsigned char *) block_bitmap,
block_nbytes, fs->blocksize - 1))
- tail_flags |= EXT2_FLAG_BBITMAP_TAIL_PROBLEM;
+ *tail_flags |= EXT2_FLAG_BBITMAP_TAIL_PROBLEM;
} else
memset(block_bitmap, 0, block_nbytes);
cnt = block_nbytes << 3;
+#ifdef HAVE_PTHREAD
+ if (mutex)
+ pthread_mutex_lock(mutex);
+#endif
retval = ext2fs_set_block_bitmap_range2(fs->block_map,
blk_itr, cnt, block_bitmap);
+#ifdef HAVE_PTHREAD
+ if (mutex)
+ pthread_mutex_unlock(mutex);
+#endif
if (retval)
goto cleanup;
blk_itr += block_nbytes << 3;
@@ -365,63 +413,229 @@ static errcode_t read_bitmaps(ext2_filsys fs, int do_inode, int do_block)
}
if (!bitmap_tail_verify((unsigned char *) inode_bitmap,
inode_nbytes, fs->blocksize - 1))
- tail_flags |= EXT2_FLAG_IBITMAP_TAIL_PROBLEM;
+ *tail_flags |= EXT2_FLAG_IBITMAP_TAIL_PROBLEM;
} else
memset(inode_bitmap, 0, inode_nbytes);
cnt = inode_nbytes << 3;
+ if (mutex)
+ pthread_mutex_lock(mutex);
retval = ext2fs_set_inode_bitmap_range2(fs->inode_map,
ino_itr, cnt, inode_bitmap);
+ if (mutex)
+ pthread_mutex_unlock(mutex);
if (retval)
goto cleanup;
ino_itr += inode_nbytes << 3;
}
}

+cleanup:
+ if (inode_bitmap)
+ ext2fs_free_mem(&inode_bitmap);
+ if (block_bitmap)
+ ext2fs_free_mem(&block_bitmap);
+ return retval;
+}
+
+static errcode_t read_bitmaps_range_end(ext2_filsys fs, int flags,
+ int tail_flags)
+{
+ errcode_t retval;
+
/* Mark group blocks for any BLOCK_UNINIT groups */
- if (do_block) {
+ if (flags & EXT2FS_BITMAPS_BLOCK) {
retval = mark_uninit_bg_group_blocks(fs);
if (retval)
- goto cleanup;
- }
-
-success_cleanup:
- if (inode_bitmap) {
- ext2fs_free_mem(&inode_bitmap);
- fs->flags &= ~EXT2_FLAG_IBITMAP_TAIL_PROBLEM;
- }
- if (block_bitmap) {
- ext2fs_free_mem(&block_bitmap);
+ return retval;
fs->flags &= ~EXT2_FLAG_BBITMAP_TAIL_PROBLEM;
}
+ if (flags & EXT2FS_BITMAPS_INODE)
+ fs->flags &= ~EXT2_FLAG_IBITMAP_TAIL_PROBLEM;
fs->flags |= tail_flags;
+
return 0;
+}

-cleanup:
- if (do_block) {
+static void read_bitmaps_cleanup_on_error(ext2_filsys fs, int flags)
+{
+ if (flags & EXT2FS_BITMAPS_BLOCK) {
ext2fs_free_block_bitmap(fs->block_map);
fs->block_map = 0;
}
- if (do_inode) {
+ if (flags & EXT2FS_BITMAPS_INODE) {
ext2fs_free_inode_bitmap(fs->inode_map);
fs->inode_map = 0;
}
- if (inode_bitmap)
- ext2fs_free_mem(&inode_bitmap);
- if (block_bitmap)
- ext2fs_free_mem(&block_bitmap);
- if (buf)
- ext2fs_free_mem(&buf);
+}
+
+static errcode_t read_bitmaps_range(ext2_filsys fs, int flags,
+ dgrp_t start, dgrp_t end)
+{
+ errcode_t retval;
+ int tail_flags = 0;
+
+ retval = read_bitmaps_range_prepare(fs, flags);
+ if (retval)
+ return retval;
+
+ retval = read_bitmaps_range_start(fs, flags, start, end,
+ NULL, &tail_flags);
+ if (retval == 0)
+ retval = read_bitmaps_range_end(fs, flags, tail_flags);
+ if (retval)
+ read_bitmaps_cleanup_on_error(fs, flags);
+ return retval;
+}
+
+#ifdef HAVE_PTHREAD
+struct read_bitmaps_thread_info {
+ ext2_filsys rbt_fs;
+ int rbt_flags;
+ dgrp_t rbt_grp_start;
+ dgrp_t rbt_grp_end;
+ errcode_t rbt_retval;
+ pthread_mutex_t *rbt_mutex;
+ int rbt_tail_flags;
+};
+
+static void *read_bitmaps_thread(void *data)
+{
+ struct read_bitmaps_thread_info *rbt = data;
+
+ rbt->rbt_retval = read_bitmaps_range_start(rbt->rbt_fs, rbt->rbt_flags,
+ rbt->rbt_grp_start, rbt->rbt_grp_end,
+ rbt->rbt_mutex, &rbt->rbt_tail_flags);
+ return NULL;
+}
+#endif
+
+errcode_t ext2fs_rw_bitmaps(ext2_filsys fs, int flags, int num_threads)
+{
+#ifdef HAVE_PTHREAD
+ pthread_attr_t attr;
+ pthread_t *thread_ids = NULL;
+ struct read_bitmaps_thread_info *thread_infos = NULL;
+ pthread_mutex_t rbt_mutex = PTHREAD_MUTEX_INITIALIZER;
+ errcode_t retval;
+ errcode_t rc;
+ unsigned flexbg_size = 1 << fs->super->s_log_groups_per_flex;
+ dgrp_t average_group;
+ int i, tail_flags = 0;
+ io_manager manager = unix_io_manager;
+#endif
+
+ if (flags & ~EXT2FS_BITMAPS_VALID_FLAGS)
+ return EXT2_ET_INVALID_ARGUMENT;
+
+ if (flags & EXT2FS_BITMAPS_WRITE)
+ return write_bitmaps(fs, flags & EXT2FS_BITMAPS_INODE,
+ flags & EXT2FS_BITMAPS_BLOCK);
+
+#ifdef HAVE_PTHREAD
+ if (((fs->io->flags & CHANNEL_FLAGS_THREADS) == 0) ||
+ (num_threads == 1) || (fs->flags & EXT2_FLAG_IMAGE_FILE))
+ goto fallback;
+
+ if (num_threads < 0) {
+#if defined(HAVE_SYSCONF) && defined(_SC_NPROCESSORS_CONF)
+ num_threads = sysconf(_SC_NPROCESSORS_CONF);
+#else
+ /*
+ * Guess for now; eventually we should probably define
+ * ext2fs_get_num_cpus() and teach it how to get this info on
+ * MacOS, FreeBSD, etc.
+ * ref: https://stackoverflow.com/questions/150355
+ */
+ num_threads = 4;
+#endif
+ if (num_threads <= 1)
+ goto fallback;
+ }
+ if (num_threads > fs->group_desc_count)
+ num_threads = fs->group_desc_count;
+ average_group = fs->group_desc_count / num_threads;
+ if (ext2fs_has_feature_flex_bg(fs->super)) {
+ average_group = (average_group / flexbg_size) * flexbg_size;
+ }
+ if (average_group == 0)
+ goto fallback;
+
+ io_channel_set_options(fs->io, "cache=off");
+ retval = pthread_attr_init(&attr);
+ if (retval)
+ return retval;
+
+ thread_ids = calloc(sizeof(pthread_t), num_threads);
+ if (!thread_ids)
+ return -ENOMEM;
+
+ thread_infos = calloc(sizeof(struct read_bitmaps_thread_info),
+ num_threads);
+ if (!thread_infos)
+ goto out;
+
+ retval = read_bitmaps_range_prepare(fs, flags);
+ if (retval)
+ goto out;
+
+// fprintf(stdout, "Multiple threads triggered to read bitmaps\n");
+ for (i = 0; i < num_threads; i++) {
+ thread_infos[i].rbt_fs = fs;
+ thread_infos[i].rbt_flags = flags;
+ thread_infos[i].rbt_mutex = &rbt_mutex;
+ thread_infos[i].rbt_tail_flags = 0;
+ if (i == 0)
+ thread_infos[i].rbt_grp_start = 0;
+ else
+ thread_infos[i].rbt_grp_start = average_group * i + 1;
+
+ if (i == num_threads - 1)
+ thread_infos[i].rbt_grp_end = fs->group_desc_count - 1;
+ else
+ thread_infos[i].rbt_grp_end = average_group * (i + 1);
+ retval = pthread_create(&thread_ids[i], &attr,
+ &read_bitmaps_thread, &thread_infos[i]);
+ if (retval)
+ break;
+ }
+ for (i = 0; i < num_threads; i++) {
+ if (!thread_ids[i])
+ break;
+ rc = pthread_join(thread_ids[i], NULL);
+ if (rc && !retval)
+ retval = rc;
+ rc = thread_infos[i].rbt_retval;
+ if (rc && !retval)
+ retval = rc;
+ tail_flags |= thread_infos[i].rbt_tail_flags;
+ }
+out:
+ rc = pthread_attr_destroy(&attr);
+ if (rc && !retval)
+ retval = rc;
+ free(thread_infos);
+ free(thread_ids);
+
+ if (retval == 0)
+ retval = read_bitmaps_range_end(fs, flags, tail_flags);
+ if (retval)
+ read_bitmaps_cleanup_on_error(fs, flags);
+ /* XXX should save and restore cache setting */
+ io_channel_set_options(fs->io, "cache=on");
return retval;
+fallback:
+#endif
+ return read_bitmaps_range(fs, flags, 0, fs->group_desc_count - 1);
}

errcode_t ext2fs_read_inode_bitmap(ext2_filsys fs)
{
- return read_bitmaps(fs, 1, 0);
+ return ext2fs_rw_bitmaps(fs, EXT2FS_BITMAPS_INODE, -1);
}

errcode_t ext2fs_read_block_bitmap(ext2_filsys fs)
{
- return read_bitmaps(fs, 0, 1);
+ return ext2fs_rw_bitmaps(fs, EXT2FS_BITMAPS_BLOCK, -1);
}

errcode_t ext2fs_write_inode_bitmap(ext2_filsys fs)
@@ -436,10 +650,15 @@ errcode_t ext2fs_write_block_bitmap (ext2_filsys fs)

errcode_t ext2fs_read_bitmaps(ext2_filsys fs)
{
- if (fs->inode_map && fs->block_map)
- return 0;
+ int flags = 0;

- return read_bitmaps(fs, !fs->inode_map, !fs->block_map);
+ if (!fs->inode_map)
+ flags |= EXT2FS_BITMAPS_INODE;
+ if (!fs->block_map)
+ flags |= EXT2FS_BITMAPS_BLOCK;
+ if (flags == 0)
+ return 0;
+ return ext2fs_rw_bitmaps(fs, flags, -1);
}

errcode_t ext2fs_write_bitmaps(ext2_filsys fs)
--
2.28.0

2020-12-07 18:18:24

by Andreas Dilger

[permalink] [raw]
Subject: Re: [PATCH RFC 2/5] libext2fs: add threading support to the I/O manager abstraction

On Dec 4, 2020, at 9:58 PM, Theodore Ts'o <[email protected]> wrote:
>
> Add initial implementation support for the unix_io manager.
> Applications which want to use threading should pass in
> IO_FLAG_THREADS when opening the channel. Channels which support
> threading (which as of this commit is unix_io and test_io if the
> backing io_manager supports threading) will set the
> CHANNEL_FLAGS_THREADS bit in io->flags. Library code or applications
> can test if threading is enabled by checking this flag.
>
> Applications using libext2fs can pass in EXT2_FLAG_THREADS to
> ext2fs_open() or ext2fs_open2() to request threading support.
>
> Signed-off-by: Theodore Ts'o <[email protected]>
> ---
>
> diff --git a/lib/ext2fs/unix_io.c b/lib/ext2fs/unix_io.c
> index 628e60c39..9385487d9 100644
> --- a/lib/ext2fs/unix_io.c
> +++ b/lib/ext2fs/unix_io.c
> @@ -232,8 +287,10 @@ static errcode_t raw_read_blk(io_channel channel,
> bounce_read:
> while (size > 0) {
> + mutex_lock(data, BOUNCE_MTX);
> actual = read(data->dev, data->bounce, channel->block_size);
> if (actual != channel->block_size) {
> + mutex_unlock(data, BOUNCE_MTX);
> actual = really_read;
> buf -= really_read;
> size += really_read;
> @@ -246,6 +303,7 @@ bounce_read:
> really_read += actual;
> size -= actual;
> buf += actual;
> + mutex_unlock(data, BOUNCE_MTX);
> }
> return 0;

Do you know how often we get into the "bounce_read" IO path? It seems like
locking around the read would kill parallelism, but this code path also
looks like a fallback, but maybe 100% used for blocksize != PAGE_SIZE?

> @@ -341,11 +401,13 @@ static errcode_t raw_write_blk(io_channel channel,
> */
> bounce_write:
> while (size > 0) {
> + mutex_lock(data, BOUNCE_MTX);
> if (size < channel->block_size) {
> actual = read(data->dev, data->bounce,
> channel->block_size);
> if (actual != channel->block_size) {
> if (actual < 0) {
> + mutex_unlock(data, BOUNCE_MTX);
> retval = errno;
> goto error_out;
> }
> @@ -362,6 +424,7 @@ bounce_write:
> goto error_out;
> }
> actual = write(data->dev, data->bounce, channel->block_size);
> + mutex_unlock(data, BOUNCE_MTX);
> if (actual < 0) {
> retval = errno;
> goto error_out;

Ditto.

> @@ -703,6 +773,25 @@ static errcode_t unix_open_channel(const char *name, int fd,
> setrlimit(RLIMIT_FSIZE, &rlim);
> }
> }
> +#endif
> +#ifdef HAVE_PTHREAD
> + if (flags & IO_FLAG_THREADS) {
> + io->flags |= CHANNEL_FLAGS_THREADS;
> + retval = pthread_mutex_init(&data->cache_mutex, NULL);
> + if (retval)
> + goto cleanup;
> + retval = pthread_mutex_init(&data->bounce_mutex, NULL);
> + if (retval) {
> + pthread_mutex_destroy(&data->cache_mutex);
> + goto cleanup;
> + }
> + retval = pthread_mutex_init(&data->stats_mutex, NULL);
> + if (retval) {
> + pthread_mutex_destroy(&data->cache_mutex);
> + pthread_mutex_destroy(&data->bounce_mutex);
> + goto cleanup;
> + }
> + }
> #endif

At one point you talked about using dlopen() or similar to link in the
pthread library only if it is actually needed? Or is the linkage of
the pthread library avoided by these functions not being called unless
IO_FLAG_THREADS is set?

Cheers, Andreas






Attachments:
signature.asc (890.00 B)
Message signed with OpenPGP

2020-12-08 01:27:58

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH RFC 2/5] libext2fs: add threading support to the I/O manager abstraction

On Mon, Dec 07, 2020 at 11:15:30AM -0700, Andreas Dilger wrote:
>
> Do you know how often we get into the "bounce_read" IO path? It seems like
> locking around the read would kill parallelism, but this code path also
> looks like a fallback, but maybe 100% used for blocksize != PAGE_SIZE?

It should be extremely rare. It only happens when Direct I/O is
requested (which is rare to begin with, although it looks like there
are people who are playing with some out of tree patchets to force
e2image to use Direct I/O?), and the offset or the size of I/O isn't
aligned with the system's direct I/O alignment requirements. (See
ext2fs_get_dio_alignment() in lib/ext2fs/getsectsize.c)

> At one point you talked about using dlopen() or similar to link in the
> pthread library only if it is actually needed? Or is the linkage of
> the pthread library avoided by these functions not being called unless
> IO_FLAG_THREADS is set?

So what I'm doing is just not trying to call those functions unless
threading is required (e.g., IO_FLAG_THREADS is set, which would imply
that EXT2_FLAG_THREADS was passed to ext2fs_open()). This won't help
if the application is using static linking, but if the application is
compiled statically, dlopen(3) is not guaranteed to work in any case.

So it's not perfect, and there may be some ancient AIX machine for
which this might be problematic. But for all modern OS's that are
linking to want to compile e2fsprogs, it should work. And I don't
have access to an AIX machine these days, and I don't work for IBM
anymore, so.... ¯\_(ツ)_/¯

:-)

- Ted

2020-12-11 10:30:44

by Andreas Dilger

[permalink] [raw]
Subject: Re: [PATCH RFC 4/5] ext2fs: parallel bitmap loading

On Dec 4, 2020, at 9:58 PM, Theodore Ts'o <[email protected]> wrote:
>
> From: Wang Shilong <[email protected]>
>
> In our benchmarking for PiB size filesystem, pass5 takes
> 10446s to finish and 99.5% of time takes on reading bitmaps.
>
> It makes sense to reading bitmaps using multiple threads,
> a quickly benchmark show 10446s to 626s with 64 threads.
>
> [ This has all of many bug fixes for rw_bitmaps.c from the original
> luster patch set collapsed into a single commit. In addition it has
> the new ext2fs_rw_bitmaps() api proposed by Ted. ]
>
> Signed-off-by: Wang Shilong <[email protected]>
> Signed-off-by: Saranya Muruganandam <[email protected]>
> Signed-off-by: Theodore Ts'o <[email protected]>

The patch looks generally good. Unfortunately, I don't have a large
system available to verify the performance at this time, but it looks
close enough to the original code that I don't think there is much
risk of breakage.

One potential cross-platform build issue below, and some cosmetic
suggestions, but you could add:

Reviewed-by: Andreas Dilger <[email protected]>

> @@ -329,12 +369,20 @@ static errcode_t read_bitmaps(ext2_filsys fs, int do_inode, int do_block)
> }
> if (!bitmap_tail_verify((unsigned char *) block_bitmap,
> block_nbytes, fs->blocksize - 1))
> - tail_flags |= EXT2_FLAG_BBITMAP_TAIL_PROBLEM;
> + *tail_flags |= EXT2_FLAG_BBITMAP_TAIL_PROBLEM;
> } else
> memset(block_bitmap, 0, block_nbytes);
> cnt = block_nbytes << 3;
> +#ifdef HAVE_PTHREAD
> + if (mutex)
> + pthread_mutex_lock(mutex);
> +#endif
> retval = ext2fs_set_block_bitmap_range2(fs->block_map,
> blk_itr, cnt, block_bitmap);
> +#ifdef HAVE_PTHREAD
> + if (mutex)
> + pthread_mutex_unlock(mutex);
> +#endif

(style) It wouldn't be terrible to have wrappers around these functions
instead of inline #ifdef in the few places they are used, like:

#ifdef HAVE_PTHREAD
static void unix_pthread_mutex_lock(pthread_mutex_t *mutex)
{
if (mutex)
pthread_mutex_lock(mutex);
}
static void unix_pthread_mutex_unlock(pthread_mutex_t *mutex)
{
if (mutex)
pthread_mutex_unlock(mutex);
}
#else
#define unix_pthread_mutex_lock(mutex) do {} while (0)
#define unix_pthread_mutex_unlock(mutex) do {} while (0)
#endif


> @@ -365,63 +413,229 @@ static errcode_t read_bitmaps(ext2_filsys fs, int do_inode, memset(inode_bitmap, 0, inode_nbytes);
> cnt = inode_nbytes << 3;
> + if (mutex)
> + pthread_mutex_lock(mutex);
> retval = ext2fs_set_inode_bitmap_range2(fs->inode_map,
> ino_itr, cnt, inode_bitmap);
> + if (mutex)
> + pthread_mutex_unlock(mutex);

(minor) These two pthread calls need #ifdef HAVE_PTHREAD or use wrappers.

> +errcode_t ext2fs_rw_bitmaps(ext2_filsys fs, int flags, int num_threads)
> +{
> +#ifdef HAVE_PTHREAD
> + if (((fs->io->flags & CHANNEL_FLAGS_THREADS) == 0) ||
> + (num_threads == 1) || (fs->flags & EXT2_FLAG_IMAGE_FILE))
> + goto fallback;
> +
> + if (num_threads < 0) {
> +#if defined(HAVE_SYSCONF) && defined(_SC_NPROCESSORS_CONF)
> + num_threads = sysconf(_SC_NPROCESSORS_CONF);
> +#else
> + /*
> + * Guess for now; eventually we should probably define
> + * ext2fs_get_num_cpus() and teach it how to get this info on
> + * MacOS, FreeBSD, etc.
> + * ref: https://stackoverflow.com/questions/150355
> + */
> + num_threads = 4;
> +#endif

(style) this #endif wouldn't hurt to have a /* HAVE_SYSCONF */ comment,
but currently isn't too far from the #ifdef though it looks like it may
become further away and harder to track in the future.

> + if (num_threads <= 1)
> + goto fallback;
> + }

[snip]

> + /* XXX should save and restore cache setting */
> + io_channel_set_options(fs->io, "cache=on");
> return retval;
> +fallback:
> +#endif

(style) this would definitely benefit from a /* HAVE_PTHREAD */ comment,
since it is so far from the original #ifdef.

Cheers, Andreas






Attachments:
signature.asc (890.00 B)
Message signed with OpenPGP

2020-12-13 05:20:45

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH RFC 4/5] ext2fs: parallel bitmap loading

On Thu, Dec 10, 2020 at 05:12:09PM -0700, Andreas Dilger wrote:
> > @@ -329,12 +369,20 @@ static errcode_t read_bitmaps(ext2_filsys fs, int do_inode, int do_block)
> > }
> > if (!bitmap_tail_verify((unsigned char *) block_bitmap,
> > block_nbytes, fs->blocksize - 1))
> > - tail_flags |= EXT2_FLAG_BBITMAP_TAIL_PROBLEM;
> > + *tail_flags |= EXT2_FLAG_BBITMAP_TAIL_PROBLEM;
> > } else
> > memset(block_bitmap, 0, block_nbytes);
> > cnt = block_nbytes << 3;
> > +#ifdef HAVE_PTHREAD
> > + if (mutex)
> > + pthread_mutex_lock(mutex);
> > +#endif
> > retval = ext2fs_set_block_bitmap_range2(fs->block_map,
> > blk_itr, cnt, block_bitmap);
> > +#ifdef HAVE_PTHREAD
> > + if (mutex)
> > + pthread_mutex_unlock(mutex);
> > +#endif
>
> (style) It wouldn't be terrible to have wrappers around these functions
> instead of inline #ifdef in the few places they are used, like:
>
> #ifdef HAVE_PTHREAD
> static void unix_pthread_mutex_lock(pthread_mutex_t *mutex)
> {
> if (mutex)
> pthread_mutex_lock(mutex);
> }
> static void unix_pthread_mutex_unlock(pthread_mutex_t *mutex)
> {
> if (mutex)
> pthread_mutex_unlock(mutex);
> }
> #else
> #define unix_pthread_mutex_lock(mutex) do {} while (0)
> #define unix_pthread_mutex_unlock(mutex) do {} while (0)
> #endif

We'd also need to have a typedef for mutex_t which is either
pthreads_mutex_t if pthreads are available, or an int (or some other
placeholder type) if it isn't.

I had tried to make sure that rw_bitmaps.c will correctly compile with
HAVE_PTHREAD and HAVE_PTHREAD_H are undefined. It looks like I didn't
quite get it completely working since there's at leasts one function
signature where we have an unprotected use of pthread_mutex_t, so
that's something we should check before finalizing the patch --- in
addition to the unprotected use of pthread_mutex_{lock,unlock} that
you pointed out.

- Ted