From: Josef Bacik Subject: Re: transaction batching performance & multi-threaded synchronous writers Date: Tue, 15 Jul 2008 16:10:10 -0400 Message-ID: <20080715201010.GD30311@unused.rdu.redhat.com> References: <487B7B9B.3020001@gmail.com> <20080715183904.GC30311@unused.rdu.redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: Ric Wheeler , linux-ext4@vger.kernel.org, adilger@sun.com To: Josef Bacik Return-path: Received: from mx1.redhat.com ([66.187.233.31]:49644 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750967AbYGOU3H (ORCPT ); Tue, 15 Jul 2008 16:29:07 -0400 Content-Disposition: inline In-Reply-To: <20080715183904.GC30311@unused.rdu.redhat.com> Sender: linux-ext4-owner@vger.kernel.org List-ID: On Tue, Jul 15, 2008 at 02:39:04PM -0400, Josef Bacik wrote: > On Mon, Jul 14, 2008 at 12:15:23PM -0400, Ric Wheeler wrote: > > > > Here is a pointer to the older patch & some results: > > > > http://www.spinics.net/lists/linux-fsdevel/msg13121.html > > > > I will retry this on some updated kernels, but would not expect to see a > > difference since the code has not been changed ;-) > > > > Ok here are the numbers with the original idea I had proposed. > > type threads base patch speedup > sata 1 17.9 17.3 0.97 > sata 2 33.2 34.2 1.03 > sata 4 58.4 63.6 1.09 > sata 8 78.8 80.8 1.03 > sata 16 94.4 97.6 1.16 > > ram 1 2394.4 1878.0 0.78 > ram 2 989.6 2041.1 2.06 > ram 4 1466.1 3201.8 2.18 > ram 8 1858.1 3362.8 1.81 > ram 16 3008.0 3227.7 1.07 > > I've got to find a fast disk array to test this with, but the ramdisk results > make me happy, though they were kind of irratic, so I think the fast disk array > will be a more stable measure of how well this patch does, but it definitely > doesn't hurt the slow case, and brings stability to the fast case. Thanks much, > Hmm talking with ric I should just leave the single thread stuff alone. That removes the slight speed regression seen above. Thanks, Josef Index: linux-2.6/fs/jbd/commit.c =================================================================== --- linux-2.6.orig/fs/jbd/commit.c +++ linux-2.6/fs/jbd/commit.c @@ -273,6 +273,15 @@ write_out_data: journal_do_submit_data(wbuf, bufs); } +static inline unsigned long elapsed_jiffies(unsigned long start, + unsigned long end) +{ + if (end >= start) + return end - start; + + return end + (MAX_JIFFY_OFFSET - start) + 1; +} + /* * journal_commit_transaction * @@ -288,6 +297,7 @@ void journal_commit_transaction(journal_ int flags; int err; unsigned long blocknr; + unsigned long long commit_time, start_time; char *tagp = NULL; journal_header_t *header; journal_block_tag_t *tag = NULL; @@ -400,6 +410,7 @@ void journal_commit_transaction(journal_ commit_transaction->t_state = T_FLUSH; journal->j_committing_transaction = commit_transaction; journal->j_running_transaction = NULL; + start_time = jiffies; commit_transaction->t_log_start = journal->j_head; wake_up(&journal->j_wait_transaction_locked); spin_unlock(&journal->j_state_lock); @@ -873,6 +884,12 @@ restart_loop: J_ASSERT(commit_transaction == journal->j_committing_transaction); journal->j_commit_sequence = commit_transaction->t_tid; journal->j_committing_transaction = NULL; + commit_time = elapsed_jiffies(start_time, jiffies); + if (unlikely(!journal->j_average_commit_time)) + journal->j_average_commit_time = commit_time; + else + journal->j_average_commit_time = (commit_time + + journal->j_average_commit_time) / 2; spin_unlock(&journal->j_state_lock); if (commit_transaction->t_checkpoint_list == NULL && Index: linux-2.6/fs/jbd/transaction.c =================================================================== --- linux-2.6.orig/fs/jbd/transaction.c +++ linux-2.6/fs/jbd/transaction.c @@ -49,6 +49,7 @@ get_transaction(journal_t *journal, tran { transaction->t_journal = journal; transaction->t_state = T_RUNNING; + transaction->t_start_time = jiffies; transaction->t_tid = journal->j_transaction_sequence++; transaction->t_expires = jiffies + journal->j_commit_interval; spin_lock_init(&transaction->t_handle_lock); @@ -1361,7 +1362,7 @@ int journal_stop(handle_t *handle) { transaction_t *transaction = handle->h_transaction; journal_t *journal = transaction->t_journal; - int old_handle_count, err; + int err; pid_t pid; J_ASSERT(journal_current_handle() == handle); @@ -1397,11 +1398,17 @@ int journal_stop(handle_t *handle) */ pid = current->pid; if (handle->h_sync && journal->j_last_sync_writer != pid) { + unsigned long commit_time; + journal->j_last_sync_writer = pid; - do { - old_handle_count = transaction->t_handle_count; + + spin_lock(&journal->j_state_lock); + commit_time = journal->j_average_commit_time; + spin_unlock(&journal->j_state_lock); + + while (time_before(jiffies, commit_time + + transaction->t_start_time)) schedule_timeout_uninterruptible(1); - } while (old_handle_count != transaction->t_handle_count); } current->journal_info = NULL; Index: linux-2.6/include/linux/jbd.h =================================================================== --- linux-2.6.orig/include/linux/jbd.h +++ linux-2.6/include/linux/jbd.h @@ -543,6 +543,11 @@ struct transaction_s unsigned long t_expires; /* + * When this transaction started, in jiffies [no locking] + */ + unsigned long t_start_time; + + /* * How many handles used this transaction? [t_handle_lock] */ int t_handle_count; @@ -800,6 +805,8 @@ struct journal_s pid_t j_last_sync_writer; + unsigned long long j_average_commit_time; + /* * An opaque pointer to fs-private information. ext3 puts its * superblock pointer here