Date: Tue, 11 Oct 2016 05:57:53 -0400
From: Steven Rostedt
To: Joel Fernandes
Cc: linux-kernel@vger.kernel.org
Subject: Re: [RFC 0/7] pstore: Improve performance of ftrace backend with ramoops
Message-ID: <20161011055753.2690178f@grimm.local.home>
In-Reply-To: <1475904515-24970-1-git-send-email-joelaf@google.com>
References: <1475904515-24970-1-git-send-email-joelaf@google.com>

On Fri, 7 Oct 2016 22:28:27 -0700 Joel Fernandes wrote:

> Here's an early RFC for a patch series on improving ftrace throughput with
> ramoops. I am hoping to get some early comments so I'm releasing it in advance.
> It is functional and tested.
>
> Currently ramoops uses a single zone to store function traces. To make this
> work, it has to use locking to synchronize accesses to the buffers. Recently
> the synchronization was completely moved from a cmpxchg mechanism to raw
> spinlocks due to difficulties in using cmpxchg on uncached memory and also on
> RAMs behind PCIe. [1] This change further dropped the performance of the ramoops
> pstore backend by more than half in my tests.
>
> This patch series improves the situation dramatically, by around 280% over what
> it is now, by creating a ramoops persistent zone for each CPU and avoiding use of
> locking altogether for ftrace. At init time, the persistent zones are then
> merged together.
>
> Here are some tests to show the improvements. Tested using a qemu quad core
> x86_64 instance with -mem-path to persist the guest RAM to a file. I measured
> average throughput of dd over 30 seconds:
>
> dd if=/dev/zero | pv | dd of=/dev/null
>
> Without this patch series: 24 MB/s
> With per-cpu buffers and counter increment: 91.5 MB/s (improvement by ~281%)
> With per-cpu buffers and trace_clock: 51.9 MB/s
>
> Some more considerations:
> 1. In order to merge the individual buffers, I am using racy counters
> since I didn't want to sacrifice throughput for perfect time stamps.
> trace_clock() did the job for timestamps, but gave almost half the
> throughput of counter-based timestamps.
>
> 2. Since the patches divide the available ftrace persistent space by the number
> of CPUs, less space will now be available per CPU. However, the user is free to
> disable per-CPU behavior and revert to the old behavior by specifying the
> PSTORE_PER_CPU flag.
> It's a space vs performance trade-off, so if the user has
> enough space and not a lot of CPUs, then using per-CPU persistent buffers makes
> sense for better performance.
>
> 3. Without using any counters or timestamps, the improvement is even greater
> (~140 MB/s), but the buffers cannot be merged.
>
> [1] https://lkml.org/lkml/2016/9/8/375

From a tracing point of view, I have no qualms with this patch set.

-- Steve

>
> Joel Fernandes (7):
>   pstore: Make spinlock per zone instead of global
>   pstore: locking: dont lock unless caller asks to
>   pstore: Remove case of PSTORE_TYPE_PMSG write using deprecated
>     function
>   pstore: Make ramoops_init_przs generic for other prz arrays
>   ramoops: Split ftrace buffer space into per-CPU zones
>   pstore: Add support to store timestamp counter in ftrace records
>   pstore: Merge per-CPU ftrace zones into one zone for output
>
>  fs/pstore/ftrace.c         |   3 +
>  fs/pstore/inode.c          |   7 +-
>  fs/pstore/internal.h       |  34 -------
>  fs/pstore/ram.c            | 234 +++++++++++++++++++++++++++++++++++----------
>  fs/pstore/ram_core.c       |  30 +++---
>  include/linux/pstore.h     |  69 +++++++++++++
>  include/linux/pstore_ram.h |   6 +-
>  7 files changed, 280 insertions(+), 103 deletions(-)
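
For readers following the merge step described in the cover letter (per-CPU
zones combined at init time, ordered by a counter-based timestamp), here is a
minimal userspace sketch in C. It is not code from the patch series: struct
trace_rec, struct cpu_zone, merge_zones() and MAX_ZONES are hypothetical names
chosen for illustration, and it assumes each per-CPU zone is already ordered by
its counter value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ZONES 64	/* illustrative upper bound on per-CPU zones */

/* Hypothetical record layout, not the record format used by pstore. */
struct trace_rec {
	uint64_t ts;	/* counter-based timestamp, may be racy across CPUs */
	uint32_t cpu;	/* CPU whose zone the record came from */
	uint64_t ip;	/* traced function address */
};

/* One per-CPU zone; records are assumed to be in increasing ts order. */
struct cpu_zone {
	const struct trace_rec *recs;
	size_t count;
};

/*
 * Merge nr_zones sorted per-CPU zones into out[], ordered by ts.
 * Ties go to the lower zone index so the output is deterministic.
 * Returns the number of records written (at most out_cap).
 */
size_t merge_zones(const struct cpu_zone *zones, size_t nr_zones,
		   struct trace_rec *out, size_t out_cap)
{
	size_t pos[MAX_ZONES] = { 0 };	/* read cursor for each zone */
	size_t n = 0;

	assert(nr_zones <= MAX_ZONES);

	while (n < out_cap) {
		size_t best = nr_zones;	/* sentinel: no zone picked yet */

		/* Pick the zone whose next record has the smallest ts. */
		for (size_t z = 0; z < nr_zones; z++) {
			if (pos[z] >= zones[z].count)
				continue;	/* zone exhausted */
			if (best == nr_zones ||
			    zones[z].recs[pos[z]].ts <
			    zones[best].recs[pos[best]].ts)
				best = z;
		}
		if (best == nr_zones)
			break;	/* every zone is exhausted */

		out[n++] = zones[best].recs[pos[best]++];
	}
	return n;
}
```

Because the counters are racy, records from different CPUs with equal or
near-equal counter values may come out in a slightly different order than they
actually ran. The sketch breaks ties by zone index only to keep the output
deterministic.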