Subject: Re: [trace-cmd Patch RFC] trace-cmd: top: A new interface to detect peak memory
From: Pratyush Anand
To: Steven Rostedt
Cc: linux-kernel@vger.kernel.org, dyoung@redhat.com, xlpang@redhat.com
Date: Thu, 27 Apr 2017 19:32:43 +0530
Message-ID: <727920fa-219c-6d8d-3ba6-2f0553b2cbdc@redhat.com>
In-Reply-To: <20170426100135.758cb098@gandalf.local.home>
References: <2e2d52edc0488adfefa6d19e27b8b9beae95a7b6.1488380367.git.panand@redhat.com> <20170426100135.758cb098@gandalf.local.home>

Hi Steve,

On Wednesday 26 April 2017 07:31 PM, Steven Rostedt wrote:
>
> Sorry for the late reply. I finally have time to start looking at
> trace-cmd again.

Thanks a lot for your review :-)

>
> On Wed, 1 Mar 2017 20:32:37 +0530
> Pratyush Anand wrote:
>
[...]
> This is a nice idea, but it needs some clean ups. I tried it out just
> to play with it.
>
> First, I just did:
>
>   trace-cmd top
>
> And nothing happened. Either a help message needs to be printed, or it
> starts something. Maybe have it be interactive, that is, print out data
> as it comes in?
>
[...]
>
> But then for some reason -p stopped doing anything. And -e didn't
> stop it either. But after killing it manually and removing the pid
> file, it appeared to work normally again. I can't seem to repeat it, so
> let's not worry about that now.
>

Yes, this version was sharing info between two processes using files. I
have a new version [1] which shares it through a socket, and that works
better. The help issue is also fixed in that version.

> More below.
>
[...]
>> +void top_disable_events(void)
>> +{
>> +	char *path;
>> +	char c;
>> +	int fd;
>> +	int ret;
>> +
>> +	/*
>> +	 * WARNING: We want only few events related to memory allocation to
>> +	 * be enabled. Disable all events here. Latter, selective events
>> +	 * would be enabled
>
> trace-cmd reset does this. You may want to call that instead of doing
> it manually here. Someone else wanted to pull out trace-cmd reset into
> its own file. I'll have to get those patches. But then we can call that
> and enable tracing again.
>
> Or better yet, if we are only tracing memory events, just create your
> own instance and use that. Why bother with normal tracing?
>
>   mkdir /sys/kernel/debug/tracing/instances/trace-cmd-memory

Yes, it seems a good idea. Will do.
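Roughly, what I have in mind for the instance setup is something like the
below (only a sketch, not the final code; the path is the one you
suggested, and the two kmem events are just placeholders for whatever
selection the real patch ends up enabling):

#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <errno.h>

#define TOP_INSTANCE "/sys/kernel/debug/tracing/instances/trace-cmd-memory"

static int top_create_instance(void)
{
	/* placeholder event list, not necessarily what trace-cmd top will use */
	const char *events = "kmem:mm_page_alloc kmem:mm_page_free";
	int fd;

	/*
	 * Creating the directory creates a fresh instance, which starts
	 * with all events disabled, so the top-level tracing files are
	 * left alone.
	 */
	if (mkdir(TOP_INSTANCE, 0755) < 0 && errno != EEXIST)
		return -1;

	/* Enable only the memory events, inside this instance only. */
	fd = open(TOP_INSTANCE "/set_event", O_WRONLY);
	if (fd < 0)
		return -1;
	if (write(fd, events, strlen(events)) < 0) {
		close(fd);
		return -1;
	}
	close(fd);

	/* And turn tracing on for this instance. */
	fd = open(TOP_INSTANCE "/tracing_on", O_WRONLY);
	if (fd < 0)
		return -1;
	if (write(fd, "1", 1) < 1) {
		close(fd);
		return -1;
	}
	close(fd);
	return 0;
}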
>
>> +	 */
>> +	c = '0';
>> +	path = get_instance_file(&top_instance, "events/enable");
>> +	fd = open(path, O_WRONLY);
>> +	if (fd < 0)
>> +		die("failed to open '%s'", path);
>> +	ret = write(fd, &c, 1);
>> +	close(fd);
>> +	tracecmd_put_tracing_file(path);
>> +	if (ret < 1)
>> +		die("failed to write 0 to events/enable");
>> +
>> +	path = get_instance_file(&top_instance, "tracing_on");
>> +	fd = open(path, O_WRONLY);
>> +	if (fd < 0)
>> +		die("failed to open '%s'", path);
>> +	ret = write(fd, &c, 1);
>> +	close(fd);
>> +	tracecmd_put_tracing_file(path);
>> +	if (ret < 1)
>> +		die("failed to write 0 to tracing_on\n");
>> +
>> +	path = get_instance_file(&top_instance, "set_event");
>> +	fd = open(path, O_WRONLY);
>> +	if (fd < 0)
>> +		die("failed to open '%s'", path);
>> +	close(fd);
>> +	tracecmd_put_tracing_file(path);
>> +}
>> +
>> +void top_exit(void)
>> +{
>> +	int i;
>> +	/*
>> +	 * free all the dynamic memories which could have been allocated by
>> +	 * this program
>> +	 */
>> +	if (kbuf)
>> +		kbuffer_free(kbuf);
>> +	if (record)
>> +		free_record(record);
>> +	if (pevent)
>> +		pevent_free(pevent);
>
> The above three functions all can handle a NULL pointer. Remove the if
> statements.

ok

>
>> +	if (top_task_stats) {
>> +		for (i = 0; i < top_task_cnt; i++)
>> +			free(top_task_stats[i].comm);
>> +		free(top_task_stats);
>> +	}
>> +
>> +	top_disable_events();
>> +	remove(TRACE_CMD_TOP_PID_FILE);
>> +}
>> +
>> +void top_initialize_events(void)
>> +{
>> +	char *path;
>> +	int fd;
>> +	char c;
>> +	int ret;
>> +	char *buf;
>> +	int size;
>> +	enum kbuffer_long_size long_size;
>> +	enum kbuffer_endian endian;
>> +	char id_str[8];
>> +
>
> Might want to look at trace_stream_init() and use that instead. It is
> made for live streaming of events.

OK, will look into trace_stream_init().

>
>> +	/* trace data is read from the trace_raw_pipe directly */
>> +	if (tracecmd_host_bigendian())
>> +		endian = KBUFFER_ENDIAN_BIG;
>> +	else
>> +		endian = KBUFFER_ENDIAN_LITTLE;
>> +
[...]
>> +void top_start(void)
>> +{
>> +	pid_t pid;
>> +	int fd, ret;
>> +
>> +	top_disable_events();
>> +	top_initialize_events();
>> +	page_size = getpagesize();
>> +
>> +	pid = fork();
>> +	if (!pid) {
>> +		do {
>> +			top_process_events();
>> +			sleep(1);
>
> We should probably just sleep on the buffer, waiting for data.

OK (a rough poll() sketch of what I understood is at the end of this mail).

>
>> +			if (set_print_only) {
>> +				top_print_stats();
>> +				set_print_only = false;
>> +			}
>> +			if (set_print_and_exit) {
>> +				top_print_stats();
>> +				top_exit();
>> +				break;
>> +			}
>> +		} while (1);
>> +	} else {
>> +		fd = open(TRACE_CMD_TOP_PID_FILE, O_WRONLY | O_CREAT);
>> +		if (fd < 0)
>> +			goto kill_child;
>> +		ret = write(fd, &pid, sizeof(pid));
>> +		if (ret < sizeof(pid))
>> +			goto close_fd;
>> +		close(fd);
>> +	}
>> +
>> +	return;
>> +
>> +close_fd:
>> +	close(fd);
>> +kill_child:
>> +	kill(pid, SIGUSR2);
>> +	die("Failed to start trace-top");
>> +	return;
>> +}
>> +
[...]
> Looks good, I think we can extend on this.

Thanks :-) I will implement your review comments and will send the next
revision.

However, I had a couple of observations which I was unable to justify:

# ./trace-cmd top -s /tmp/test
# ./trace-cmd top -p /tmp/test | grep trace-cmd
  15292 trace-cmd            22144          15808

Here, 15292 is the pid of the trace-cmd task, 22144 KB is the peak
memory usage, and 15808 KB is the current memory usage.

Now check the rss component from statm:

# cat /proc/15292/statm
50 35 23 7 0 12 0 36

This is a result on ARM64 with a 64KB page size.
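Just to spell out how I am converting those statm numbers (a quick sketch
only; the helper name is made up, and the 8th field exists only with my
test patch [2]):

#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>

/*
 * Convert the statm fields from pages to KB. Field 2 is rss; the 8th
 * field (peak rss) is present only with the test patch [2].
 */
static void print_statm_kb(pid_t pid)
{
	unsigned long size, rss, shared, text, lib, data, dt, peak = 0;
	long page_kb = sysconf(_SC_PAGESIZE) / 1024;	/* 64 on this box */
	char path[64];
	FILE *f;

	snprintf(path, sizeof(path), "/proc/%d/statm", (int)pid);
	f = fopen(path, "r");
	if (!f)
		return;
	fscanf(f, "%lu %lu %lu %lu %lu %lu %lu %lu",
	       &size, &rss, &shared, &text, &lib, &data, &dt, &peak);
	fclose(f);

	/* e.g. rss: 35 pages * 64 KB = 2240 KB, peak: 36 pages = 2304 KB */
	printf("rss: %lu KB, peak rss: %lu KB\n",
	       rss * page_kb, peak * page_kb);
}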
Therefore, as per statm, rss is 35 pages = 35*64 = 2240 KB.

I patched my kernel [2] for test purposes, so that statm reports peak
memory as well. Here, the last extra entry in the statm output is the
peak memory, and it is 36 pages = 2304 KB.

So, there is a huge difference between what is reported by statm and
what we get from trace-cmd. I understand that the `trace-cmd top` output
can only be approximate, because a lot of the memory could be allocated
by the task but freed in interrupt context, so the two can often differ.
But is such a huge difference justified? If yes, can we still count on
the output of this utility to find early boot time OOM issues?

[1] https://github.com/pratyushanand/trace-cmd/commit/602c2cd96aa613633ad20c6d382e41f7db37a2a4
[2] https://github.com/pratyushanand/linux/commit/197e2045361b6b70085c46c78ea6607d8afce517
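PS: For the "sleep on the buffer" point above, this is roughly what I
understood (only a sketch; top_wait_for_data() is a made-up helper, and
the fds are assumed to be the per-cpu trace_pipe_raw descriptors that the
tool already opens):

#include <poll.h>

/*
 * Instead of sleep(1), block until at least one per-cpu raw buffer has
 * data. 'fds' are assumed to be already-opened trace_pipe_raw fds, one
 * per CPU.
 */
static void top_wait_for_data(struct pollfd *fds, int nr_cpus)
{
	int i;

	for (i = 0; i < nr_cpus; i++) {
		fds[i].events = POLLIN;
		fds[i].revents = 0;
	}

	/* -1 == no timeout: wake up only when some CPU has events. */
	poll(fds, nr_cpus, -1);
}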