Subject: Re: [PATCH] perf record: Add snapshot mode support for perf's regular events
To: "Wangnan (F)"
Cc: Arnaldo Carvalho de Melo, David Ahern, Yunlong Song,
 a.p.zijlstra@chello.nl, paulus@samba.org, mingo@redhat.com,
 linux-kernel@vger.kernel.org, namhyung@kernel.org, ast@kernel.org,
 masami.hiramatsu.pt@hitachi.com, kan.liang@intel.com, jolsa@kernel.org,
 bp@alien8.de, jean.pihet@linaro.org, rric@kernel.org, xiakaixu@huawei.com,
 hekuang@huawei.com
From: Adrian Hunter
Organization: Intel Finland Oy, Registered Address: PL 281, 00181 Helsinki,
 Business Identity Code: 0357606 - 4, Domiciled in Helsinki
Message-ID: <565579D4.2030108@intel.com>
Date: Wed, 25 Nov 2015 11:05:24 +0200
In-Reply-To: <565574B9.5090109@huawei.com>
References: <1448373632-8806-1-git-send-email-yunlong.song@huawei.com>
 <1448373632-8806-2-git-send-email-yunlong.song@huawei.com>
 <56547D01.8020606@gmail.com> <20151124152023.GE18140@kernel.org>
 <56553022.8000101@huawei.com> <565561B6.2000005@intel.com>
 <565567A4.5020909@huawei.com> <565570F5.8070001@intel.com>
 <565574B9.5090109@huawei.com>

On 25/11/15 10:43, Wangnan (F) wrote:
>
>
> On 2015/11/25 16:27, Adrian Hunter wrote:
>> On 25/11/15 09:47, Wangnan (F) wrote:
>>>
>>> On 2015/11/25 15:22, Adrian Hunter wrote:
>>>> On 25/11/15 05:50, Wangnan (F) wrote:
>>>>> On 2015/11/24 23:20, Arnaldo Carvalho de Melo wrote:
>>>>>> On Tue, Nov 24, 2015 at 08:06:41AM -0700, David Ahern wrote:
>>>>>>> On 11/24/15 7:00 AM, Yunlong Song wrote:
>>>>>>>> +static int record__write(struct record *rec, void *bf, size_t size)
>>>>>>>> +{
>>>>>>>> +        if (rec->memory.size && memory_enabled) {
>>>>>>>> +                if (perf_memory__write(&rec->memory, bf, size) < 0) {
>>>>>>>> +                        pr_err("failed to write memory data, error: %m\n");
>>>>>>>> +                        return -1;
>>>>>>>> +                }
>>>>>>>> +        } else {
>>>>>>>> +                if (perf_data_file__write(rec->session->file, bf, size) < 0) {
>>>>>>>> +                        pr_err("failed to write perf data, error: %m\n");
>>>>>>>> +                        return -1;
>>>>>>>> +                }
>>>>>>>> +                rec->bytes_written += size;
>>>>>>>>          }
>>>>>>>>
>>>>>>>> -        rec->bytes_written += size;
>>>>>>>>          return 0;
>>>>>>>>  }
>>>>>>>>
>>>>>>>> @@ -86,6 +214,8 @@ static int record__mmap_read(struct record *rec, int idx)
>>>>>>>>          if (old == head)
>>>>>>>>                  return 0;
>>>>>>>>
>>>>>>>> +        memory_enabled = 1;
>>>>>>>> +
>>>>>>>>          rec->samples++;
>>>>>>>>
>>>>>>>>          size = head - old;
>>>>>>>> @@ -113,6 +243,7 @@ static int record__mmap_read(struct record *rec, int idx)
>>>>>>>>          md->prev = old;
>>>>>>>>          perf_evlist__mmap_consume(rec->evlist, idx);
>>>>>>>>  out:
>>>>>>>> +        memory_enabled = 0;
>>>>>>>>          return rc;
>>>>>>>>  }
>>>>>>>>
>>>>>>> So you are basically ignoring all samples until SIGUSR2 is received. That
>>>>>> No, he is not, it's just that his code is difficult to follow, has to be
>>>>>> rewritten, but he is ignoring just PERF_RECORD_SAMPLE events, so it
>>>>>> will..
>>>>>>
>>>>>>> means the resulting data file will have limited history of task
>>>>>>> events for
>>>>>> ... have a complete history of task events, since PERF_RECORD_FORK, etc
>>>>>> are not being ignored.
>>>>>>
>>>>>> No?
>>>>> Actually we are discussing this problem.
>>>>>
>>>>> For such tracking events (PERF_RECORD_FORK...), we have the dummy event,
>>>>> so it is possible for us to receive tracking events from a separate
>>>>> channel; therefore we don't have to parse every event to pick those
>>>>> events out. Instead, we can process tracking events differently, and then
>>>>> more interesting things can be done. For example, squashing those
>>>>> tracking events if they take too much memory...
>>>>>
>>>>> Furthermore, there's another problem being discussed: if the userspace
>>>>> ring buffer is byte based, parsing events is unavoidable. Without parsing
>>>>> events we are unable to find the new 'head' pointer when overwriting.
>>>> Have you considered trying to find the head by trial-and-error at the time
>>>> you make the snapshot, i.e. look at the first 8 bytes (event records are
>>>> 8-byte aligned) and see if it is a valid record header; if not, try the
>>>> next 8 bytes. When you find a real event record it should parse without
>>>> error and the subsequent events should all parse without error too, all
>>>> the way to the tail. Then you can use timestamps and compare the events
>>>> byte-by-byte to avoid overlaps between 2 snapshots.
>>> It does not seem to work. Now that we have the BPF output event, a BPF
>>> program can output anything through that event. Even if we put a magic
>>> number at the head of each event, we can't prevent a BPF output event from
>>> emitting that magic, unless we introduce some 'escape' method to keep BPF
>>> output events from emitting certain data patterns. So although it might
>>> work in real life, this solution is logically incorrect. Or am I missing
>>> something?
>> When you find the head, all the events will parse correctly. It seems to me
>> highly unlikely that would happen if you guessed the head wrongly.
>> It is only incorrect if it gives the wrong result.
>
> Right, so I said it might work in real life. However, I think we
> had better try to provide a logically correct solution.
> Also, 'guessing' implies some sort of intelligence; how do we
> deal with guessing errors? Simply drop them?

It is not "intelligence"; it is a linear search. If it gives more than one
answer, it is a fatal error. You can mitigate that by adding more validation
of the event records. But it is only a suggestion.

> And what's your opinion on the bucket-based ring buffer? With that
> design we only need to maintain a ring buffer of pointers. It should
> be much simpler. The only drawback I can imagine is the waste of memory
> because we have to alloc buckets pessimistically. Do you think
> that method has other problems I haven't considered?

The drawback is that you have to copy all the events all the time instead of
letting the kernel ring buffer wrap around without any userspace involvement
until you make a snapshot.
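
For illustration only, the linear search suggested above might look roughly
like the sketch below. It is not code from the patch or from perf. It assumes
the snapshot has already been copied into a linear buffer that ends exactly at
the tail. `struct rec_header` is a local stand-in that mirrors the layout of
`struct perf_event_header` (type, misc, size), and `REC_TYPE_MAX`,
`header_ok()`, `chain_ok()` and `find_head()` are hypothetical names. The
header validation is deliberately minimal; the "more than one answer" check
and the timestamp/byte comparison between snapshots are not implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local stand-in mirroring the layout of struct perf_event_header. */
struct rec_header {
	uint32_t type;
	uint16_t misc;
	uint16_t size;	/* total record size in bytes, header included */
};

#define REC_ALIGN	8u	/* event records are 8-byte aligned */
#define REC_TYPE_MAX	64u	/* hypothetical bound on valid record types */

/* Return 1 if a plausible record header starts at buf[off] (off < len). */
static int header_ok(const unsigned char *buf, size_t len, size_t off,
		     struct rec_header *hdr)
{
	if (len - off < sizeof(*hdr))
		return 0;
	memcpy(hdr, buf + off, sizeof(*hdr));
	return hdr->type != 0 && hdr->type < REC_TYPE_MAX &&
	       hdr->size >= sizeof(*hdr) &&
	       hdr->size % REC_ALIGN == 0 &&
	       hdr->size <= len - off;
}

/* Return 1 if records starting at off parse back to back up to the tail. */
static int chain_ok(const unsigned char *buf, size_t len, size_t off)
{
	struct rec_header hdr;

	while (off < len) {
		if (!header_ok(buf, len, off, &hdr))
			return 0;
		off += hdr.size;
	}
	return off == len;
}

/*
 * Try each 8-byte aligned offset in turn and return the first one from
 * which every record parses cleanly all the way to the tail, or -1 if
 * no such offset exists.
 */
static long find_head(const unsigned char *buf, size_t len)
{
	size_t off;

	assert(len % REC_ALIGN == 0);
	for (off = 0; off < len; off += REC_ALIGN)
		if (chain_ok(buf, len, off))
			return (long)off;
	return -1;
}
```

A caller would pass the snapshot bytes to find_head() and discard everything
before the returned offset. Stricter per-type checks in header_ok() would be
the place to add the extra validation mentioned above.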