Message-ID: <565574B9.5090109@huawei.com>
Date: Wed, 25 Nov 2015 16:43:37 +0800
From: "Wangnan (F)"
To: Adrian Hunter
CC: Arnaldo Carvalho de Melo, David Ahern, Yunlong Song
Subject: Re: [PATCH] perf record: Add snapshot mode support for perf's regular events
In-Reply-To: <565570F5.8070001@intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2015/11/25 16:27, Adrian Hunter wrote:
> On 25/11/15 09:47, Wangnan (F) wrote:
>>
>> On 2015/11/25 15:22, Adrian Hunter wrote:
>>> On 25/11/15 05:50, Wangnan (F) wrote:
>>>> On 2015/11/24 23:20, Arnaldo Carvalho de Melo wrote:
>>>>> On Tue, Nov 24, 2015 at 08:06:41AM -0700, David Ahern wrote:
>>>>>> On 11/24/15 7:00 AM, Yunlong Song wrote:
>>>>>>>
>>>>>>> +static int record__write(struct record *rec, void *bf, size_t size)
>>>>>>> +{
>>>>>>> +    if (rec->memory.size && memory_enabled) {
>>>>>>> +        if (perf_memory__write(&rec->memory, bf, size) < 0) {
>>>>>>> +            pr_err("failed to write memory data, error: %m\n");
>>>>>>> +            return -1;
>>>>>>> +        }
>>>>>>> +    } else {
>>>>>>> +        if (perf_data_file__write(rec->session->file, bf, size) < 0) {
>>>>>>> +            pr_err("failed to write perf data, error: %m\n");
>>>>>>> +            return -1;
>>>>>>> +        }
>>>>>>> +        rec->bytes_written += size;
>>>>>>>      }
>>>>>>>
>>>>>>> -    rec->bytes_written += size;
>>>>>>>      return 0;
>>>>>>>  }
>>>>>>>
>>>>>>> @@ -86,6 +214,8 @@ static int record__mmap_read(struct record *rec, int idx)
>>>>>>>      if (old == head)
>>>>>>>          return 0;
>>>>>>>
>>>>>>> +    memory_enabled = 1;
>>>>>>> +
>>>>>>>      rec->samples++;
>>>>>>>
>>>>>>>      size = head - old;
>>>>>>> @@ -113,6 +243,7 @@ static int record__mmap_read(struct record *rec, int idx)
>>>>>>>      md->prev = old;
>>>>>>>      perf_evlist__mmap_consume(rec->evlist, idx);
>>>>>>>  out:
>>>>>>> +    memory_enabled = 0;
>>>>>>>      return rc;
>>>>>>>  }
>>>>>>>
>>>>>> So you are basically ignoring all samples until SIGUSR2 is received. That
>>>>> No, he is not, its just that his code is difficult to follow, has to be
>>>>> rewritten, but he is ignoring just PERF_RECORD_SAMPLE events, so it
>>>>> will..
>>>>>
>>>>>> means the resulting data file will have limited history of task events for
>>>>> ... have a complete history of task events, since PERF_RECORD_FORK, etc
>>>>> are not being ignored.
>>>>>
>>>>> No?
>>>> Actually we are discussing about this problem.
>>>>
>>>> For such tracking events (PERF_RECORD_FORK...), we have dummy event so
>>>> it is possible for us to receive tracking events from a separated
>>>> channel, therefore we don't have to parse every events to pick those
>>>> events out. Instead, we can process tracking events differently, then
>>>> more interesting things can be done.
>>>> For example, squashing those tracking
>>>> events if it takes too much memory...
>>>>
>>>> Furthermore, there's another problem being discussed: if the userspace
>>>> ring buffer is byte based, parsing events is unavoidable. Without parsing
>>>> events we are unable to find the new 'head' pointer when overwriting.
>>> Have you considered trying to find the head by trial-and-error at the time
>>> you make the snapshot i.e. look at the first 8 bytes (event records are 8
>>> byte aligned) and see if it is a valid record header, if not try the next 8
>>> bytes. When you find a real event record it should parse without error and
>>> the subsequent events should all parse without error too, all the way to the
>>> tail. Then you can use timestamps and compare the events byte-by-byte to
>>> avoid overlaps between 2 snapshots.
>> It seems not to work. Now we have the BPF output event, it is possible that
>> a BPF program outputs anything through that event. Even if we have a magic
>> at the head of each event, we can't prevent a BPF output event from
>> emitting that magic, unless we introduce some 'escape' method to prevent
>> BPF output events from emitting certain data patterns. So although it might
>> work in real life, this solution is logically incorrect. Or am I missing
>> something?
> When you find the head, all the events will parse correctly. It seems to me
> highly unlikely that would happen if you guessed the head wrongly.
> It is only incorrect if it gives the wrong result.

Right, that is why I said it might work in real life. However, I think we
should try to provide a logically correct solution. Also, 'guessing'
implies some sort of intelligence, so how do we deal with guessing errors?
Simply drop them?

And what's your opinion on the bucket-based ring buffer? With that design
we only need to maintain a ring buffer of pointers. It should be much
simpler. The only drawback I can imagine is the waste of memory, because we
have to allocate buckets pessimistically.
Do you think that method has other problems I haven't considered?

Thank you.
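
To make the bucket idea a bit more concrete, here is a rough sketch in C.
It is not taken from the patch: every name in it (struct bucket_ring,
bucket_ring__push() and so on) is hypothetical, and it assumes that each
bucket is preallocated at one fixed, worst-case size and holds exactly one
event.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch; none of these names exist in perf. */
struct bucket {
	size_t len;		/* bytes of the event stored here */
	unsigned char data[];	/* bucket_size bytes, allocated up front */
};

struct bucket_ring {
	struct bucket **slots;	/* the ring buffer of pointers */
	size_t nr_slots;
	size_t bucket_size;	/* pessimistic: largest event accepted */
	size_t head;		/* next slot to (over)write */
	size_t count;		/* valid events, at most nr_slots */
};

static void bucket_ring__exit(struct bucket_ring *r)
{
	size_t i;

	if (!r->slots)
		return;
	for (i = 0; i < r->nr_slots; i++)
		free(r->slots[i]);	/* free(NULL) is a no-op */
	free(r->slots);
	r->slots = NULL;
}

static int bucket_ring__init(struct bucket_ring *r, size_t nr_slots,
			     size_t bucket_size)
{
	size_t i;

	memset(r, 0, sizeof(*r));
	if (!nr_slots)
		return -1;
	r->slots = calloc(nr_slots, sizeof(*r->slots));
	if (!r->slots)
		return -1;
	r->nr_slots = nr_slots;
	r->bucket_size = bucket_size;
	for (i = 0; i < nr_slots; i++) {
		r->slots[i] = malloc(sizeof(struct bucket) + bucket_size);
		if (!r->slots[i]) {
			bucket_ring__exit(r);
			return -1;
		}
	}
	return 0;
}

/* Store one event; once the ring is full the oldest bucket is reused. */
static int bucket_ring__push(struct bucket_ring *r, const void *ev,
			     size_t size)
{
	struct bucket *b;

	if (size > r->bucket_size)
		return -1;
	b = r->slots[r->head];
	memcpy(b->data, ev, size);
	b->len = size;
	r->head = (r->head + 1) % r->nr_slots;
	if (r->count < r->nr_slots)
		r->count++;
	return 0;
}

/* Return the i-th oldest event (i == 0 is the oldest), or NULL. */
static const void *bucket_ring__peek(const struct bucket_ring *r, size_t i,
				     size_t *len)
{
	size_t oldest, slot;

	if (i >= r->count)
		return NULL;
	oldest = (r->head + r->nr_slots - r->count) % r->nr_slots;
	slot = (oldest + i) % r->nr_slots;
	*len = r->slots[slot]->len;
	return r->slots[slot]->data;
}
```

Overwriting here only advances an index, so no event has to be parsed to
find the new head. The price is the one mentioned above: the ring always
occupies nr_slots * bucket_size bytes, however small the events are.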