2010-04-15 14:55:33

by Michal Simek

[permalink] [raw]
Subject: sched_clock - microblaze

Hi Thomas and Steven,

I would like to improve time measuring for ftrace (Log below)

I looked at http://lkml.org/lkml/2009/11/16/181
where Thomas suggested to create sched_clock() function.
I used the same solution as Wu proposed but it is not nice.

Is unimplemented sched_clock the reason why ftrace not show fine grain time?

Or is there any other thing which is wrong?

Thanks,
Michal



Dumping ftrace buffer:
---------------------------------
0) ! 10000.00 us | }
0) <========== |
0) ! 10000.00 us | }
0) ! 10000.00 us | }
0) 0.000 us | get_seconds();
0) 0.000 us | tcp_rcv_rtt_update();
0) 0.000 us | tcp_ack();
0) 0.000 us | tcp_rcv_rtt_update();
0) 0.000 us | tcp_urg();
0) | tcp_data_queue() {
0) 0.000 us | __sk_mem_schedule();
0) 0.000 us | tcp_event_data_recv();
0) 0.000 us | sock_def_readable();
0) 0.000 us | }
0) 0.000 us | tcp_check_space();
0) | __tcp_ack_snd_check() {


--
Michal Simek, Ing. (M.Eng)
PetaLogix - Linux Solutions for a Reconfigurable World
w: http://www.petalogix.com p: +61-7-30090663,+42-0-721842854 f: +61-7-30090663


2010-04-15 15:05:55

by Steven Rostedt

[permalink] [raw]
Subject: Re: sched_clock - microblaze

On Thu, 2010-04-15 at 16:55 +0200, Michal Simek wrote:
> Hi Thomas and Steven,
>
> I would like to improve time measuring for ftrace (Log below)
>
> I looked at http://lkml.org/lkml/2009/11/16/181
> where Thomas suggested to create sched_clock() function.
> I used the same solution as Wu proposed but it is not nice.
>
> Is unimplemented sched_clock the reason why ftrace not show fine grain time?

Yeah, sched_clock is used by ftrace for timings, so if it only returns
jiffies, then that will, unfortunately, be the resolution of the tracer.

I've been told that if you make a higher resolution timer for sched
clock, it will improve the scheduling in CFS.

-- Steve

>
> Or is there any other thing which is wrong?
>
> Thanks,
> Michal
>
>
>
> Dumping ftrace buffer:
> ---------------------------------
> 0) ! 10000.00 us | }
> 0) <========== |
> 0) ! 10000.00 us | }
> 0) ! 10000.00 us | }
> 0) 0.000 us | get_seconds();
> 0) 0.000 us | tcp_rcv_rtt_update();
> 0) 0.000 us | tcp_ack();
> 0) 0.000 us | tcp_rcv_rtt_update();
> 0) 0.000 us | tcp_urg();
> 0) | tcp_data_queue() {
> 0) 0.000 us | __sk_mem_schedule();
> 0) 0.000 us | tcp_event_data_recv();
> 0) 0.000 us | sock_def_readable();
> 0) 0.000 us | }
> 0) 0.000 us | tcp_check_space();
> 0) | __tcp_ack_snd_check() {
>
>

2010-04-15 15:06:16

by Steven J. Magnani

[permalink] [raw]
Subject: Re: sched_clock - microblaze

Michal,

On Thu, 2010-04-15 at 16:55 +0200, Michal Simek wrote:

> Is unimplemented sched_clock the reason why ftrace not show fine grain time?
>
> Or is there any other thing which is wrong?
>

I think that's it. On my platform we have a free-running 1 MHz counter
so I implemented a platform sched_clock for that, and I get nice ftrace
times.

0) + 65.000 us | finish_task_switch();
0) | lock_sock_nested() {
0) + 52.000 us | local_bh_disable();
0) + 53.000 us | local_bh_enable();
0) ! 264.000 us | }

Steve


2010-04-15 16:21:31

by Michal Simek

[permalink] [raw]
Subject: Re: sched_clock - microblaze

Steven J. Magnani wrote:
> Michal,
>
> On Thu, 2010-04-15 at 16:55 +0200, Michal Simek wrote:
>
>> Is unimplemented sched_clock the reason why ftrace not show fine grain time?
>>
>> Or is there any other thing which is wrong?
>>
>
> I think that's it. On my platform we have a free-running 1 MHz counter
> so I implemented a platform sched_clock for that, and I get nice ftrace
> times.
>
> 0) + 65.000 us | finish_task_switch();
> 0) | lock_sock_nested() {
> 0) + 52.000 us | local_bh_disable();
> 0) + 53.000 us | local_bh_enable();
> 0) ! 264.000 us | }

I don't understand why I should add any "new" free running counter
because we have one free running counter which do it (clocksource timer
- timer1). Or am I missing something?

Thanks for sending your code.
Michal



--
Michal Simek, Ing. (M.Eng)
PetaLogix - Linux Solutions for a Reconfigurable World
w: http://www.petalogix.com p: +61-7-30090663,+42-0-721842854 f: +61-7-30090663

2010-04-15 16:33:24

by Michal Simek

[permalink] [raw]
Subject: Re: sched_clock - microblaze

Steven Rostedt wrote:
> On Thu, 2010-04-15 at 16:55 +0200, Michal Simek wrote:
>> Hi Thomas and Steven,
>>
>> I would like to improve time measuring for ftrace (Log below)
>>
>> I looked at http://lkml.org/lkml/2009/11/16/181
>> where Thomas suggested to create sched_clock() function.
>> I used the same solution as Wu proposed but it is not nice.
>>
>> Is unimplemented sched_clock the reason why ftrace not show fine grain time?
>
> Yeah, sched_clock is used by ftrace for timings, so if it only returns
> jiffies, then that will, unfortunately, be the resolution of the tracer.
>
> I've been told that if you make a higher resolution timer for sched
> clock, it will improve the scheduling in CFS.

We have two timers (in one IP).
timer0 - in IRQ mode - clock event
timer1 - free running up counter without IRQ - clock source

I hope that I can use timer1 in sched_clock too.

Anyway. I played with it a little bit and I also implemented microblaze
specific sched_clock function (inspired by arch/arm/mach-ixp4xx/common.c)

unsigned long long notrace sched_clock(void)
{
printk("%s\n", __func__);
cycle_t cyc = microblaze_read(NULL);
struct clocksource *cs = &clocksource_microblaze;
return clocksource_cyc2ns(cyc, cs->mult, cs->shift);
}

(And comment sched_clock in kernel/sched_clock.c because our toolchain
has the problem with weak symbols)

Recompile the kernel and then show the log_buf

<4>Ramdisk addr 0x00000003, Compiled-in FDT at 0xc0242a28
<5>Linux version 2.6.34-rc4-00053-gbc6ce8a-dirty ([email protected]) (gcc
version 4.1.2) #27 Thu Apr 15 16:38:16 CEST 2010
<6>setup_cpuinfo: initialising
<6>setup_cpuinfo: Using full CPU PVR support
<6>cache: wb_msr
<6>setup_memory: Main mem: 0x48000000-0x50000000, size 0x08000000,
klimit 0xc09c7000
<6>setup_memory: max_mapnr: 0x8000
<6>setup_memory: min_low_pfn: 0x48000
<6>setup_memory: max_low_pfn: 0x50000
<4>reserved 0 - 0x48000000-0x009c8000
<4>reserved 1 - 0x48fe0000-0x00020000
<4>reserved 2 - 0x4fffd8c8-0x00002738
<7>On node 0 totalpages: 32768
<7>free_area_init_node: node 0, pgdat c02bc70c, node_mem_map c09c8000
<7> Normal zone: 256 pages used for memmap
<7> Normal zone: 32512 pages, LIFO batch:7
<4>Built 1 zonelists in Zone order, mobility grouping on. Total pages:
32512
<5>Kernel command line: console=ttyS0,115200
<6>PID hash table entries: 512 (order: -1, 2048 bytes)
<6>Dentry cache hash table entries: 16384 (order: 4, 65536 bytes)
<6>Inode-cache hash table entries: 8192 (order: 3, 32768 bytes)
<6>Memory: 119796k/131072k available
<4>sched_clock

The problem I see is that scheck_clock is used before our timer
subsystem is initialized.
What am I missing?

Thanks,
Michal

--
Michal Simek, Ing. (M.Eng)
PetaLogix - Linux Solutions for a Reconfigurable World
w: http://www.petalogix.com p: +61-7-30090663,+42-0-721842854 f: +61-7-30090663

2010-04-15 17:18:23

by wu zhangjin

[permalink] [raw]
Subject: Re: sched_clock - microblaze

On Thu, 2010-04-15 at 16:55 +0200, Michal Simek wrote:
> Hi Thomas and Steven,
>
> I would like to improve time measuring for ftrace (Log below)
>
> I looked at http://lkml.org/lkml/2009/11/16/181
> where Thomas suggested to create sched_clock() function.
> I used the same solution as Wu proposed but it is not nice.

Which solution did you use?

>From my point of view, the basic idea of high resolution sched_clock()
looks like:

0. read the clock cycles from the hardware clock counter

cyc = read_clock_counter()

read_clock_counter() is processor specific, for example, in X86, there
is a tsc register, and in a r4k MIPS, there is a c0 counter. such
counters can provide a high resolution of time counting: the tsc
register's frequency is equal to the frequency of the processor, if the
frequency of the processor is 1G, then one cycle of the counter gives us
1ns.

1. convert the clock cycles to ns

ns = cyc * NSEC_PER_SEC / freq (freq is clock freq of the counter)
= use the scaling math. (please refer to arch/x86/kernel/tsc.c)
= (cyc * mult) >> shift (refer to clocksource_calc_mult_shift() )

so, you need to implement the arch specific cyc2ns(), now, we get:

cyc = read_clock_counter();
ns = cyc2ns(cyc);

2. handle the overflow problems

+ overflow of 32bit counter, solution: cnt32_to_63()

If the hardware clock counter is 32bit long, you need to 'avoid' its
counting overflow via cnt32_to_63() defined in
include/linux/cnt32_to_63()

Please ensure the cnt32_to_63() is called every less than half of the
overflow period(the max time the counter can record) of the hardware
clock counter, now, we get:

#ifdef CONFIG_64BIT
cyc = read_clock_counter();
#else /* CONFIG_32BIT */
cyc = cnt32_to_63(read_clock_counter());
#endif
ns = cyc2ns(cyc);

and a kernel software timer(setup_timer(), mod_timer()) is needed to
update call the cnt32_to_63() automatically.

[why cnt32_to_63()? it has less overhead than the other methods, will
talk about it later.]

'avoid' above means avoid it overflow too quickly, for example, a 32bit
clock counter with 400MHz may overflow after about 10+ seconds, which is
not enough for Ftrace since a function may be executed for more 200+
seconds, we need a bigger 'counter', that is why cnt32_to_63() is
needed, it tries to convert the 32bit counter to a virtual 63bit(the
highest bit is used to sync the virtual high 32bit with the low real
32bit in the hardware counter) long, then with the 63bit counter, it
will overflow after several hundreds of days, which is enough for
generic debugging.

+ overflow of (cyc * mult), solution: smaller shift OR 128bit arithmetic

And please ensure the shift is not too big(Seems ARM use 8 as the shift,
and X86 use 10), otherwise, the 64bit arithmetic (cyc * mult) will
overflow quickly.

As we can see, the type of the return value of sched_clock() is unsigned
long long, 64bit, so, the value of (cyc * mult) is limited to 64bit. And
because cyc is also in 64bit (63bit if using cnt32_to_63()), mult can
not be too big, otherwise, (cyc * mult) will overflow quickly, mult is
relative to shift, then shift can not be too big.

In another aspect, the shift can not be too small, otherwise, the
precision will be lost. to learn how to choose such a shift, please
refer to arch/arm/plat-orion/time.c and arch/x86/kernel/tsc.c.

By default, the mult and shift in struct clocksource is calculated by
clocksource_calc_mult_shift(), both of them are in 32bit, and in some
platforms, for a 32bit counter, the shift is calculated from 32 to 0 and
will be set as 31 for getting a good precision, but it is too big a
value for a virtual 63bit counter, so, we need to manually set it a
smaller one, the same to a real 64bit hardware counter.

In some platforms, they have used the 128bit arithmetic to handle the
overflow problem for a real 64bit hardware counter(please refer to
arch/mips/cavium-octeon/csrc-octeon.c), perhaps we also need to use it
for clocksource_cyc2ns() too(I did not check the potential overflow
problem in such processor yet, it should overflow if the shift is too
big!), but we also need to notice that, 128bit arithmetic may not be
supported directly in some processors or it may increase the overhead
obviously.

3. why not getnstimeofday()

we need a lower overhead of clock func (please refer to
kernel/trace/trace_clock.c) to make Ftrace has lower overhead for the
clock func is a basic function needed to get the timestamp of entering
into and exiting from a kernel function.

I have done some overhead benchmarking of the cnt32_to_63() based
sched_clock() and getnstimeofday() on a MIPS machine with 64bit kernel:

Clock func/overhead(ns) Min Avg Max Jitter Std.Dev.
----------------------------------------------
sched_clock(cnt32_to_63) 105 116.2 236 131 9.5
getnstimeofday() 160 167.1 437 277 15
sched_clock(Accumulation method[1]) 193 200.9 243 50 2.9
----------------------------------------------

'Overhead' above means the time spent on calling clock func themselves,
as we can see, the result of cnt32_to_63() base method is better than
the other two.

[1] the algorithm looks like this:

static inline unsigned long long notrace read_clock_counter(void)
{
static u64 clock;
static u32 old_clock;
u32 current_clock;

raw_spin_lock(&clock_lock);
current_clock = read_c0_count();
clock += ((current_clock - old_clock) & MASK);
old_clock = current_clock;
raw_spin_unlock(&clock_lock);

return clock;
}

Regards,
Wu Zhangjin

>
> Is unimplemented sched_clock the reason why ftrace not show fine grain time?
>
> Or is there any other thing which is wrong?
>
> Thanks,
> Michal
>
>
>
> Dumping ftrace buffer:
> ---------------------------------
> 0) ! 10000.00 us | }
> 0) <========== |
> 0) ! 10000.00 us | }
> 0) ! 10000.00 us | }
> 0) 0.000 us | get_seconds();
> 0) 0.000 us | tcp_rcv_rtt_update();
> 0) 0.000 us | tcp_ack();
> 0) 0.000 us | tcp_rcv_rtt_update();
> 0) 0.000 us | tcp_urg();
> 0) | tcp_data_queue() {
> 0) 0.000 us | __sk_mem_schedule();
> 0) 0.000 us | tcp_event_data_recv();
> 0) 0.000 us | sock_def_readable();
> 0) 0.000 us | }
> 0) 0.000 us | tcp_check_space();
> 0) | __tcp_ack_snd_check() {
>
>

2010-04-15 18:13:14

by wu zhangjin

[permalink] [raw]
Subject: Re: sched_clock - microblaze

On Thu, 2010-04-15 at 18:32 +0200, Michal Simek wrote:
> Steven Rostedt wrote:
> > On Thu, 2010-04-15 at 16:55 +0200, Michal Simek wrote:
> >> Hi Thomas and Steven,
> >>
> >> I would like to improve time measuring for ftrace (Log below)
> >>
> >> I looked at http://lkml.org/lkml/2009/11/16/181
> >> where Thomas suggested to create sched_clock() function.
> >> I used the same solution as Wu proposed but it is not nice.
> >>
> >> Is unimplemented sched_clock the reason why ftrace not show fine grain time?
> >
> > Yeah, sched_clock is used by ftrace for timings, so if it only returns
> > jiffies, then that will, unfortunately, be the resolution of the tracer.
> >
> > I've been told that if you make a higher resolution timer for sched
> > clock, it will improve the scheduling in CFS.
>
> We have two timers (in one IP).
> timer0 - in IRQ mode - clock event
> timer1 - free running up counter without IRQ - clock source
>
> I hope that I can use timer1 in sched_clock too.
>
> Anyway. I played with it a little bit and I also implemented microblaze
> specific sched_clock function (inspired by arch/arm/mach-ixp4xx/common.c)
>
> unsigned long long notrace sched_clock(void)
> {
> printk("%s\n", __func__);
> cycle_t cyc = microblaze_read(NULL);
> struct clocksource *cs = &clocksource_microblaze;
> return clocksource_cyc2ns(cyc, cs->mult, cs->shift);
> }

Hey, I guess you got:

sched_clock() -> printk() -> sched_clock() ...

since printk() is not annotated with notrace, it will call mcount() and
then sched_clock() and ... so, you can not call printk() in
sched_clock().

but you may try this if you want to print something in sched_clock() for
debugging.

#include <>

sched_clock() {
ftrace_stop();
printk(...);
ftrace_start();
...
}

Regards,
Wu Zhangjin

>
> (And comment sched_clock in kernel/sched_clock.c because our toolchain
> has the problem with weak symbols)
>
> Recompile the kernel and then show the log_buf
>
> <4>Ramdisk addr 0x00000003, Compiled-in FDT at 0xc0242a28
> <5>Linux version 2.6.34-rc4-00053-gbc6ce8a-dirty ([email protected]) (gcc
> version 4.1.2) #27 Thu Apr 15 16:38:16 CEST 2010
> <6>setup_cpuinfo: initialising
> <6>setup_cpuinfo: Using full CPU PVR support
> <6>cache: wb_msr
> <6>setup_memory: Main mem: 0x48000000-0x50000000, size 0x08000000,
> klimit 0xc09c7000
> <6>setup_memory: max_mapnr: 0x8000
> <6>setup_memory: min_low_pfn: 0x48000
> <6>setup_memory: max_low_pfn: 0x50000
> <4>reserved 0 - 0x48000000-0x009c8000
> <4>reserved 1 - 0x48fe0000-0x00020000
> <4>reserved 2 - 0x4fffd8c8-0x00002738
> <7>On node 0 totalpages: 32768
> <7>free_area_init_node: node 0, pgdat c02bc70c, node_mem_map c09c8000
> <7> Normal zone: 256 pages used for memmap
> <7> Normal zone: 32512 pages, LIFO batch:7
> <4>Built 1 zonelists in Zone order, mobility grouping on. Total pages:
> 32512
> <5>Kernel command line: console=ttyS0,115200
> <6>PID hash table entries: 512 (order: -1, 2048 bytes)
> <6>Dentry cache hash table entries: 16384 (order: 4, 65536 bytes)
> <6>Inode-cache hash table entries: 8192 (order: 3, 32768 bytes)
> <6>Memory: 119796k/131072k available
> <4>sched_clock
>
> The problem I see is that scheck_clock is used before our timer
> subsystem is initialized.
> What am I missing?
>
> Thanks,
> Michal
>

2010-04-15 18:27:33

by Thomas Gleixner

[permalink] [raw]
Subject: Re: sched_clock - microblaze

On Thu, 15 Apr 2010, Michal Simek wrote:

> Steven J. Magnani wrote:
> > Michal,
> >
> > On Thu, 2010-04-15 at 16:55 +0200, Michal Simek wrote:
> >
> > > Is unimplemented sched_clock the reason why ftrace not show fine grain
> > > time?
> > >
> > > Or is there any other thing which is wrong?
> > >
> >
> > I think that's it. On my platform we have a free-running 1 MHz counter
> > so I implemented a platform sched_clock for that, and I get nice ftrace
> > times.
> >
> > 0) + 65.000 us | finish_task_switch();
> > 0) | lock_sock_nested() {
> > 0) + 52.000 us | local_bh_disable();
> > 0) + 53.000 us | local_bh_enable();
> > 0) ! 264.000 us | }
>
> I don't understand why I should add any "new" free running counter because we
> have one free running counter which do it (clocksource timer - timer1). Or am
> I missing something?

No, you just can use the clocksource for it. We have that separation
as on x86 we want to use the TSC even if it is not usable for time
keeping as it is the fastest accesible clock in x86. So for your case
you can just use your main clocksource as sched_clock and be done.

Thanks,

tglx

2010-04-15 19:29:35

by Michal Simek

[permalink] [raw]
Subject: Re: sched_clock - microblaze

Wu Zhangjin wrote:
> On Thu, 2010-04-15 at 18:32 +0200, Michal Simek wrote:
>> Steven Rostedt wrote:
>>> On Thu, 2010-04-15 at 16:55 +0200, Michal Simek wrote:
>>>> Hi Thomas and Steven,
>>>>
>>>> I would like to improve time measuring for ftrace (Log below)
>>>>
>>>> I looked at http://lkml.org/lkml/2009/11/16/181
>>>> where Thomas suggested to create sched_clock() function.
>>>> I used the same solution as Wu proposed but it is not nice.
>>>>
>>>> Is unimplemented sched_clock the reason why ftrace not show fine grain time?
>>> Yeah, sched_clock is used by ftrace for timings, so if it only returns
>>> jiffies, then that will, unfortunately, be the resolution of the tracer.
>>>
>>> I've been told that if you make a higher resolution timer for sched
>>> clock, it will improve the scheduling in CFS.
>> We have two timers (in one IP).
>> timer0 - in IRQ mode - clock event
>> timer1 - free running up counter without IRQ - clock source
>>
>> I hope that I can use timer1 in sched_clock too.
>>
>> Anyway. I played with it a little bit and I also implemented microblaze
>> specific sched_clock function (inspired by arch/arm/mach-ixp4xx/common.c)
>>
>> unsigned long long notrace sched_clock(void)
>> {
>> printk("%s\n", __func__);
>> cycle_t cyc = microblaze_read(NULL);
>> struct clocksource *cs = &clocksource_microblaze;
>> return clocksource_cyc2ns(cyc, cs->mult, cs->shift);
>> }
>
> Hey, I guess you got:
>
> sched_clock() -> printk() -> sched_clock() ...
>
> since printk() is not annotated with notrace, it will call mcount() and
> then sched_clock() and ... so, you can not call printk() in
> sched_clock().
>
> but you may try this if you want to print something in sched_clock() for
> debugging.
>
> #include <>
>
> sched_clock() {
> ftrace_stop();
> printk(...);
> ftrace_start();
> ...
> }

But I am describing different problem. sched_clock is called before
kernel initialized timers. Microblaze takes all information from DTS
that's means that I am not able to read timer because I even don't know
where it is. :-(

Thanks,
Michal




>
> Regards,
> Wu Zhangjin
>
>> (And comment sched_clock in kernel/sched_clock.c because our toolchain
>> has the problem with weak symbols)
>>
>> Recompile the kernel and then show the log_buf
>>
>> <4>Ramdisk addr 0x00000003, Compiled-in FDT at 0xc0242a28
>> <5>Linux version 2.6.34-rc4-00053-gbc6ce8a-dirty ([email protected]) (gcc
>> version 4.1.2) #27 Thu Apr 15 16:38:16 CEST 2010
>> <6>setup_cpuinfo: initialising
>> <6>setup_cpuinfo: Using full CPU PVR support
>> <6>cache: wb_msr
>> <6>setup_memory: Main mem: 0x48000000-0x50000000, size 0x08000000,
>> klimit 0xc09c7000
>> <6>setup_memory: max_mapnr: 0x8000
>> <6>setup_memory: min_low_pfn: 0x48000
>> <6>setup_memory: max_low_pfn: 0x50000
>> <4>reserved 0 - 0x48000000-0x009c8000
>> <4>reserved 1 - 0x48fe0000-0x00020000
>> <4>reserved 2 - 0x4fffd8c8-0x00002738
>> <7>On node 0 totalpages: 32768
>> <7>free_area_init_node: node 0, pgdat c02bc70c, node_mem_map c09c8000
>> <7> Normal zone: 256 pages used for memmap
>> <7> Normal zone: 32512 pages, LIFO batch:7
>> <4>Built 1 zonelists in Zone order, mobility grouping on. Total pages:
>> 32512
>> <5>Kernel command line: console=ttyS0,115200
>> <6>PID hash table entries: 512 (order: -1, 2048 bytes)
>> <6>Dentry cache hash table entries: 16384 (order: 4, 65536 bytes)
>> <6>Inode-cache hash table entries: 8192 (order: 3, 32768 bytes)
>> <6>Memory: 119796k/131072k available
>> <4>sched_clock
>>
>> The problem I see is that scheck_clock is used before our timer
>> subsystem is initialized.
>> What am I missing?
>>
>> Thanks,
>> Michal
>>
>
>


--
Michal Simek, Ing. (M.Eng)
PetaLogix - Linux Solutions for a Reconfigurable World
w: http://www.petalogix.com p: +61-7-30090663,+42-0-721842854 f: +61-7-30090663

2010-04-15 19:52:11

by Arnd Bergmann

[permalink] [raw]
Subject: Re: sched_clock - microblaze

On Thursday 15 April 2010 21:29:27 Michal Simek wrote:
> > sched_clock() {
> > ftrace_stop();
> > printk(...);
> > ftrace_start();
> > ...
> > }
>
> But I am describing different problem. sched_clock is called before
> kernel initialized timers. Microblaze takes all information from DTS
> that's means that I am not able to read timer because I even don't know
> where it is. :-(

You could make sched_clock return zero before the clock is probed, it
is very unlikely to be needed before that anyway.

Arnd

2010-04-16 06:23:16

by Michal Simek

[permalink] [raw]
Subject: Re: sched_clock - microblaze

Arnd Bergmann wrote:
> On Thursday 15 April 2010 21:29:27 Michal Simek wrote:
>>> sched_clock() {
>>> ftrace_stop();
>>> printk(...);
>>> ftrace_start();
>>> ...
>>> }
>> But I am describing different problem. sched_clock is called before
>> kernel initialized timers. Microblaze takes all information from DTS
>> that's means that I am not able to read timer because I even don't know
>> where it is. :-(
>
> You could make sched_clock return zero before the clock is probed, it
> is very unlikely to be needed before that anyway.

First of all - this technique works.

That calling sequence is
main.c: sched_init() -> sched.c: sched_init(); -> init_idle(); ->
sched_clock() without initialized timer which is causing my problems.

Why is sched_clock called in init_idle? Do you get any non zero value on
any ARCH?

Thanks,
Michal

diff --git a/kernel/sched.c b/kernel/sched.c
index 6af210a..bddd918 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -5244,7 +5244,7 @@ void __cpuinit init_idle(struct task_struct *idle,
int cpu)

__sched_fork(idle);
idle->state = TASK_RUNNING;
- idle->se.exec_start = sched_clock();
+ idle->se.exec_start = 0;

cpumask_copy(&idle->cpus_allowed, cpumask_of(cpu));
__set_task_cpu(idle, cpu);

--
Michal Simek, Ing. (M.Eng)
PetaLogix - Linux Solutions for a Reconfigurable World
w: http://www.petalogix.com p: +61-7-30090663,+42-0-721842854 f: +61-7-30090663

2010-04-16 08:37:36

by Arnd Bergmann

[permalink] [raw]
Subject: Re: sched_clock - microblaze

On Friday 16 April 2010, Michal Simek wrote:
> Why is sched_clock called in init_idle? Do you get any non zero value on
> any ARCH?

Yes, some architectures return a real-time value here that starts before boot.
Also, when the clock starts at boot time, there can be a significant time
spent in the firmware and early kernel code before we get to init_idle.

Arnd

2010-04-16 09:11:53

by Michal Simek

[permalink] [raw]
Subject: Re: sched_clock - microblaze

Arnd Bergmann wrote:
> On Friday 16 April 2010, Michal Simek wrote:
>> Why is sched_clock called in init_idle? Do you get any non zero value on
>> any ARCH?
>
> Yes, some architectures return a real-time value here that starts before boot.
> Also, when the clock starts at boot time, there can be a significant time
> spent in the firmware and early kernel code before we get to init_idle.

ok. I will return 0 before timer initialization.

Thanks for explanation,
Michal

>
> Arnd


--
Michal Simek, Ing. (M.Eng)
PetaLogix - Linux Solutions for a Reconfigurable World
w: http://www.petalogix.com p: +61-7-30090663,+42-0-721842854 f: +61-7-30090663