2006-02-27 15:33:06

by Arjan van de Ven

[permalink] [raw]
Subject: [Patch 0/4] Reordering of functions, try 2

Hi,


This is the second posting of the function reorder patches.
I've run a series of lmbench (3.0) runs on the patches earlier today on
this set, with the following results (I'll discuss 2 runs, one with the
first 3 patches, and one with all patches)


These are descriptions of the results, since the full lmbench run is a
LOT of numbers; this is the summary of it

1) There is a whole class of tests where it just doesn't matter at all.
In this class are the cpu measurements of FPU ops/second but also the
RAM bandwidth tests and the like. This is fully understandable; these
tests go straight from userspace->hardware, kernel changes don't impact
these.

2) In the "processor/processes" group, 7 tests changed behavior, and the
average of these changes was a performance increase by 10% (!!). The
exception was the signal handling test, which decreased by 6%. This
actually made me feel good, since the original function list was based
on a profile run that didn't do signals much if at all.

The second run (with all 4 patches) showed some fluctuations here but on
average it was in the noise region. Exception to this was the stat test,
which lost half the gain from the first 3 patches. Here also the signals
test lost

3) In the latency group, the 3-patches run is again in the 10% gain
range. Here the 4-patches run shows an additional gain for several of
the tests in the 5% range and for example the af_unix test showed a loss
compared to the 3-patches run




2006-02-27 15:32:14

by Arjan van de Ven

[permalink] [raw]
Subject: [Patch 2/4] Basic reorder infrastructure

This patch puts the infrastructure in place to allow for a reordering of
functions based inside the vmlinux. The general idea is that it is possible
to put all "common" functions into the first 2Mb of the code, so that they
are covered by one TLB entry. This as opposed to the current situation where
a typical vmlinux covers about 3.5Mb (on x86-64) and thus 2 TLB entries.

This is done by enabling the -ffunction-sections flag in gcc, which puts
each function in its own ELF section, so that the linker can then order them
in a way defined by the linker script.

As per previous discussions, Linus said he wanted a "static" list for this,
eg a list provided by the kernel tarbal, so that most people have the same
ordering at least. A script is provided to create this list based on
readprofile(1) output. The included list is provisional, and entirely biased
on my own testbox and me running a few kernel compiles and some other
things.

I think that to get to a better list we need to invite people to submit
their own profiles, and somehow add those all up and base the final list on
that. I'm willing to do that effort if this is ends up being the prefered
approach. Such an effort probably needs to be repeated like once a year or
so to adopt to the changing nature of the kernel.

---
arch/x86_64/Makefile | 1
arch/x86_64/kernel/functionlist | 1286 +++++++++++++++++++++++++++++++++++++++
arch/x86_64/kernel/vmlinux.lds.S | 5
scripts/profile2linkerlist.pl | 21
4 files changed, 1313 insertions(+)

Index: linux-reorder2/arch/x86_64/Makefile
===================================================================
--- linux-reorder2.orig/arch/x86_64/Makefile
+++ linux-reorder2/arch/x86_64/Makefile
@@ -35,6 +35,7 @@ CFLAGS += -m64
CFLAGS += -mno-red-zone
CFLAGS += -mcmodel=kernel
CFLAGS += -pipe
+CFLAGS += -ffunction-sections
# this makes reading assembly source easier, but produces worse code
# actually it makes the kernel smaller too.
CFLAGS += -fno-reorder-blocks
Index: linux-reorder2/arch/x86_64/kernel/functionlist
===================================================================
--- /dev/null
+++ linux-reorder2/arch/x86_64/kernel/functionlist
@@ -0,0 +1,1286 @@
+*(.text.flush_thread)
+*(.text.check_poison_obj)
+*(.text.copy_page)
+*(.text.__set_personality)
+*(.text.gart_map_sg)
+*(.text.kmem_cache_free)
+*(.text.find_get_page)
+*(.text._raw_spin_lock)
+*(.text.ide_outb)
+*(.text.unmap_vmas)
+*(.text.copy_page_range)
+*(.text.kprobe_handler)
+*(.text.__handle_mm_fault)
+*(.text.__d_lookup)
+*(.text.copy_user_generic)
+*(.text.__link_path_walk)
+*(.text.get_page_from_freelist)
+*(.text.kmem_cache_alloc)
+*(.text.drive_cmd_intr)
+*(.text.ia32_setup_sigcontext)
+*(.text.huge_pte_offset)
+*(.text.do_page_fault)
+*(.text.page_remove_rmap)
+*(.text.release_pages)
+*(.text.ide_end_request)
+*(.text.__mutex_lock_slowpath)
+*(.text.__find_get_block)
+*(.text.kfree)
+*(.text.vfs_read)
+*(.text._raw_spin_unlock)
+*(.text.free_hot_cold_page)
+*(.text.fget_light)
+*(.text.schedule)
+*(.text.memcmp)
+*(.text.touch_atime)
+*(.text.__might_sleep)
+*(.text.__down_read_trylock)
+*(.text.arch_pick_mmap_layout)
+*(.text.find_vma)
+*(.text.__make_request)
+*(.text.do_generic_mapping_read)
+*(.text.mutex_lock_interruptible)
+*(.text.__generic_file_aio_read)
+*(.text._atomic_dec_and_lock)
+*(.text.__wake_up_bit)
+*(.text.add_to_page_cache)
+*(.text.cache_alloc_debugcheck_after)
+*(.text.vm_normal_page)
+*(.text.mutex_debug_check_no_locks_freed)
+*(.text.net_rx_action)
+*(.text.__find_first_zero_bit)
+*(.text.put_page)
+*(.text._raw_read_lock)
+*(.text.__delay)
+*(.text.dnotify_parent)
+*(.text.do_path_lookup)
+*(.text.do_sync_read)
+*(.text.do_lookup)
+*(.text.bit_waitqueue)
+*(.text.file_read_actor)
+*(.text.strncpy_from_user)
+*(.text.__pagevec_lru_add_active)
+*(.text.fget)
+*(.text.dput)
+*(.text.__strnlen_user)
+*(.text.inotify_inode_queue_event)
+*(.text.rw_verify_area)
+*(.text.ide_intr)
+*(.text.inotify_dentry_parent_queue_event)
+*(.text.permission)
+*(.text.memscan)
+*(.text.hpet_rtc_interrupt)
+*(.text.do_mmap_pgoff)
+*(.text.current_fs_time)
+*(.text.vfs_getattr)
+*(.text.kmem_flagcheck)
+*(.text.mark_page_accessed)
+*(.text.free_pages_and_swap_cache)
+*(.text.generic_fillattr)
+*(.text.__block_prepare_write)
+*(.text.__set_page_dirty_nobuffers)
+*(.text.link_path_walk)
+*(.text.find_get_pages_tag)
+*(.text.ide_do_request)
+*(.text.__alloc_pages)
+*(.text.generic_permission)
+*(.text.mod_page_state_offset)
+*(.text.free_pgd_range)
+*(.text.generic_file_buffered_write)
+*(.text.number)
+*(.text.ide_do_rw_disk)
+*(.text.__brelse)
+*(.text.__mod_page_state_offset)
+*(.text.rotate_reclaimable_page)
+*(.text.find_vma_prepare)
+*(.text.find_vma_prev)
+*(.text.lru_cache_add_active)
+*(.text.__kmalloc_track_caller)
+*(.text.smp_invalidate_interrupt)
+*(.text.handle_IRQ_event)
+*(.text.__find_get_block_slow)
+*(.text.do_wp_page)
+*(.text.do_select)
+*(.text.set_user_nice)
+*(.text.sys_read)
+*(.text.do_munmap)
+*(.text.csum_partial)
+*(.text.__do_softirq)
+*(.text.may_open)
+*(.text.getname)
+*(.text.get_empty_filp)
+*(.text.__fput)
+*(.text.remove_mapping)
+*(.text.filp_ctor)
+*(.text.poison_obj)
+*(.text.unmap_region)
+*(.text.test_set_page_writeback)
+*(.text.__do_page_cache_readahead)
+*(.text.sock_def_readable)
+*(.text.ide_outl)
+*(.text.shrink_zone)
+*(.text.rb_insert_color)
+*(.text.get_request)
+*(.text.sys_pread64)
+*(.text.spin_bug)
+*(.text.ide_outsl)
+*(.text.mask_and_ack_8259A)
+*(.text.filemap_nopage)
+*(.text.page_add_file_rmap)
+*(.text.find_lock_page)
+*(.text.tcp_poll)
+*(.text.__mark_inode_dirty)
+*(.text.file_ra_state_init)
+*(.text.generic_file_llseek)
+*(.text.__pagevec_lru_add)
+*(.text.page_cache_readahead)
+*(.text.n_tty_receive_buf)
+*(.text.zonelist_policy)
+*(.text.vma_adjust)
+*(.text.test_clear_page_dirty)
+*(.text.sync_buffer)
+*(.text.do_exit)
+*(.text.__bitmap_weight)
+*(.text.alloc_pages_current)
+*(.text.get_unused_fd)
+*(.text.zone_watermark_ok)
+*(.text.cpuset_update_task_memory_state)
+*(.text.__bitmap_empty)
+*(.text.sys_munmap)
+*(.text.__inode_dir_notify)
+*(.text.__generic_file_aio_write_nolock)
+*(.text.__pte_alloc)
+*(.text.sys_select)
+*(.text.vm_acct_memory)
+*(.text.vfs_write)
+*(.text.__lru_add_drain)
+*(.text.prio_tree_insert)
+*(.text.generic_file_aio_read)
+*(.text.vma_merge)
+*(.text.block_write_full_page)
+*(.text.__page_set_anon_rmap)
+*(.text.apic_timer_interrupt)
+*(.text.release_console_sem)
+*(.text.sys_write)
+*(.text.sys_brk)
+*(.text.dup_mm)
+*(.text.read_current_timer)
+*(.text.ll_rw_block)
+*(.text.blk_rq_map_sg)
+*(.text.dbg_userword)
+*(.text.__block_commit_write)
+*(.text.cache_grow)
+*(.text.copy_strings)
+*(.text.release_task)
+*(.text.do_sync_write)
+*(.text.unlock_page)
+*(.text.load_elf_binary)
+*(.text.__follow_mount)
+*(.text.__getblk)
+*(.text.do_sys_open)
+*(.text.current_kernel_time)
+*(.text.call_rcu)
+*(.text.write_chan)
+*(.text.vsnprintf)
+*(.text.dummy_inode_setsecurity)
+*(.text.submit_bh)
+*(.text.poll_freewait)
+*(.text.bio_alloc_bioset)
+*(.text.skb_clone)
+*(.text.page_waitqueue)
+*(.text.__mutex_lock_interruptible_slowpath)
+*(.text.get_index)
+*(.text.csum_partial_copy_generic)
+*(.text.bad_range)
+*(.text.remove_vma)
+*(.text.cp_new_stat)
+*(.text.alloc_arraycache)
+*(.text.test_clear_page_writeback)
+*(.text.strsep)
+*(.text.open_namei)
+*(.text._raw_read_unlock)
+*(.text.get_vma_policy)
+*(.text.__down_write_trylock)
+*(.text.find_get_pages)
+*(.text.tcp_rcv_established)
+*(.text.generic_make_request)
+*(.text.__block_write_full_page)
+*(.text.cfq_set_request)
+*(.text.sys_inotify_init)
+*(.text.split_vma)
+*(.text.__mod_timer)
+*(.text.get_options)
+*(.text.vma_link)
+*(.text.mpage_writepages)
+*(.text.truncate_complete_page)
+*(.text.tcp_recvmsg)
+*(.text.sigprocmask)
+*(.text.filemap_populate)
+*(.text.sys_close)
+*(.text.inotify_dev_queue_event)
+*(.text.do_task_stat)
+*(.text.__dentry_open)
+*(.text.unlink_file_vma)
+*(.text.__pollwait)
+*(.text.packet_rcv_spkt)
+*(.text.drop_buffers)
+*(.text.free_pgtables)
+*(.text.generic_file_direct_write)
+*(.text.copy_process)
+*(.text.netif_receive_skb)
+*(.text.dnotify_flush)
+*(.text.print_bad_pte)
+*(.text.anon_vma_unlink)
+*(.text.sys_mprotect)
+*(.text.sync_sb_inodes)
+*(.text.find_inode_fast)
+*(.text.dummy_inode_readlink)
+*(.text.putname)
+*(.text.init_smp_flush)
+*(.text.dbg_redzone2)
+*(.text.sk_run_filter)
+*(.text.may_expand_vm)
+*(.text.generic_file_aio_write)
+*(.text.find_next_zero_bit)
+*(.text.file_kill)
+*(.text.audit_getname)
+*(.text.arch_unmap_area_topdown)
+*(.text.alloc_page_vma)
+*(.text.tcp_transmit_skb)
+*(.text.rb_next)
+*(.text.dbg_redzone1)
+*(.text.generic_file_mmap)
+*(.text.vfs_fstat)
+*(.text.sys_time)
+*(.text.page_lock_anon_vma)
+*(.text.get_unmapped_area)
+*(.text.remote_llseek)
+*(.text.__up_read)
+*(.text.fd_install)
+*(.text.eventpoll_init_file)
+*(.text.dma_alloc_coherent)
+*(.text.create_empty_buffers)
+*(.text.__mutex_unlock_slowpath)
+*(.text.dup_fd)
+*(.text.d_alloc)
+*(.text.tty_ldisc_try)
+*(.text.sys_stime)
+*(.text.__rb_rotate_right)
+*(.text.d_validate)
+*(.text.rb_erase)
+*(.text.path_release)
+*(.text.memmove)
+*(.text.invalidate_complete_page)
+*(.text.clear_inode)
+*(.text.cache_estimate)
+*(.text.alloc_buffer_head)
+*(.text.smp_call_function_interrupt)
+*(.text.flush_tlb_others)
+*(.text.file_move)
+*(.text.balance_dirty_pages_ratelimited)
+*(.text.vma_prio_tree_add)
+*(.text.timespec_trunc)
+*(.text.mempool_alloc)
+*(.text.iget_locked)
+*(.text.d_alloc_root)
+*(.text.cpuset_populate_dir)
+*(.text.anon_vma_prepare)
+*(.text.sys_newstat)
+*(.text.alloc_page_interleave)
+*(.text.__path_lookup_intent_open)
+*(.text.__pagevec_free)
+*(.text.inode_init_once)
+*(.text.free_vfsmnt)
+*(.text.__user_walk_fd)
+*(.text.cfq_idle_slice_timer)
+*(.text.sys_mmap)
+*(.text.sys_llseek)
+*(.text.prio_tree_remove)
+*(.text.filp_close)
+*(.text.file_permission)
+*(.text.vma_prio_tree_remove)
+*(.text.tcp_ack)
+*(.text.nameidata_to_filp)
+*(.text.sys_lseek)
+*(.text.percpu_counter_mod)
+*(.text.igrab)
+*(.text.__bread)
+*(.text.alloc_inode)
+*(.text.filldir)
+*(.text.__rb_rotate_left)
+*(.text.irq_affinity_write_proc)
+*(.text.init_request_from_bio)
+*(.text.find_or_create_page)
+*(.text.tty_poll)
+*(.text.tcp_sendmsg)
+*(.text.ide_wait_stat)
+*(.text.free_buffer_head)
+*(.text.flush_signal_handlers)
+*(.text.tcp_v4_rcv)
+*(.text.nr_blockdev_pages)
+*(.text.locks_remove_flock)
+*(.text.__iowrite32_copy)
+*(.text.do_filp_open)
+*(.text.try_to_release_page)
+*(.text.page_add_new_anon_rmap)
+*(.text.kmem_cache_size)
+*(.text.eth_type_trans)
+*(.text.try_to_free_buffers)
+*(.text.schedule_tail)
+*(.text.proc_lookup)
+*(.text.no_llseek)
+*(.text.kfree_skbmem)
+*(.text.do_wait)
+*(.text.do_mpage_readpage)
+*(.text.vfs_stat_fd)
+*(.text.tty_write)
+*(.text.705)
+*(.text.sync_page)
+*(.text.__remove_shared_vm_struct)
+*(.text.__kfree_skb)
+*(.text.sock_poll)
+*(.text.get_request_wait)
+*(.text.do_sigaction)
+*(.text.do_brk)
+*(.text.tcp_event_data_recv)
+*(.text.read_chan)
+*(.text.pipe_writev)
+*(.text.__emul_lookup_dentry)
+*(.text.rtc_get_rtc_time)
+*(.text.print_objinfo)
+*(.text.file_update_time)
+*(.text.do_signal)
+*(.text.disable_8259A_irq)
+*(.text.blk_queue_bounce)
+*(.text.__anon_vma_link)
+*(.text.__vma_link)
+*(.text.vfs_rename)
+*(.text.sys_newlstat)
+*(.text.sys_newfstat)
+*(.text.sys_mknod)
+*(.text.__show_regs)
+*(.text.iput)
+*(.text.get_signal_to_deliver)
+*(.text.flush_tlb_page)
+*(.text.debug_mutex_wake_waiter)
+*(.text.copy_thread)
+*(.text.clear_page_dirty_for_io)
+*(.text.buffer_io_error)
+*(.text.vfs_permission)
+*(.text.truncate_inode_pages_range)
+*(.text.sys_recvfrom)
+*(.text.remove_suid)
+*(.text.mark_buffer_dirty)
+*(.text.local_bh_enable)
+*(.text.get_zeroed_page)
+*(.text.get_vmalloc_info)
+*(.text.flush_old_exec)
+*(.text.dummy_inode_permission)
+*(.text.__bio_add_page)
+*(.text.prio_tree_replace)
+*(.text.notify_change)
+*(.text.mntput_no_expire)
+*(.text.fput)
+*(.text.__end_that_request_first)
+*(.text.wake_up_bit)
+*(.text.unuse_mm)
+*(.text.skb_release_data)
+*(.text.shrink_icache_memory)
+*(.text.sched_balance_self)
+*(.text.__pmd_alloc)
+*(.text.pipe_poll)
+*(.text.normal_poll)
+*(.text.__free_pages)
+*(.text.follow_mount)
+*(.text.cdrom_start_packet_command)
+*(.text.blk_recount_segments)
+*(.text.bio_put)
+*(.text.__alloc_skb)
+*(.text.__wake_up)
+*(.text.vm_stat_account)
+*(.text.sys_fcntl)
+*(.text.sys_fadvise64)
+*(.text._raw_write_unlock)
+*(.text.__pud_alloc)
+*(.text.alloc_page_buffers)
+*(.text.vfs_llseek)
+*(.text.sockfd_lookup)
+*(.text._raw_write_lock)
+*(.text.put_compound_page)
+*(.text.prune_dcache)
+*(.text.pipe_readv)
+*(.text.mempool_free)
+*(.text.make_ahead_window)
+*(.text.lru_add_drain)
+*(.text.constant_test_bit)
+*(.text.__clear_user)
+*(.text.arch_unmap_area)
+*(.text.anon_vma_link)
+*(.text.sys_chroot)
+*(.text.setup_arg_pages)
+*(.text.radix_tree_preload)
+*(.text.init_rwsem)
+*(.text.generic_osync_inode)
+*(.text.generic_delete_inode)
+*(.text.do_sys_poll)
+*(.text.dev_queue_xmit)
+*(.text.default_llseek)
+*(.text.__writeback_single_inode)
+*(.text.vfs_ioctl)
+*(.text.__up_write)
+*(.text.unix_poll)
+*(.text.sys_rt_sigprocmask)
+*(.text.sock_recvmsg)
+*(.text.recalc_bh_state)
+*(.text.__put_unused_fd)
+*(.text.process_backlog)
+*(.text.locks_remove_posix)
+*(.text.lease_modify)
+*(.text.expand_files)
+*(.text.end_buffer_read_nobh)
+*(.text.d_splice_alias)
+*(.text.debug_mutex_init_waiter)
+*(.text.copy_from_user)
+*(.text.cap_vm_enough_memory)
+*(.text.show_vfsmnt)
+*(.text.release_sock)
+*(.text.pfifo_fast_enqueue)
+*(.text.half_md4_transform)
+*(.text.fs_may_remount_ro)
+*(.text.do_fork)
+*(.text.copy_hugetlb_page_range)
+*(.text.cache_free_debugcheck)
+*(.text.__tcp_select_window)
+*(.text.task_handoff_register)
+*(.text.sys_open)
+*(.text.strlcpy)
+*(.text.skb_copy_datagram_iovec)
+*(.text.set_up_list3s)
+*(.text.release_open_intent)
+*(.text.qdisc_restart)
+*(.text.n_tty_chars_in_buffer)
+*(.text.inode_change_ok)
+*(.text.__downgrade_write)
+*(.text.debug_mutex_unlock)
+*(.text.add_timer_randomness)
+*(.text.sock_common_recvmsg)
+*(.text.set_bh_page)
+*(.text.printk_lock)
+*(.text.path_release_on_umount)
+*(.text.ip_output)
+*(.text.ide_build_dmatable)
+*(.text.__get_user_8)
+*(.text.end_buffer_read_sync)
+*(.text.__d_path)
+*(.text.d_move)
+*(.text.del_timer)
+*(.text.constant_test_bit)
+*(.text.blockable_page_cache_readahead)
+*(.text.tty_read)
+*(.text.sys_readlink)
+*(.text.sys_faccessat)
+*(.text.read_swap_cache_async)
+*(.text.pty_write_room)
+*(.text.page_address_in_vma)
+*(.text.kthread)
+*(.text.cfq_exit_io_context)
+*(.text.__tcp_push_pending_frames)
+*(.text.sys_pipe)
+*(.text.submit_bio)
+*(.text.pid_revalidate)
+*(.text.page_referenced_file)
+*(.text.lock_sock)
+*(.text.get_page_state_node)
+*(.text.generic_block_bmap)
+*(.text.do_setitimer)
+*(.text.dev_queue_xmit_nit)
+*(.text.copy_from_read_buf)
+*(.text.__const_udelay)
+*(.text.console_conditional_schedule)
+*(.text.wake_up_new_task)
+*(.text.wait_for_completion_interruptible)
+*(.text.tcp_rcv_rtt_update)
+*(.text.sys_mlockall)
+*(.text.set_fs_altroot)
+*(.text.schedule_timeout)
+*(.text.nr_free_pagecache_pages)
+*(.text.nf_iterate)
+*(.text.mapping_tagged)
+*(.text.ip_queue_xmit)
+*(.text.ip_local_deliver)
+*(.text.follow_page)
+*(.text.elf_map)
+*(.text.dummy_file_permission)
+*(.text.dispose_list)
+*(.text.dentry_open)
+*(.text.dentry_iput)
+*(.text.bio_alloc)
+*(.text.alloc_skb_from_cache)
+*(.text.wait_on_page_bit)
+*(.text.vfs_readdir)
+*(.text.vfs_lstat)
+*(.text.seq_escape)
+*(.text.__posix_lock_file)
+*(.text.mm_release)
+*(.text.kref_put)
+*(.text.ip_rcv)
+*(.text.__iget)
+*(.text.free_pages)
+*(.text.find_mergeable_anon_vma)
+*(.text.find_extend_vma)
+*(.text.dummy_inode_listsecurity)
+*(.text.bio_add_page)
+*(.text.__vm_enough_memory)
+*(.text.vfs_stat)
+*(.text.tty_paranoia_check)
+*(.text.tcp_read_sock)
+*(.text.tcp_data_queue)
+*(.text.sys_uname)
+*(.text.sys_renameat)
+*(.text.__strncpy_from_user)
+*(.text.__mutex_init)
+*(.text.__lookup_hash)
+*(.text.kref_get)
+*(.text.ip_route_input)
+*(.text.__insert_inode_hash)
+*(.text.do_sock_write)
+*(.text.blk_done_softirq)
+*(.text.__wake_up_sync)
+*(.text.__vma_link_rb)
+*(.text.tty_ioctl)
+*(.text.tracesys)
+*(.text.sys_getdents)
+*(.text.sys_dup)
+*(.text.stub_execve)
+*(.text.sha_transform)
+*(.text.radix_tree_tag_clear)
+*(.text.put_unused_fd)
+*(.text.put_files_struct)
+*(.text.mpage_readpages)
+*(.text.may_delete)
+*(.text.kmem_cache_create)
+*(.text.ip_mc_output)
+*(.text.interleave_nodes)
+*(.text.groups_search)
+*(.text.generic_drop_inode)
+*(.text.generic_commit_write)
+*(.text.fcntl_setlk)
+*(.text.exit_mmap)
+*(.text.end_page_writeback)
+*(.text.__d_rehash)
+*(.text.debug_mutex_free_waiter)
+*(.text.csum_ipv6_magic)
+*(.text.count)
+*(.text.cleanup_rbuf)
+*(.text.check_spinlock_acquired_node)
+*(.text.can_vma_merge_after)
+*(.text.bio_endio)
+*(.text.alloc_pidmap)
+*(.text.write_ldt)
+*(.text.vmtruncate_range)
+*(.text.vfs_create)
+*(.text.__user_walk)
+*(.text.update_send_head)
+*(.text.unmap_underlying_metadata)
+*(.text.tty_ldisc_deref)
+*(.text.tcp_setsockopt)
+*(.text.tcp_send_ack)
+*(.text.sys_pause)
+*(.text.sys_gettimeofday)
+*(.text.sync_dirty_buffer)
+*(.text.strncmp)
+*(.text.release_posix_timer)
+*(.text.proc_file_read)
+*(.text.prepare_to_wait)
+*(.text.locks_mandatory_locked)
+*(.text.interruptible_sleep_on_timeout)
+*(.text.inode_sub_bytes)
+*(.text.in_group_p)
+*(.text.hrtimer_try_to_cancel)
+*(.text.filldir64)
+*(.text.fasync_helper)
+*(.text.dummy_sb_pivotroot)
+*(.text.d_lookup)
+*(.text.d_instantiate)
+*(.text.__d_find_alias)
+*(.text.cpu_idle_wait)
+*(.text.cond_resched_lock)
+*(.text.chown_common)
+*(.text.blk_congestion_wait)
+*(.text.activate_page)
+*(.text.unlock_buffer)
+*(.text.tty_wakeup)
+*(.text.tcp_v4_do_rcv)
+*(.text.tcp_current_mss)
+*(.text.sys_openat)
+*(.text.sys_fchdir)
+*(.text.strnlen_user)
+*(.text.strnlen)
+*(.text.strchr)
+*(.text.sock_common_getsockopt)
+*(.text.skb_checksum)
+*(.text.remove_wait_queue)
+*(.text.rb_replace_node)
+*(.text.radix_tree_node_ctor)
+*(.text.pty_chars_in_buffer)
+*(.text.profile_hit)
+*(.text.prio_tree_left)
+*(.text.pgd_clear_bad)
+*(.text.pfifo_fast_dequeue)
+*(.text.page_referenced)
+*(.text.open_exec)
+*(.text.mmput)
+*(.text.mm_init)
+*(.text.__ide_dma_off_quietly)
+*(.text.ide_dma_intr)
+*(.text.hrtimer_start)
+*(.text.get_io_context)
+*(.text.__get_free_pages)
+*(.text.find_first_zero_bit)
+*(.text.file_free_rcu)
+*(.text.dummy_socket_sendmsg)
+*(.text.do_unlinkat)
+*(.text.do_arch_prctl)
+*(.text.destroy_inode)
+*(.text.can_vma_merge_before)
+*(.text.block_sync_page)
+*(.text.block_prepare_write)
+*(.text.bio_init)
+*(.text.arch_ptrace)
+*(.text.wake_up_inode)
+*(.text.wait_on_retry_sync_kiocb)
+*(.text.vma_prio_tree_next)
+*(.text.tcp_rcv_space_adjust)
+*(.text.__tcp_ack_snd_check)
+*(.text.sys_utime)
+*(.text.sys_recvmsg)
+*(.text.sys_mremap)
+*(.text.sys_bdflush)
+*(.text.sleep_on)
+*(.text.set_page_dirty_lock)
+*(.text.seq_path)
+*(.text.schedule_timeout_interruptible)
+*(.text.sched_fork)
+*(.text.rt_run_flush)
+*(.text.profile_munmap)
+*(.text.prepare_binprm)
+*(.text.__pagevec_release_nonlru)
+*(.text.m_show)
+*(.text.lookup_mnt)
+*(.text.__lookup_mnt)
+*(.text.lock_timer_base)
+*(.text.is_subdir)
+*(.text.invalidate_bh_lru)
+*(.text.init_buffer_head)
+*(.text.ifind_fast)
+*(.text.ide_dma_start)
+*(.text.__get_page_state)
+*(.text.flock_to_posix_lock)
+*(.text.__find_symbol)
+*(.text.do_futex)
+*(.text.do_execve)
+*(.text.dirty_writeback_centisecs_handler)
+*(.text.dev_watchdog)
+*(.text.can_share_swap_page)
+*(.text.blkdev_put)
+*(.text.bio_get_nr_vecs)
+*(.text.xfrm_compile_policy)
+*(.text.vma_prio_tree_insert)
+*(.text.vfs_lstat_fd)
+*(.text.__user_path_lookup_open)
+*(.text.thread_return)
+*(.text.tcp_send_delayed_ack)
+*(.text.sock_def_error_report)
+*(.text.shrink_slab)
+*(.text.serial_out)
+*(.text.seq_read)
+*(.text.secure_ip_id)
+*(.text.search_binary_handler)
+*(.text.proc_pid_unhash)
+*(.text.pagevec_lookup)
+*(.text.new_inode)
+*(.text.memcpy_toiovec)
+*(.text.locks_free_lock)
+*(.text.__lock_page)
+*(.text.__lock_buffer)
+*(.text.load_module)
+*(.text.is_bad_inode)
+*(.text.invalidate_inode_buffers)
+*(.text.insert_vm_struct)
+*(.text.inode_setattr)
+*(.text.inode_add_bytes)
+*(.text.ide_read_24)
+*(.text.ide_get_error_location)
+*(.text.ide_do_drive_cmd)
+*(.text.get_locked_pte)
+*(.text.get_filesystem_list)
+*(.text.generic_file_open)
+*(.text.follow_down)
+*(.text.find_next_bit)
+*(.text.__find_first_bit)
+*(.text.exit_mm)
+*(.text.exec_keys)
+*(.text.end_buffer_write_sync)
+*(.text.end_bio_bh_io_sync)
+*(.text.dummy_socket_shutdown)
+*(.text.d_rehash)
+*(.text.d_path)
+*(.text.do_ioctl)
+*(.text.dget_locked)
+*(.text.copy_thread_group_keys)
+*(.text.cdrom_end_request)
+*(.text.cap_bprm_apply_creds)
+*(.text.blk_rq_bio_prep)
+*(.text.__bitmap_intersects)
+*(.text.bio_phys_segments)
+*(.text.bio_free)
+*(.text.arch_get_unmapped_area_topdown)
+*(.text.writeback_in_progress)
+*(.text.vfs_follow_link)
+*(.text.tcp_rcv_state_process)
+*(.text.tcp_check_space)
+*(.text.sys_stat)
+*(.text.sys_rt_sigreturn)
+*(.text.sys_rt_sigaction)
+*(.text.sys_remap_file_pages)
+*(.text.sys_pwrite64)
+*(.text.sys_fchownat)
+*(.text.sys_fchmodat)
+*(.text.strncat)
+*(.text.strlcat)
+*(.text.strcmp)
+*(.text.steal_locks)
+*(.text.sock_create)
+*(.text.sk_stream_rfree)
+*(.text.sk_stream_mem_schedule)
+*(.text.skip_atoi)
+*(.text.sk_alloc)
+*(.text.show_stat)
+*(.text.set_fs_pwd)
+*(.text.set_binfmt)
+*(.text.pty_unthrottle)
+*(.text.proc_symlink)
+*(.text.pipe_release)
+*(.text.pageout)
+*(.text.n_tty_write_wakeup)
+*(.text.n_tty_ioctl)
+*(.text.nr_free_zone_pages)
+*(.text.migration_thread)
+*(.text.mempool_free_slab)
+*(.text.meminfo_read_proc)
+*(.text.max_sane_readahead)
+*(.text.lru_cache_add)
+*(.text.kill_fasync)
+*(.text.kernel_read)
+*(.text.invalidate_mapping_pages)
+*(.text.inode_has_buffers)
+*(.text.init_once)
+*(.text.inet_sendmsg)
+*(.text.idedisk_issue_flush)
+*(.text.generic_file_write)
+*(.text.free_more_memory)
+*(.text.__free_fdtable)
+*(.text.filp_dtor)
+*(.text.exit_sem)
+*(.text.exit_itimers)
+*(.text.error_interrupt)
+*(.text.end_buffer_async_write)
+*(.text.eligible_child)
+*(.text.elf_map)
+*(.text.dump_task_regs)
+*(.text.dummy_task_setscheduler)
+*(.text.dummy_socket_accept)
+*(.text.dummy_file_free_security)
+*(.text.__down_read)
+*(.text.do_sock_read)
+*(.text.do_sigaltstack)
+*(.text.do_mremap)
+*(.text.current_io_context)
+*(.text.cpu_swap_callback)
+*(.text.copy_vma)
+*(.text.cap_bprm_set_security)
+*(.text.blk_insert_request)
+*(.text.bio_map_kern_endio)
+*(.text.bio_hw_segments)
+*(.text.bictcp_cong_avoid)
+*(.text.add_interrupt_randomness)
+*(.text.wait_for_completion)
+*(.text.version_read_proc)
+*(.text.unix_write_space)
+*(.text.tty_ldisc_ref_wait)
+*(.text.tty_ldisc_put)
+*(.text.try_to_wake_up)
+*(.text.tcp_v4_tw_remember_stamp)
+*(.text.tcp_try_undo_dsack)
+*(.text.tcp_may_send_now)
+*(.text.sys_waitid)
+*(.text.sys_sched_getparam)
+*(.text.sys_getppid)
+*(.text.sys_getcwd)
+*(.text.sys_dup2)
+*(.text.sys_chmod)
+*(.text.sys_chdir)
+*(.text.sprintf)
+*(.text.sock_wfree)
+*(.text.sock_aio_write)
+*(.text.skb_drop_fraglist)
+*(.text.skb_dequeue)
+*(.text.set_close_on_exec)
+*(.text.set_brk)
+*(.text.seq_puts)
+*(.text.SELECT_DRIVE)
+*(.text.sched_exec)
+*(.text.return_EIO)
+*(.text.remove_from_page_cache)
+*(.text.rcu_start_batch)
+*(.text.__put_task_struct)
+*(.text.proc_pid_readdir)
+*(.text.proc_get_inode)
+*(.text.prepare_to_wait_exclusive)
+*(.text.pipe_wait)
+*(.text.pipe_new)
+*(.text.pdflush_operation)
+*(.text.__pagevec_release)
+*(.text.pagevec_lookup_tag)
+*(.text.packet_rcv)
+*(.text.n_tty_set_room)
+*(.text.nr_free_pages)
+*(.text.__net_timestamp)
+*(.text.mpage_end_io_read)
+*(.text.mod_timer)
+*(.text.__memcpy)
+*(.text.mb_cache_shrink_fn)
+*(.text.lock_rename)
+*(.text.kstrdup)
+*(.text.is_ignored)
+*(.text.int_very_careful)
+*(.text.inotify_inode_is_dead)
+*(.text.inotify_get_cookie)
+*(.text.inode_get_bytes)
+*(.text.init_timer)
+*(.text.init_dev)
+*(.text.inet_getname)
+*(.text.ide_map_sg)
+*(.text.__ide_dma_end)
+*(.text.hrtimer_get_remaining)
+*(.text.get_task_mm)
+*(.text.get_random_int)
+*(.text.free_pipe_info)
+*(.text.filemap_write_and_wait_range)
+*(.text.exit_thread)
+*(.text.enter_idle)
+*(.text.end_that_request_first)
+*(.text.end_8259A_irq)
+*(.text.dummy_file_alloc_security)
+*(.text.do_group_exit)
+*(.text.debug_mutex_init)
+*(.text.cpuset_exit)
+*(.text.cpu_idle)
+*(.text.copy_semundo)
+*(.text.copy_files)
+*(.text.chrdev_open)
+*(.text.cdrom_transfer_packet_command)
+*(.text.cdrom_mode_sense)
+*(.text.blk_phys_contig_segment)
+*(.text.blk_get_queue)
+*(.text.bio_split)
+*(.text.audit_alloc)
+*(.text.anon_pipe_buf_release)
+*(.text.add_wait_queue_exclusive)
+*(.text.add_wait_queue)
+*(.text.acct_process)
+*(.text.account)
+*(.text.zeromap_page_range)
+*(.text.yield)
+*(.text.writeback_acquire)
+*(.text.worker_thread)
+*(.text.wait_on_page_writeback_range)
+*(.text.__wait_on_buffer)
+*(.text.vscnprintf)
+*(.text.vmalloc_to_pfn)
+*(.text.vgacon_save_screen)
+*(.text.vfs_unlink)
+*(.text.vfs_rmdir)
+*(.text.unregister_md_personality)
+*(.text.unlock_new_inode)
+*(.text.unix_stream_sendmsg)
+*(.text.unix_stream_recvmsg)
+*(.text.unhash_process)
+*(.text.udp_v4_lookup_longway)
+*(.text.tty_ldisc_flush)
+*(.text.tty_ldisc_enable)
+*(.text.tty_hung_up_p)
+*(.text.tty_buffer_free_all)
+*(.text.tso_fragment)
+*(.text.try_to_del_timer_sync)
+*(.text.tcp_v4_err)
+*(.text.tcp_unhash)
+*(.text.tcp_seq_next)
+*(.text.tcp_select_initial_window)
+*(.text.tcp_sacktag_write_queue)
+*(.text.tcp_cwnd_validate)
+*(.text.sys_vhangup)
+*(.text.sys_uselib)
+*(.text.sys_symlink)
+*(.text.sys_signal)
+*(.text.sys_poll)
+*(.text.sys_mount)
+*(.text.sys_kill)
+*(.text.sys_ioctl)
+*(.text.sys_inotify_add_watch)
+*(.text.sys_getuid)
+*(.text.sys_getrlimit)
+*(.text.sys_getitimer)
+*(.text.sys_getgroups)
+*(.text.sys_ftruncate)
+*(.text.sysfs_lookup)
+*(.text.sys_exit_group)
+*(.text.stub_fork)
+*(.text.sscanf)
+*(.text.sock_map_fd)
+*(.text.sock_get_timestamp)
+*(.text.__sock_create)
+*(.text.smp_call_function_single)
+*(.text.sk_stop_timer)
+*(.text.skb_copy_and_csum_datagram)
+*(.text.__skb_checksum_complete)
+*(.text.single_next)
+*(.text.sigqueue_alloc)
+*(.text.shrink_dcache_parent)
+*(.text.select_idle_routine)
+*(.text.run_workqueue)
+*(.text.run_local_timers)
+*(.text.remove_inode_hash)
+*(.text.remove_dquot_ref)
+*(.text.register_binfmt)
+*(.text.read_cache_pages)
+*(.text.rb_last)
+*(.text.pty_open)
+*(.text.proc_root_readdir)
+*(.text.proc_pid_flush)
+*(.text.proc_pident_lookup)
+*(.text.proc_fill_super)
+*(.text.proc_exe_link)
+*(.text.posix_locks_deadlock)
+*(.text.pipe_iov_copy_from_user)
+*(.text.opost)
+*(.text.nf_register_hook)
+*(.text.netif_rx_ni)
+*(.text.m_start)
+*(.text.mpage_writepage)
+*(.text.mm_alloc)
+*(.text.memory_open)
+*(.text.mark_buffer_async_write)
+*(.text.lru_add_drain_all)
+*(.text.locks_init_lock)
+*(.text.locks_delete_lock)
+*(.text.lock_hrtimer_base)
+*(.text.load_script)
+*(.text.__kill_fasync)
+*(.text.ip_mc_sf_allow)
+*(.text.__ioremap)
+*(.text.int_with_check)
+*(.text.int_sqrt)
+*(.text.install_thread_keyring)
+*(.text.init_page_buffers)
+*(.text.inet_sock_destruct)
+*(.text.idle_notifier_register)
+*(.text.ide_execute_command)
+*(.text.ide_end_drive_cmd)
+*(.text.__ide_dma_host_on)
+*(.text.hrtimer_run_queues)
+*(.text.hpet_mask_rtc_irq_bit)
+*(.text.__get_zone_counts)
+*(.text.get_zone_counts)
+*(.text.get_write_access)
+*(.text.get_fs_struct)
+*(.text.get_dirty_limits)
+*(.text.generic_readlink)
+*(.text.free_hot_page)
+*(.text.finish_wait)
+*(.text.find_inode)
+*(.text.find_first_bit)
+*(.text.__filemap_fdatawrite_range)
+*(.text.__filemap_copy_from_user_iovec)
+*(.text.exit_aio)
+*(.text.elv_set_request)
+*(.text.elv_former_request)
+*(.text.dup_namespace)
+*(.text.dupfd)
+*(.text.dummy_socket_getsockopt)
+*(.text.dummy_sb_post_mountroot)
+*(.text.dummy_quotactl)
+*(.text.dummy_inode_rename)
+*(.text.__do_SAK)
+*(.text.do_pipe)
+*(.text.do_fsync)
+*(.text.d_instantiate_unique)
+*(.text.d_find_alias)
+*(.text.deny_write_access)
+*(.text.dentry_unhash)
+*(.text.d_delete)
+*(.text.datagram_poll)
+*(.text.cpuset_fork)
+*(.text.cpuid_read)
+*(.text.copy_namespace)
+*(.text.cond_resched)
+*(.text.check_version)
+*(.text.__change_page_attr)
+*(.text.cfq_slab_kill)
+*(.text.cfq_completed_request)
+*(.text.cdrom_pc_intr)
+*(.text.cdrom_decode_status)
+*(.text.cap_capset_check)
+*(.text.blk_put_request)
+*(.text.bio_fs_destructor)
+*(.text.bictcp_min_cwnd)
+*(.text.alloc_chrdev_region)
+*(.text.add_element)
+*(.text.acct_update_integrals)
+*(.text.write_boundary_block)
+*(.text.writeback_release)
+*(.text.writeback_inodes)
+*(.text.wake_up_state)
+*(.text.__wake_up_locked)
+*(.text.wake_futex)
+*(.text.wait_task_inactive)
+*(.text.__wait_on_freeing_inode)
+*(.text.wait_noreap_copyout)
+*(.text.vmstat_start)
+*(.text.vgacon_do_font_op)
+*(.text.vfs_readv)
+*(.text.vfs_quota_sync)
+*(.text.update_queue)
+*(.text.unshare_files)
+*(.text.unmap_vm_area)
+*(.text.unix_socketpair)
+*(.text.unix_release_sock)
+*(.text.unix_detach_fds)
+*(.text.unix_create1)
+*(.text.unix_bind)
+*(.text.udp_sendmsg)
+*(.text.udp_rcv)
+*(.text.udp_queue_rcv_skb)
+*(.text.uart_write)
+*(.text.uart_startup)
+*(.text.uart_open)
+*(.text.tty_vhangup)
+*(.text.tty_termios_baud_rate)
+*(.text.tty_release)
+*(.text.tty_ldisc_ref)
+*(.text.throttle_vm_writeout)
+*(.text.058)
+*(.text.tcp_xmit_probe_skb)
+*(.text.tcp_v4_send_check)
+*(.text.tcp_v4_destroy_sock)
+*(.text.tcp_sync_mss)
+*(.text.tcp_snd_test)
+*(.text.tcp_slow_start)
+*(.text.tcp_send_fin)
+*(.text.tcp_rtt_estimator)
+*(.text.tcp_parse_options)
+*(.text.tcp_ioctl)
+*(.text.tcp_init_tso_segs)
+*(.text.tcp_init_cwnd)
+*(.text.tcp_getsockopt)
+*(.text.tcp_fin)
+*(.text.tcp_connect)
+*(.text.tcp_cong_avoid)
+*(.text.__tcp_checksum_complete_user)
+*(.text.task_dumpable)
+*(.text.sys_wait4)
+*(.text.sys_utimes)
+*(.text.sys_symlinkat)
+*(.text.sys_socketpair)
+*(.text.sys_rmdir)
+*(.text.sys_readahead)
+*(.text.sys_nanosleep)
+*(.text.sys_linkat)
+*(.text.sys_fstat)
+*(.text.sysfs_readdir)
+*(.text.sys_execve)
+*(.text.sysenter_tracesys)
+*(.text.sys_chown)
+*(.text.stub_clone)
+*(.text.strrchr)
+*(.text.strncpy)
+*(.text.stopmachine_set_state)
+*(.text.sock_sendmsg)
+*(.text.sock_release)
+*(.text.sock_fasync)
+*(.text.sock_close)
+*(.text.sk_stream_write_space)
+*(.text.sk_reset_timer)
+*(.text.skb_split)
+*(.text.skb_recv_datagram)
+*(.text.skb_queue_tail)
+*(.text.sk_attach_filter)
+*(.text.si_swapinfo)
+*(.text.simple_strtoll)
+*(.text.set_termios)
+*(.text.set_task_comm)
+*(.text.set_shrinker)
+*(.text.set_normalized_timespec)
+*(.text.set_brk)
+*(.text.serial_in)
+*(.text.seq_printf)
+*(.text.secure_dccp_sequence_number)
+*(.text.rwlock_bug)
+*(.text.rt_hash_code)
+*(.text.__rta_fill)
+*(.text.__request_resource)
+*(.text.relocate_new_kernel)
+*(.text.release_thread)
+*(.text.release_mem)
+*(.text.rb_prev)
+*(.text.rb_first)
+*(.text.random_poll)
+*(.text.__put_super_and_need_restart)
+*(.text.pty_write)
+*(.text.ptrace_stop)
+*(.text.proc_self_readlink)
+*(.text.proc_root_lookup)
+*(.text.proc_root_link)
+*(.text.proc_pid_make_inode)
+*(.text.proc_pid_attr_write)
+*(.text.proc_lookupfd)
+*(.text.proc_delete_inode)
+*(.text.posix_same_owner)
+*(.text.posix_block_lock)
+*(.text.poll_initwait)
+*(.text.pipe_write)
+*(.text.pipe_read_fasync)
+*(.text.pipe_ioctl)
+*(.text.pdflush)
+*(.text.pci_user_read_config_dword)
+*(.text.page_readlink)
+*(.text.null_lseek)
+*(.text.nf_hook_slow)
+*(.text.netlink_sock_destruct)
+*(.text.netlink_broadcast)
+*(.text.neigh_resolve_output)
+*(.text.name_to_int)
+*(.text.mwait_idle)
+*(.text.mutex_trylock)
+*(.text.mutex_debug_check_no_locks_held)
+*(.text.m_stop)
+*(.text.mpage_end_io_write)
+*(.text.mpage_alloc)
+*(.text.move_page_tables)
+*(.text.mounts_open)
+*(.text.__memset)
+*(.text.memcpy_fromiovec)
+*(.text.make_8259A_irq)
+*(.text.lookup_user_key_possessed)
+*(.text.lookup_create)
+*(.text.locks_insert_lock)
+*(.text.locks_alloc_lock)
+*(.text.kthread_should_stop)
+*(.text.kswapd)
+*(.text.kobject_uevent)
+*(.text.kobject_get_path)
+*(.text.kobject_get)
+*(.text.klist_children_put)
+*(.text.__ip_route_output_key)
+*(.text.ip_flush_pending_frames)
+*(.text.ip_compute_csum)
+*(.text.ip_append_data)
+*(.text.ioc_set_batching)
+*(.text.invalidate_inode_pages)
+*(.text.__invalidate_device)
+*(.text.install_arg_page)
+*(.text.in_sched_functions)
+*(.text.inotify_unmount_inodes)
+*(.text.init_once)
+*(.text.init_cdrom_command)
+*(.text.inet_stream_connect)
+*(.text.inet_sk_rebuild_header)
+*(.text.inet_csk_addr2sockaddr)
+*(.text.inet_create)
+*(.text.ifind)
+*(.text.ide_setup_dma)
+*(.text.ide_outsw)
+*(.text.ide_fixstring)
+*(.text.ide_dma_setup)
+*(.text.ide_cdrom_packet)
+*(.text.ide_cd_put)
+*(.text.ide_build_sglist)
+*(.text.i8259A_shutdown)
+*(.text.hung_up_tty_ioctl)
+*(.text.hrtimer_nanosleep)
+*(.text.hrtimer_init)
+*(.text.hrtimer_cancel)
+*(.text.hash_futex)
+*(.text.group_send_sig_info)
+*(.text.grab_cache_page_nowait)
+*(.text.get_wchan)
+*(.text.get_stack)
+*(.text.get_page_state)
+*(.text.getnstimeofday)
+*(.text.get_node)
+*(.text.get_kprobe)
+*(.text.generic_unplug_device)
+*(.text.free_task)
+*(.text.frag_show)
+*(.text.find_next_zero_string)
+*(.text.filp_open)
+*(.text.fillonedir)
+*(.text.exit_io_context)
+*(.text.exit_idle)
+*(.text.exact_lock)
+*(.text.eth_header)
+*(.text.dummy_unregister_security)
+*(.text.dummy_socket_post_create)
+*(.text.dummy_socket_listen)
+*(.text.dummy_quota_on)
+*(.text.dummy_inode_follow_link)
+*(.text.dummy_file_receive)
+*(.text.dummy_file_mprotect)
+*(.text.dummy_file_lock)
+*(.text.dummy_file_ioctl)
+*(.text.dummy_bprm_post_apply_creds)
+*(.text.do_writepages)
+*(.text.__down_interruptible)
+*(.text.do_notify_resume)
+*(.text.do_acct_process)
+*(.text.del_timer_sync)
+*(.text.default_rebuild_header)
+*(.text.d_callback)
+*(.text.dcache_readdir)
+*(.text.ctrl_dumpfamily)
+*(.text.cpuset_rmdir)
+*(.text.copy_strings_kernel)
+*(.text.con_write_room)
+*(.text.complete_all)
+*(.text.collect_sigign_sigcatch)
+*(.text.clear_user)
+*(.text.check_unthrottle)
+*(.text.cdrom_release)
+*(.text.cdrom_newpc_intr)
+*(.text.cdrom_ioctl)
+*(.text.cdrom_check_status)
+*(.text.cdev_put)
+*(.text.cdev_add)
+*(.text.cap_ptrace)
+*(.text.cap_bprm_secureexec)
+*(.text.cache_alloc_refill)
+*(.text.bmap)
+*(.text.blk_run_queue)
+*(.text.blk_queue_dma_alignment)
+*(.text.blk_ordered_req_seq)
+*(.text.blk_backing_dev_unplug)
+*(.text.__bitmap_subset)
+*(.text.__bitmap_and)
+*(.text.bio_unmap_user)
+*(.text.__bforget)
+*(.text.bd_forget)
+*(.text.bad_pipe_w)
+*(.text.bad_get_user)
+*(.text.audit_free)
+*(.text.anon_vma_ctor)
+*(.text.anon_pipe_buf_map)
+*(.text.alloc_sock_iocb)
+*(.text.alloc_fdset)
+*(.text.aio_kick_handler)
+*(.text.__add_entropy_words)
+*(.text.add_disk_randomness)
Index: linux-reorder2/arch/x86_64/kernel/vmlinux.lds.S
===================================================================
--- linux-reorder2.orig/arch/x86_64/kernel/vmlinux.lds.S
+++ linux-reorder2/arch/x86_64/kernel/vmlinux.lds.S
@@ -20,7 +20,12 @@ SECTIONS
phys_startup_64 = startup_64 - LOAD_OFFSET;
_text = .; /* Text and read-only data */
.text : AT(ADDR(.text) - LOAD_OFFSET) {
+ /* First the code that has to be first for bootstrapping */
*(.bootstrap.text)
+ /* Then all the functions that are "hot" in profiles, to group them
+ onto the same hugetlb entry */
+ #include "functionlist"
+ /* Then the rest */
*(.text)
SCHED_TEXT
LOCK_TEXT
Index: linux-reorder2/scripts/profile2linkerlist.pl
===================================================================
--- /dev/null
+++ linux-reorder2/scripts/profile2linkerlist.pl
@@ -0,0 +1,21 @@
+#!/usr/bin/perl
+
+#
+# Takes a (sorted) output of readprofile and turns it into a list suitable for
+# linker scripts
+#
+# usage:
+# readprofile | sort -rn | perl profile2linkerlist.pl > functionlist
+#
+
+while (<>) {
+ my $line = $_;
+
+ $_ =~ /\W*[0-9]+\W*([a-zA-Z\_0-9]+)\W*[0-9]+/;
+
+ if ( ($line =~ /unknown/) || ($line =~ /total/)) {
+
+ } else {
+ print "*(.text.$1)\n";
+ }
+}



2006-02-27 15:32:37

by Arjan van de Ven

[permalink] [raw]
Subject: [Patch 1/4] avoid entry.S functions from reordering

This patch puts the code from head.S in a special .bootstrap.text
section.

I'm working on a patch to reorder the functions in the kernel (I'll post
that later), but for x86-64 at least the kernel bootstrap requires that the
head.S functions are on the very first page/pages of the kernel text. This
is understandable since the bootstrap is complex enough already and not a
problem at all, it just means they aren't allowed to be reordered. This
patch puts these special functions into a separate section to document this,
and to guarantee this in the light of possibly reordering the rest later.

(So this patch doesn't fix a bug per se, but makes things more robust by
making the order of these functions explicit)

Signed-off-by: Arjan van de Ven <[email protected]>
---
arch/x86_64/kernel/head.S | 1 +
arch/x86_64/kernel/vmlinux.lds.S | 1 +
2 files changed, 2 insertions(+)

Index: linux-reorder2/arch/x86_64/kernel/head.S
===================================================================
--- linux-reorder2.orig/arch/x86_64/kernel/head.S
+++ linux-reorder2/arch/x86_64/kernel/head.S
@@ -26,6 +26,7 @@
*/

.text
+ .section .bootstrap.text
.code32
.globl startup_32
/* %bx: 1 if coming from smp trampoline on secondary cpu */
Index: linux-reorder2/arch/x86_64/kernel/vmlinux.lds.S
===================================================================
--- linux-reorder2.orig/arch/x86_64/kernel/vmlinux.lds.S
+++ linux-reorder2/arch/x86_64/kernel/vmlinux.lds.S
@@ -20,6 +20,7 @@ SECTIONS
phys_startup_64 = startup_64 - LOAD_OFFSET;
_text = .; /* Text and read-only data */
.text : AT(ADDR(.text) - LOAD_OFFSET) {
+ *(.bootstrap.text)
*(.text)
SCHED_TEXT
LOCK_TEXT

2006-02-27 15:32:34

by Arjan van de Ven

[permalink] [raw]
Subject: [Patch 4/4] Tell GCC 4.1 to move unlikely() code to a separate section

This patch is more controversial I assume; it offers the option
to use the gcc 4.1 option to move unlikely() code to a separate section.
On the con side, this means that longer byte sequences are needed to jump
to this code, on the Pro side it means that the unlikely() code isn't sharing
icache cachelines and tlbs anymore.

Signed-off-by: Arjan van de Ven <[email protected]>

---
arch/x86_64/Kconfig.debug | 10 ++++++++++
arch/x86_64/Makefile | 4 ++++
2 files changed, 14 insertions(+)

Index: linux-reorder2/arch/x86_64/Kconfig.debug
===================================================================
--- linux-reorder2.orig/arch/x86_64/Kconfig.debug
+++ linux-reorder2/arch/x86_64/Kconfig.debug
@@ -12,6 +12,16 @@ config DEBUG_RODATA
of the kernel code won't be covered by a 2MB TLB anymore.
If in doubt, say "N".

+config REORDER_BLOCKS
+ bool "Enable gcc to move unlikely() code away"
+ depends on DEBUG_KERNEL
+ help
+ This option will enable gcc 4.1 and later to reorder code that is
+ marked unlikely() far away from the code to the end of the kernel
+ image. This allows the processor to make better use of the instruction
+ cache, however it also makes OOPses harder to decode.
+ If in doubt, say "N"
+
config IOMMU_DEBUG
depends on GART_IOMMU && DEBUG_KERNEL
bool "Enable IOMMU debugging"
Index: linux-reorder2/arch/x86_64/Makefile
===================================================================
--- linux-reorder2.orig/arch/x86_64/Makefile
+++ linux-reorder2/arch/x86_64/Makefile
@@ -47,6 +47,10 @@ ifneq ($(CONFIG_DEBUG_INFO),y)
# -fweb shrinks the kernel a bit, but the difference is very small
# it also messes up debugging, so don't use it for now.
#CFLAGS += $(call cc-option,-fweb)
+# -freorder-blocks-and-partition moves all unlikely() code out of the
+# way, to increase icache density of the hot codepaths.
+# this makes oopses harder to read as well, so it's a config option
+cflags-$(CONFIG_REORDER_BLOCKS) += $(call cc-option,-freorder-blocks-and-partition)
endif
# -funit-at-a-time shrinks the kernel .text considerably
# unfortunately it makes reading oopses harder.

2006-02-27 15:33:07

by Arjan van de Ven

[permalink] [raw]
Subject: [Patch 3/4] Move the base kernel to 2Mb to align with TLB boundaries

As suggested by Andi (and Alan), move the default kernel location
from 1Mb to 2Mb, to align to the start of a TLB entry.

Signed-off-by: Arjan van de Ven <[email protected]>

---
arch/x86_64/Kconfig | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux-reorder2/arch/x86_64/Kconfig
===================================================================
--- linux-reorder2.orig/arch/x86_64/Kconfig
+++ linux-reorder2/arch/x86_64/Kconfig
@@ -429,10 +429,10 @@ config CRASH_DUMP
config PHYSICAL_START
hex "Physical address where the kernel is loaded" if (EMBEDDED || CRASH_DUMP)
default "0x1000000" if CRASH_DUMP
- default "0x100000"
+ default "0x200000"
help
This gives the physical address where the kernel is loaded. Normally
- for regular kernels this value is 0x100000 (1MB). But in the case
+ for regular kernels this value is 0x200000 (2MB). But in the case
of kexec on panic the fail safe kernel needs to run at a different
address than the panic-ed kernel. This option is used to set the load
address for kernels used to capture crash dump on being kexec'ed

2006-02-27 15:42:32

by Andi Kleen

[permalink] [raw]
Subject: Re: [Patch 3/4] Move the base kernel to 2Mb to align with TLB boundaries

On Monday 27 February 2006 16:31, Arjan van de Ven wrote:
> As suggested by Andi (and Alan), move the default kernel location
> from 1Mb to 2Mb, to align to the start of a TLB entry.

I already have that and the head.S patch in my tree.

-Andi

2006-02-27 15:42:55

by Andi Kleen

[permalink] [raw]
Subject: Re: [Patch 4/4] Tell GCC 4.1 to move unlikely() code to a separate section

On Monday 27 February 2006 16:31, Arjan van de Ven wrote:
> This patch is more controversial I assume; it offers the option
> to use the gcc 4.1 option to move unlikely() code to a separate section.
> On the con side, this means that longer byte sequences are needed to jump
> to this code, on the Pro side it means that the unlikely() code isn't sharing
> icache cachelines and tlbs anymore.

I don't think this will do anything because the default Makefile
still has

CFLAGS += -fno-reorder-blocks

That was me because it made assembly debugging much easier. I would be willing
to reconsider this if you can give me some hard data just from this change:
- benchmark changes
- .text size increase

Also I don't like it being an separate CONFIG options. We already have too many
obscure ones. Either it should be on by default or not there at all.

-Andi

2006-02-27 15:42:54

by Andi Kleen

[permalink] [raw]
Subject: Re: [Patch 2/4] Basic reorder infrastructure

On Monday 27 February 2006 16:27, Arjan van de Ven wrote:
> This patch puts the infrastructure in place to allow for a reordering of
> functions based inside the vmlinux. The general idea is that it is possible
> to put all "common" functions into the first 2Mb of the code, so that they
> are covered by one TLB entry. This as opposed to the current situation where
> a typical vmlinux covers about 3.5Mb (on x86-64) and thus 2 TLB entries.

Looks good. I will apply that.

-Andi

2006-02-27 15:42:30

by Andi Kleen

[permalink] [raw]
Subject: Re: [Patch 0/4] Reordering of functions, try 2

On Monday 27 February 2006 16:23, Arjan van de Ven wrote:

>
> 2) In the "processor/processes" group, 7 tests changed behavior, and the
> average of these changes was a performance increase by 10% (!!). The
> exception was the signal handling test, which decreased by 6%. This
> actually made me feel good, since the original function list was based
> on a profile run that didn't do signals much if at all.

How often did you rerun lmbench each time? In my experience the numbers
are somewhat unstable. Still looks encouraging.

-Andi

2006-02-27 15:43:45

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [Patch 0/4] Reordering of functions, try 2

Andi Kleen wrote:
> On Monday 27 February 2006 16:23, Arjan van de Ven wrote:
>
>> 2) In the "processor/processes" group, 7 tests changed behavior, and the
>> average of these changes was a performance increase by 10% (!!). The
>> exception was the signal handling test, which decreased by 6%. This
>> actually made me feel good, since the original function list was based
>> on a profile run that didn't do signals much if at all.
>
> How often did you rerun lmbench each time?

I ran it 5 times each time and then took the average of the runs
(yes that takes forever)

> In my experience the numbers
> are somewhat unstable. Still looks encouraging.


2006-02-27 15:53:03

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [Patch 3/4] Move the base kernel to 2Mb to align with TLB boundaries

Andi Kleen wrote:
> On Monday 27 February 2006 16:31, Arjan van de Ven wrote:
>> As suggested by Andi (and Alan), move the default kernel location
>> from 1Mb to 2Mb, to align to the start of a TLB entry.
>
> I already have that and the head.S patch in my tree.
>

ok they haven't changed so no need to update your tree. I've included
them to get the full series for people who don't use your tree ;-)

2006-02-27 16:31:24

by Sam Ravnborg

[permalink] [raw]
Subject: Re: [Patch 2/4] Basic reorder infrastructure

> This patch puts the infrastructure in place to allow for a reordering of
> functions based inside the vmlinux.

Can we make this general instead of x86_64 only?
Then we can use Kconfig to enable it for the architectures where we want it.


>
> Index: linux-reorder2/arch/x86_64/Makefile
> ===================================================================
> --- linux-reorder2.orig/arch/x86_64/Makefile
> +++ linux-reorder2/arch/x86_64/Makefile
> @@ -35,6 +35,7 @@ CFLAGS += -m64
> CFLAGS += -mno-red-zone
> CFLAGS += -mcmodel=kernel
> CFLAGS += -pipe
> +CFLAGS += -ffunction-sections

This should go in top-level Makefile
> Index: linux-reorder2/arch/x86_64/kernel/functionlist
> ===================================================================
> --- /dev/null
> +++ linux-reorder2/arch/x86_64/kernel/functionlist

I would have used extension .lds - but no strong feeling for it.

> Index: linux-reorder2/arch/x86_64/kernel/vmlinux.lds.S
> ===================================================================
> --- linux-reorder2.orig/arch/x86_64/kernel/vmlinux.lds.S
> +++ linux-reorder2/arch/x86_64/kernel/vmlinux.lds.S
> @@ -20,7 +20,12 @@ SECTIONS
> phys_startup_64 = startup_64 - LOAD_OFFSET;
> _text = .; /* Text and read-only data */
> .text : AT(ADDR(.text) - LOAD_OFFSET) {
> + /* First the code that has to be first for bootstrapping */
> *(.bootstrap.text)
> + /* Then all the functions that are "hot" in profiles, to group them
> + onto the same hugetlb entry */
> + #include "functionlist"
> + /* Then the rest */

And this part to go into include/asm-generaic/vmlinux.lds.h

> *(.text)
> SCHED_TEXT
> LOCK_TEXT


Sam

2006-02-27 17:19:38

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [Patch 2/4] Basic reorder infrastructure

On Mon, 2006-02-27 at 17:31 +0100, [email protected] wrote:
> > This patch puts the infrastructure in place to allow for a reordering of
> > functions based inside the vmlinux.
>
> Can we make this general instead of x86_64 only?
> Then we can use Kconfig to enable it for the architectures where we want it.

Actually Linus had pretty good arguments to make this per-architecture:
the list will be different on each architecture.

(eg my first patch had it more generic; but Linus asked it to be per
arch, and I agree with the reasons he gave)

Also I doubt it can be enabled "blindly" for all architectures; I expect
more to need hacks similar to the x86_64 entry.S fix before it can
work...


2006-02-28 00:14:52

by Bill Davidsen

[permalink] [raw]
Subject: Re: [Patch 4/4] Tell GCC 4.1 to move unlikely() code to a separate section

Andi Kleen wrote:
> On Monday 27 February 2006 16:31, Arjan van de Ven wrote:
>> This patch is more controversial I assume; it offers the option
>> to use the gcc 4.1 option to move unlikely() code to a separate section.
>> On the con side, this means that longer byte sequences are needed to jump
>> to this code, on the Pro side it means that the unlikely() code isn't sharing
>> icache cachelines and tlbs anymore.
>
> I don't think this will do anything because the default Makefile
> still has
>
> CFLAGS += -fno-reorder-blocks
>
> That was me because it made assembly debugging much easier. I would be willing
> to reconsider this if you can give me some hard data just from this change:
> - benchmark changes
> - .text size increase
>
> Also I don't like it being an separate CONFIG options. We already have too many
> obscure ones. Either it should be on by default or not there at all.

I think you just made the case for an option, if it does produce
substantially better code and justify being done, you still would want a
way to kill it for debugging.

I'd like to see what it gains in general, and how much it depends on
processor type and cache size.
--
-bill davidsen ([email protected])
"The secret to procrastination is to put things off until the
last possible moment - but no longer" -me

2006-02-28 06:35:58

by Chuck Ebbert

[permalink] [raw]
Subject: Re: [Patch 4/4] Tell GCC 4.1 to move unlikely() code to a separate section

In-Reply-To: <[email protected]>

On Mon, 27 Feb 2006 16:39:34, Andi Kleen wrote:
> On Monday 27 February 2006 16:31, Arjan van de Ven wrote:
> > This patch is more controversial I assume; it offers the option
> > to use the gcc 4.1 option to move unlikely() code to a separate section.
> > On the con side, this means that longer byte sequences are needed to jump
> > to this code, on the Pro side it means that the unlikely() code isn't sharing
> > icache cachelines and tlbs anymore.
>
> I don't think this will do anything because the default Makefile
> still has
>
> CFLAGS += -fno-reorder-blocks

This also won't work for functions that specify an explicit section name,
e.g. __sched, __kprobes etc. -- at least that's what the gcc 4.0.2 docs say.

--
Chuck
"Equations are the Devil's sentences." --Stephen Colbert

2006-02-28 19:09:09

by Sam Ravnborg

[permalink] [raw]
Subject: Re: [Patch 2/4] Basic reorder infrastructure

On Mon, Feb 27, 2006 at 06:19:34PM +0100, Arjan van de Ven wrote:
> On Mon, 2006-02-27 at 17:31 +0100, [email protected] wrote:
> > > This patch puts the infrastructure in place to allow for a reordering of
> > > functions based inside the vmlinux.
> >
> > Can we make this general instead of x86_64 only?
> > Then we can use Kconfig to enable it for the architectures where we want it.
>
> Actually Linus had pretty good arguments to make this per-architecture:
> the list will be different on each architecture.
>
> (eg my first patch had it more generic; but Linus asked it to be per
> arch, and I agree with the reasons he gave)
The list should be per. architecture for - but I just wanted the rest to
be shared since I assume this will soon be adopted by others.
But on the other hand the rest is very few lines so it is not crucial.

>
> Also I doubt it can be enabled "blindly" for all architectures; I expect
> more to need hacks similar to the x86_64 entry.S fix before it can
> work...
Which is why utilising Kconfig to enable it would make sense.

Sam

2006-03-10 17:13:28

by Andi Kleen

[permalink] [raw]
Subject: Re: [Patch 2/4] Basic reorder infrastructure - makes linking very slow

On Monday 27 February 2006 18:19, Arjan van de Ven wrote:
> On Mon, 2006-02-27 at 17:31 +0100, [email protected] wrote:
> > > This patch puts the infrastructure in place to allow for a reordering of
> > > functions based inside the vmlinux.
> >
> > Can we make this general instead of x86_64 only?
> > Then we can use Kconfig to enable it for the architectures where we want it.
>
> Actually Linus had pretty good arguments to make this per-architecture:
> the list will be different on each architecture.
>
> (eg my first patch had it more generic; but Linus asked it to be per
> arch, and I agree with the reasons he gave)
>
> Also I doubt it can be enabled "blindly" for all architectures; I expect
> more to need hacks similar to the x86_64 entry.S fix before it can
> work...

Hi,

I just discovered that this patch is the reason why my compiles slowed
down so dramatically (thanks to Michael M. for the hint)
The SUSE 10.0 ld goes from running in seconds to more than a minute.

I think I will drop this patch for now because I doubt the
runtime improvement is worth the compile slowdown.

If there is some binutils release that handles this without dramatic
slowdown we can test for it in the Makefile, but i don't want it
to be enabled by default.

-Andi