Hi Paul,
On Wed, Nov 18, 2009 at 8:12 PM, Paul E. McKenney
<[email protected]> wrote:
> I am seeing some lockdep complaints in rcutorture runs that include
> frequent CPU-hotplug operations. The tests are otherwise successful.
> My first thought was to send a patch that gave each array_cache
> structure's ->lock field its own struct lock_class_key, but you already
> have a init_lock_keys() that seems to be intended to deal with this.
>
> So, please see below for the lockdep complaint and the .config file.
>
>                                                        Thanx, Paul
>
> ------------------------------------------------------------------------
>
> =============================================
> [ INFO: possible recursive locking detected ]
> 2.6.32-rc4-autokern1 #1
> ---------------------------------------------
> syslogd/2908 is trying to acquire lock:
> (&nc->lock){..-...}, at: [<c0000000001407f4>] .kmem_cache_free+0x118/0x2d4
>
> but task is already holding lock:
> (&nc->lock){..-...}, at: [<c0000000001411bc>] .kfree+0x1f0/0x324
>
> other info that might help us debug this:
> 3 locks held by syslogd/2908:
> #0: (&u->readlock){+.+.+.}, at: [<c0000000004556f8>] .unix_dgram_recvmsg+0x70/0x338
> #1: (&nc->lock){..-...}, at: [<c0000000001411bc>] .kfree+0x1f0/0x324
> #2: (&parent->list_lock){-.-...}, at: [<c000000000140f64>] .__drain_alien_cache+0x50/0xb8
I *think* this is a false positive. The nc->lock in slab_destroy()
should always be different from the one we took in kfree() because
it's a per-struct kmem_cache "slab cache". Peter, what do you think?
If my analysis is correct, any suggestions how to fix lockdep
annotations in slab?
>
> stack backtrace:
> Call Trace:
> [c0000000e8ccafc0] [c0000000000101e4] .show_stack+0x70/0x184 (unreliable)
> [c0000000e8ccb070] [c0000000000afebc] .validate_chain+0x6ec/0xf58
> [c0000000e8ccb180] [c0000000000b0ff0] .__lock_acquire+0x8c8/0x974
> [c0000000e8ccb280] [c0000000000b2290] .lock_acquire+0x140/0x18c
> [c0000000e8ccb350] [c000000000468df0] ._spin_lock+0x48/0x70
> [c0000000e8ccb3e0] [c0000000001407f4] .kmem_cache_free+0x118/0x2d4
> [c0000000e8ccb4a0] [c000000000140b90] .free_block+0x130/0x1a8
> [c0000000e8ccb540] [c000000000140f94] .__drain_alien_cache+0x80/0xb8
> [c0000000e8ccb5e0] [c0000000001411e0] .kfree+0x214/0x324
> [c0000000e8ccb6a0] [c0000000003ca860] .skb_release_data+0xe8/0x104
> [c0000000e8ccb730] [c0000000003ca2ec] .__kfree_skb+0x20/0xd4
> [c0000000e8ccb7b0] [c0000000003cf2c8] .skb_free_datagram+0x1c/0x5c
> [c0000000e8ccb830] [c00000000045597c] .unix_dgram_recvmsg+0x2f4/0x338
> [c0000000e8ccb920] [c0000000003c0f14] .sock_recvmsg+0xf4/0x13c
> [c0000000e8ccbb30] [c0000000003c28ec] .SyS_recvfrom+0xb4/0x130
> [c0000000e8ccbcb0] [c0000000003bfb78] .sys_recv+0x18/0x2c
> [c0000000e8ccbd20] [c0000000003ed388] .compat_sys_recv+0x14/0x28
> [c0000000e8ccbd90] [c0000000003ee1bc] .compat_sys_socketcall+0x178/0x220
> [c0000000e8ccbe30] [c0000000000085d4] syscall_exit+0x0/0x40
>
> ------------------------------------------------------------------------
>
> #
> # Automatically generated make config: don't edit
> # Linux kernel version: 2.6.32-rc4-autokern1
> # Tue Nov 17 19:22:46 2009
> #
> CONFIG_PPC64=y
>
> #
> # Processor support
> #
> CONFIG_PPC_BOOK3S_64=y
> # CONFIG_PPC_BOOK3E_64 is not set
> CONFIG_PPC_BOOK3S=y
> # CONFIG_POWER4_ONLY is not set
> CONFIG_POWER3=y
> CONFIG_POWER4=y
> # CONFIG_TUNE_CELL is not set
> CONFIG_PPC_FPU=y
> CONFIG_ALTIVEC=y
> # CONFIG_VSX is not set
> CONFIG_PPC_STD_MMU=y
> CONFIG_PPC_STD_MMU_64=y
> CONFIG_PPC_MM_SLICES=y
> CONFIG_VIRT_CPU_ACCOUNTING=y
> CONFIG_PPC_HAVE_PMU_SUPPORT=y
> CONFIG_PPC_PERF_CTRS=y
> CONFIG_SMP=y
> CONFIG_NR_CPUS=8
> CONFIG_64BIT=y
> CONFIG_WORD_SIZE=64
> CONFIG_ARCH_PHYS_ADDR_T_64BIT=y
> CONFIG_MMU=y
> CONFIG_GENERIC_CMOS_UPDATE=y
> CONFIG_GENERIC_TIME=y
> CONFIG_GENERIC_TIME_VSYSCALL=y
> CONFIG_GENERIC_CLOCKEVENTS=y
> CONFIG_GENERIC_HARDIRQS=y
> CONFIG_GENERIC_HARDIRQS_NO__DO_IRQ=y
> CONFIG_HAVE_SETUP_PER_CPU_AREA=y
> CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
> CONFIG_IRQ_PER_CPU=y
> CONFIG_STACKTRACE_SUPPORT=y
> CONFIG_HAVE_LATENCYTOP_SUPPORT=y
> CONFIG_TRACE_IRQFLAGS_SUPPORT=y
> CONFIG_LOCKDEP_SUPPORT=y
> CONFIG_RWSEM_XCHGADD_ALGORITHM=y
> CONFIG_GENERIC_LOCKBREAK=y
> CONFIG_ARCH_HAS_ILOG2_U32=y
> CONFIG_ARCH_HAS_ILOG2_U64=y
> CONFIG_GENERIC_HWEIGHT=y
> CONFIG_GENERIC_FIND_NEXT_BIT=y
> CONFIG_ARCH_NO_VIRT_TO_BUS=y
> CONFIG_PPC=y
> CONFIG_EARLY_PRINTK=y
> CONFIG_COMPAT=y
> CONFIG_SYSVIPC_COMPAT=y
> CONFIG_SCHED_OMIT_FRAME_POINTER=y
> CONFIG_ARCH_MAY_HAVE_PC_FDC=y
> CONFIG_PPC_OF=y
> CONFIG_OF=y
> CONFIG_PPC_UDBG_16550=y
> CONFIG_GENERIC_TBSYNC=y
> CONFIG_AUDIT_ARCH=y
> CONFIG_GENERIC_BUG=y
> CONFIG_DTC=y
> # CONFIG_DEFAULT_UIMAGE is not set
> CONFIG_HIBERNATE_64=y
> CONFIG_ARCH_HIBERNATION_POSSIBLE=y
> # CONFIG_PPC_DCR_NATIVE is not set
> # CONFIG_PPC_DCR_MMIO is not set
> # CONFIG_PPC_OF_PLATFORM_PCI is not set
> CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
> CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
> CONFIG_CONSTRUCTORS=y
>
> #
> # General setup
> #
> CONFIG_EXPERIMENTAL=y
> CONFIG_LOCK_KERNEL=y
> CONFIG_INIT_ENV_ARG_LIMIT=32
> CONFIG_LOCALVERSION=""
> # CONFIG_LOCALVERSION_AUTO is not set
> CONFIG_SWAP=y
> CONFIG_SYSVIPC=y
> CONFIG_SYSVIPC_SYSCTL=y
> CONFIG_POSIX_MQUEUE=y
> CONFIG_POSIX_MQUEUE_SYSCTL=y
> CONFIG_BSD_PROCESS_ACCT=y
> CONFIG_BSD_PROCESS_ACCT_V3=y
> CONFIG_TASKSTATS=y
> CONFIG_TASK_DELAY_ACCT=y
> # CONFIG_TASK_XACCT is not set
> CONFIG_AUDIT=y
> CONFIG_AUDITSYSCALL=y
> CONFIG_AUDIT_TREE=y
>
> #
> # RCU Subsystem
> #
> # CONFIG_TREE_RCU is not set
> CONFIG_TREE_PREEMPT_RCU=y
> # CONFIG_TINY_RCU is not set
> CONFIG_RCU_TRACE=y
> CONFIG_RCU_FANOUT=2
> # CONFIG_RCU_FANOUT_EXACT is not set
> CONFIG_TREE_RCU_TRACE=y
> CONFIG_IKCONFIG=y
> CONFIG_IKCONFIG_PROC=y
> CONFIG_LOG_BUF_SHIFT=19
> # CONFIG_GROUP_SCHED is not set
> # CONFIG_CGROUPS is not set
> CONFIG_SYSFS_DEPRECATED=y
> CONFIG_SYSFS_DEPRECATED_V2=y
> # CONFIG_RELAY is not set
> CONFIG_NAMESPACES=y
> # CONFIG_UTS_NS is not set
> # CONFIG_IPC_NS is not set
> # CONFIG_USER_NS is not set
> # CONFIG_PID_NS is not set
> # CONFIG_NET_NS is not set
> CONFIG_BLK_DEV_INITRD=y
> CONFIG_INITRAMFS_SOURCE=""
> CONFIG_RD_GZIP=y
> CONFIG_RD_BZIP2=y
> CONFIG_RD_LZMA=y
> CONFIG_CC_OPTIMIZE_FOR_SIZE=y
> CONFIG_SYSCTL=y
> CONFIG_ANON_INODES=y
> # CONFIG_EMBEDDED is not set
> CONFIG_SYSCTL_SYSCALL=y
> CONFIG_KALLSYMS=y
> CONFIG_KALLSYMS_ALL=y
> # CONFIG_KALLSYMS_EXTRA_PASS is not set
> CONFIG_HOTPLUG=y
> CONFIG_PRINTK=y
> CONFIG_BUG=y
> CONFIG_ELF_CORE=y
> CONFIG_PCSPKR_PLATFORM=y
> CONFIG_BASE_FULL=y
> CONFIG_FUTEX=y
> CONFIG_EPOLL=y
> CONFIG_SIGNALFD=y
> CONFIG_TIMERFD=y
> CONFIG_EVENTFD=y
> CONFIG_SHMEM=y
> CONFIG_AIO=y
> CONFIG_HAVE_PERF_EVENTS=y
>
> #
> # Kernel Performance Events And Counters
> #
> CONFIG_PERF_EVENTS=y
> CONFIG_EVENT_PROFILE=y
> # CONFIG_PERF_COUNTERS is not set
> # CONFIG_DEBUG_PERF_USE_VMALLOC is not set
> CONFIG_VM_EVENT_COUNTERS=y
> CONFIG_PCI_QUIRKS=y
> CONFIG_COMPAT_BRK=y
> CONFIG_SLAB=y
> # CONFIG_SLUB is not set
> # CONFIG_SLOB is not set
> CONFIG_PROFILING=y
> CONFIG_TRACEPOINTS=y
> CONFIG_OPROFILE=y
> CONFIG_HAVE_OPROFILE=y
> CONFIG_KPROBES=y
> CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y
> CONFIG_HAVE_SYSCALL_WRAPPERS=y
> CONFIG_KRETPROBES=y
> CONFIG_HAVE_IOREMAP_PROT=y
> CONFIG_HAVE_KPROBES=y
> CONFIG_HAVE_KRETPROBES=y
> CONFIG_HAVE_ARCH_TRACEHOOK=y
> CONFIG_HAVE_DMA_ATTRS=y
> CONFIG_USE_GENERIC_SMP_HELPERS=y
> CONFIG_HAVE_DMA_API_DEBUG=y
>
> #
> # GCOV-based kernel profiling
> #
> # CONFIG_GCOV_KERNEL is not set
> # CONFIG_SLOW_WORK is not set
> # CONFIG_HAVE_GENERIC_DMA_COHERENT is not set
> CONFIG_SLABINFO=y
> CONFIG_RT_MUTEXES=y
> CONFIG_BASE_SMALL=0
> CONFIG_MODULES=y
> # CONFIG_MODULE_FORCE_LOAD is not set
> CONFIG_MODULE_UNLOAD=y
> # CONFIG_MODULE_FORCE_UNLOAD is not set
> CONFIG_MODVERSIONS=y
> CONFIG_MODULE_SRCVERSION_ALL=y
> CONFIG_STOP_MACHINE=y
> CONFIG_BLOCK=y
> # CONFIG_BLK_DEV_BSG is not set
> # CONFIG_BLK_DEV_INTEGRITY is not set
> CONFIG_BLOCK_COMPAT=y
>
> #
> # IO Schedulers
> #
> CONFIG_IOSCHED_NOOP=y
> CONFIG_IOSCHED_AS=y
> CONFIG_IOSCHED_DEADLINE=y
> CONFIG_IOSCHED_CFQ=y
> # CONFIG_DEFAULT_AS is not set
> # CONFIG_DEFAULT_DEADLINE is not set
> CONFIG_DEFAULT_CFQ=y
> # CONFIG_DEFAULT_NOOP is not set
> CONFIG_DEFAULT_IOSCHED="cfq"
> # CONFIG_FREEZER is not set
>
> #
> # Platform support
> #
> CONFIG_PPC_PSERIES=y
> CONFIG_PPC_SPLPAR=y
> CONFIG_EEH=y
> CONFIG_SCANLOG=m
> CONFIG_LPARCFG=y
> # CONFIG_PPC_SMLPAR is not set
> # CONFIG_DTL is not set
> # CONFIG_PPC_ISERIES is not set
> CONFIG_PPC_PMAC=y
> CONFIG_PPC_PMAC64=y
> CONFIG_PPC_MAPLE=y
> # CONFIG_PPC_PASEMI is not set
> # CONFIG_PPC_PS3 is not set
> # CONFIG_PPC_CELL is not set
> # CONFIG_PPC_CELL_NATIVE is not set
> # CONFIG_PPC_IBM_CELL_BLADE is not set
> # CONFIG_PPC_CELLEB is not set
> # CONFIG_PPC_CELL_QPACE is not set
> # CONFIG_PQ2ADS is not set
> CONFIG_PPC_NATIVE=y
> CONFIG_PPC_OF_BOOT_TRAMPOLINE=y
> # CONFIG_UDBG_RTAS_CONSOLE is not set
> CONFIG_XICS=y
> # CONFIG_IPIC is not set
> CONFIG_MPIC=y
> # CONFIG_MPIC_WEIRD is not set
> CONFIG_PPC_I8259=y
> CONFIG_U3_DART=y
> CONFIG_PPC_RTAS=y
> CONFIG_RTAS_ERROR_LOGGING=y
> CONFIG_RTAS_PROC=y
> CONFIG_RTAS_FLASH=y
> CONFIG_MMIO_NVRAM=y
> CONFIG_MPIC_U3_HT_IRQS=y
> CONFIG_IBMVIO=y
> CONFIG_IBMEBUS=y
> # CONFIG_PPC_MPC106 is not set
> CONFIG_PPC_970_NAP=y
> # CONFIG_PPC_INDIRECT_IO is not set
> # CONFIG_GENERIC_IOMAP is not set
> CONFIG_CPU_FREQ=y
> CONFIG_CPU_FREQ_TABLE=y
> CONFIG_CPU_FREQ_DEBUG=y
> CONFIG_CPU_FREQ_STAT=m
> CONFIG_CPU_FREQ_STAT_DETAILS=y
> CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE=y
> # CONFIG_CPU_FREQ_DEFAULT_GOV_POWERSAVE is not set
> # CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE is not set
> # CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND is not set
> # CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE is not set
> CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
> CONFIG_CPU_FREQ_GOV_POWERSAVE=m
> CONFIG_CPU_FREQ_GOV_USERSPACE=m
> CONFIG_CPU_FREQ_GOV_ONDEMAND=m
> CONFIG_CPU_FREQ_GOV_CONSERVATIVE=m
>
> #
> # CPU Frequency drivers
> #
> CONFIG_CPU_FREQ_PMAC64=y
> # CONFIG_FSL_ULI1575 is not set
> # CONFIG_SIMPLE_GPIO is not set
>
> #
> # Kernel options
> #
> CONFIG_TICK_ONESHOT=y
> CONFIG_NO_HZ=y
> # CONFIG_HIGH_RES_TIMERS is not set
> CONFIG_GENERIC_CLOCKEVENTS_BUILD=y
> CONFIG_HZ_100=y
> # CONFIG_HZ_250 is not set
> # CONFIG_HZ_300 is not set
> # CONFIG_HZ_1000 is not set
> CONFIG_HZ=100
> # CONFIG_SCHED_HRTICK is not set
> # CONFIG_PREEMPT_NONE is not set
> # CONFIG_PREEMPT_VOLUNTARY is not set
> CONFIG_PREEMPT=y
> CONFIG_BINFMT_ELF=y
> CONFIG_COMPAT_BINFMT_ELF=y
> # CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set
> # CONFIG_HAVE_AOUT is not set
> CONFIG_BINFMT_MISC=m
> CONFIG_HUGETLB_PAGE_SIZE_VARIABLE=y
> CONFIG_IOMMU_VMERGE=y
> CONFIG_IOMMU_HELPER=y
> # CONFIG_SWIOTLB is not set
> CONFIG_HOTPLUG_CPU=y
> CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
> CONFIG_ARCH_HAS_WALK_MEMORY=y
> CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE=y
> CONFIG_KEXEC=y
> # CONFIG_CRASH_DUMP is not set
> # CONFIG_PHYP_DUMP is not set
> CONFIG_IRQ_ALL_CPUS=y
> CONFIG_NUMA=y
> CONFIG_NODES_SHIFT=8
> CONFIG_MAX_ACTIVE_REGIONS=256
> CONFIG_ARCH_SELECT_MEMORY_MODEL=y
> CONFIG_ARCH_SPARSEMEM_ENABLE=y
> CONFIG_ARCH_SPARSEMEM_DEFAULT=y
> CONFIG_ARCH_POPULATES_NODE_MAP=y
> CONFIG_SELECT_MEMORY_MODEL=y
> # CONFIG_FLATMEM_MANUAL is not set
> # CONFIG_DISCONTIGMEM_MANUAL is not set
> CONFIG_SPARSEMEM_MANUAL=y
> CONFIG_SPARSEMEM=y
> CONFIG_NEED_MULTIPLE_NODES=y
> CONFIG_HAVE_MEMORY_PRESENT=y
> CONFIG_SPARSEMEM_EXTREME=y
> CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y
> CONFIG_SPARSEMEM_VMEMMAP=y
> CONFIG_MEMORY_HOTPLUG=y
> CONFIG_MEMORY_HOTPLUG_SPARSE=y
> # CONFIG_MEMORY_HOTREMOVE is not set
> CONFIG_PAGEFLAGS_EXTENDED=y
> CONFIG_SPLIT_PTLOCK_CPUS=4
> CONFIG_MIGRATION=y
> CONFIG_PHYS_ADDR_T_64BIT=y
> CONFIG_ZONE_DMA_FLAG=1
> CONFIG_BOUNCE=y
> CONFIG_HAVE_MLOCK=y
> CONFIG_HAVE_MLOCKED_PAGE_BIT=y
> # CONFIG_KSM is not set
> CONFIG_DEFAULT_MMAP_MIN_ADDR=4096
> CONFIG_ARCH_MEMORY_PROBE=y
> CONFIG_NODES_SPAN_OTHER_NODES=y
> # CONFIG_PPC_HAS_HASH_64K is not set
> CONFIG_PPC_4K_PAGES=y
> # CONFIG_PPC_16K_PAGES is not set
> # CONFIG_PPC_64K_PAGES is not set
> # CONFIG_PPC_256K_PAGES is not set
> CONFIG_FORCE_MAX_ZONEORDER=13
> CONFIG_SCHED_SMT=y
> CONFIG_PROC_DEVICETREE=y
> # CONFIG_CMDLINE_BOOL is not set
> CONFIG_EXTRA_TARGETS=""
> # CONFIG_PM is not set
> CONFIG_SECCOMP=y
> CONFIG_ISA_DMA_API=y
>
> #
> # Bus options
> #
> CONFIG_ZONE_DMA=y
> CONFIG_GENERIC_ISA_DMA=y
> # CONFIG_PPC_INDIRECT_PCI is not set
> CONFIG_PCI=y
> CONFIG_PCI_DOMAINS=y
> CONFIG_PCI_SYSCALL=y
> # CONFIG_PCIEPORTBUS is not set
> CONFIG_ARCH_SUPPORTS_MSI=y
> # CONFIG_PCI_MSI is not set
> CONFIG_PCI_LEGACY=y
> # CONFIG_PCI_DEBUG is not set
> # CONFIG_PCI_STUB is not set
> # CONFIG_PCI_IOV is not set
> # CONFIG_PCCARD is not set
> CONFIG_HOTPLUG_PCI=y
> # CONFIG_HOTPLUG_PCI_FAKE is not set
> # CONFIG_HOTPLUG_PCI_CPCI is not set
> # CONFIG_HOTPLUG_PCI_SHPC is not set
> CONFIG_HOTPLUG_PCI_RPA=y
> CONFIG_HOTPLUG_PCI_RPA_DLPAR=y
> # CONFIG_HAS_RAPIDIO is not set
> # CONFIG_RELOCATABLE is not set
> CONFIG_PAGE_OFFSET=0xc000000000000000
> CONFIG_KERNEL_START=0xc000000000000000
> CONFIG_PHYSICAL_START=0x00000000
> CONFIG_NET=y
>
> #
> # Networking options
> #
> CONFIG_PACKET=y
> CONFIG_PACKET_MMAP=y
> CONFIG_UNIX=y
> CONFIG_XFRM=y
> CONFIG_XFRM_USER=m
> # CONFIG_XFRM_SUB_POLICY is not set
> # CONFIG_XFRM_MIGRATE is not set
> # CONFIG_XFRM_STATISTICS is not set
> CONFIG_XFRM_IPCOMP=m
> CONFIG_NET_KEY=y
> # CONFIG_NET_KEY_MIGRATE is not set
> CONFIG_INET=y
> CONFIG_IP_MULTICAST=y
> CONFIG_IP_ADVANCED_ROUTER=y
> CONFIG_ASK_IP_FIB_HASH=y
> # CONFIG_IP_FIB_TRIE is not set
> CONFIG_IP_FIB_HASH=y
> CONFIG_IP_MULTIPLE_TABLES=y
> CONFIG_IP_ROUTE_MULTIPATH=y
> CONFIG_IP_ROUTE_VERBOSE=y
> # CONFIG_IP_PNP is not set
> CONFIG_NET_IPIP=m
> CONFIG_NET_IPGRE=m
> CONFIG_NET_IPGRE_BROADCAST=y
> CONFIG_IP_MROUTE=y
> CONFIG_IP_PIMSM_V1=y
> CONFIG_IP_PIMSM_V2=y
> # CONFIG_ARPD is not set
> CONFIG_SYN_COOKIES=y
> CONFIG_INET_AH=m
> CONFIG_INET_ESP=m
> CONFIG_INET_IPCOMP=m
> CONFIG_INET_XFRM_TUNNEL=m
> CONFIG_INET_TUNNEL=m
> CONFIG_INET_XFRM_MODE_TRANSPORT=y
> CONFIG_INET_XFRM_MODE_TUNNEL=y
> CONFIG_INET_XFRM_MODE_BEET=y
> # CONFIG_INET_LRO is not set
> CONFIG_INET_DIAG=m
> CONFIG_INET_TCP_DIAG=m
> CONFIG_TCP_CONG_ADVANCED=y
> CONFIG_TCP_CONG_BIC=m
> CONFIG_TCP_CONG_CUBIC=m
> CONFIG_TCP_CONG_WESTWOOD=m
> CONFIG_TCP_CONG_HTCP=m
> CONFIG_TCP_CONG_HSTCP=m
> CONFIG_TCP_CONG_HYBLA=m
> CONFIG_TCP_CONG_VEGAS=m
> CONFIG_TCP_CONG_SCALABLE=m
> # CONFIG_TCP_CONG_LP is not set
> # CONFIG_TCP_CONG_VENO is not set
> # CONFIG_TCP_CONG_YEAH is not set
> # CONFIG_TCP_CONG_ILLINOIS is not set
> # CONFIG_DEFAULT_BIC is not set
> # CONFIG_DEFAULT_CUBIC is not set
> # CONFIG_DEFAULT_HTCP is not set
> # CONFIG_DEFAULT_VEGAS is not set
> # CONFIG_DEFAULT_WESTWOOD is not set
> CONFIG_DEFAULT_RENO=y
> CONFIG_DEFAULT_TCP_CONG="reno"
> # CONFIG_TCP_MD5SIG is not set
> # CONFIG_IPV6 is not set
> # CONFIG_NETLABEL is not set
> # CONFIG_NETWORK_SECMARK is not set
> # CONFIG_NETFILTER is not set
> CONFIG_IP_DCCP=m
> CONFIG_INET_DCCP_DIAG=m
>
> #
> # DCCP CCIDs Configuration (EXPERIMENTAL)
> #
> # CONFIG_IP_DCCP_CCID2_DEBUG is not set
> CONFIG_IP_DCCP_CCID3=y
> # CONFIG_IP_DCCP_CCID3_DEBUG is not set
> CONFIG_IP_DCCP_CCID3_RTO=100
> CONFIG_IP_DCCP_TFRC_LIB=y
>
> #
> # DCCP Kernel Hacking
> #
> # CONFIG_IP_DCCP_DEBUG is not set
> # CONFIG_NET_DCCPPROBE is not set
> CONFIG_IP_SCTP=m
> # CONFIG_SCTP_DBG_MSG is not set
> # CONFIG_SCTP_DBG_OBJCNT is not set
> CONFIG_SCTP_HMAC_NONE=y
> # CONFIG_SCTP_HMAC_SHA1 is not set
> # CONFIG_SCTP_HMAC_MD5 is not set
> # CONFIG_RDS is not set
> # CONFIG_TIPC is not set
> # CONFIG_ATM is not set
> CONFIG_STP=m
> CONFIG_BRIDGE=m
> # CONFIG_NET_DSA is not set
> CONFIG_VLAN_8021Q=m
> # CONFIG_VLAN_8021Q_GVRP is not set
> # CONFIG_DECNET is not set
> CONFIG_LLC=y
> CONFIG_LLC2=m
> # CONFIG_IPX is not set
> # CONFIG_ATALK is not set
> # CONFIG_X25 is not set
> # CONFIG_LAPB is not set
> # CONFIG_ECONET is not set
> # CONFIG_WAN_ROUTER is not set
> # CONFIG_PHONET is not set
> # CONFIG_IEEE802154 is not set
> # CONFIG_NET_SCHED is not set
> # CONFIG_DCB is not set
>
> #
> # Network testing
> #
> # CONFIG_NET_PKTGEN is not set
> # CONFIG_NET_TCPPROBE is not set
> # CONFIG_NET_DROP_MONITOR is not set
> # CONFIG_HAMRADIO is not set
> # CONFIG_CAN is not set
> # CONFIG_IRDA is not set
> # CONFIG_BT is not set
> # CONFIG_AF_RXRPC is not set
> CONFIG_FIB_RULES=y
> CONFIG_WIRELESS=y
> # CONFIG_CFG80211 is not set
> CONFIG_CFG80211_DEFAULT_PS_VALUE=0
> # CONFIG_WIRELESS_OLD_REGULATORY is not set
> # CONFIG_WIRELESS_EXT is not set
> # CONFIG_LIB80211 is not set
>
> #
> # CFG80211 needs to be enabled for MAC80211
> #
> # CONFIG_WIMAX is not set
> # CONFIG_RFKILL is not set
> # CONFIG_NET_9P is not set
>
> #
> # Device Drivers
> #
>
> #
> # Generic Driver Options
> #
> CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug"
> # CONFIG_DEVTMPFS is not set
> CONFIG_STANDALONE=y
> CONFIG_PREVENT_FIRMWARE_BUILD=y
> CONFIG_FW_LOADER=y
> CONFIG_FIRMWARE_IN_KERNEL=y
> CONFIG_EXTRA_FIRMWARE=""
> # CONFIG_DEBUG_DRIVER is not set
> # CONFIG_DEBUG_DEVRES is not set
> # CONFIG_SYS_HYPERVISOR is not set
> CONFIG_CONNECTOR=y
> CONFIG_PROC_EVENTS=y
> # CONFIG_MTD is not set
> CONFIG_OF_DEVICE=y
> CONFIG_OF_I2C=y
> CONFIG_OF_SPI=y
> CONFIG_OF_MDIO=y
> # CONFIG_PARPORT is not set
> CONFIG_BLK_DEV=y
> CONFIG_BLK_DEV_FD=m
> # CONFIG_BLK_CPQ_CISS_DA is not set
> # CONFIG_BLK_DEV_DAC960 is not set
> # CONFIG_BLK_DEV_UMEM is not set
> # CONFIG_BLK_DEV_COW_COMMON is not set
> CONFIG_BLK_DEV_LOOP=m
> CONFIG_BLK_DEV_CRYPTOLOOP=m
> CONFIG_BLK_DEV_NBD=m
> # CONFIG_BLK_DEV_SX8 is not set
> CONFIG_BLK_DEV_RAM=y
> CONFIG_BLK_DEV_RAM_COUNT=16
> CONFIG_BLK_DEV_RAM_SIZE=123456
> # CONFIG_BLK_DEV_XIP is not set
> CONFIG_CDROM_PKTCDVD=m
> CONFIG_CDROM_PKTCDVD_BUFFERS=8
> CONFIG_CDROM_PKTCDVD_WCACHE=y
> CONFIG_ATA_OVER_ETH=m
> # CONFIG_BLK_DEV_HD is not set
> CONFIG_MISC_DEVICES=y
> # CONFIG_PHANTOM is not set
> # CONFIG_SGI_IOC4 is not set
> # CONFIG_TIFM_CORE is not set
> # CONFIG_ICS932S401 is not set
> # CONFIG_ENCLOSURE_SERVICES is not set
> # CONFIG_HP_ILO is not set
> # CONFIG_ISL29003 is not set
> # CONFIG_C2PORT is not set
>
> #
> # EEPROM support
> #
> # CONFIG_EEPROM_AT24 is not set
> # CONFIG_EEPROM_AT25 is not set
> # CONFIG_EEPROM_LEGACY is not set
> # CONFIG_EEPROM_MAX6875 is not set
> # CONFIG_EEPROM_93CX6 is not set
> # CONFIG_CB710_CORE is not set
> CONFIG_HAVE_IDE=y
> CONFIG_IDE=y
>
> #
> # Please see Documentation/ide/ide.txt for help/info on IDE drives
> #
> CONFIG_IDE_XFER_MODE=y
> CONFIG_IDE_TIMINGS=y
> CONFIG_IDE_ATAPI=y
> CONFIG_BLK_DEV_IDE_SATA=y
> CONFIG_IDE_GD=y
> CONFIG_IDE_GD_ATA=y
> # CONFIG_IDE_GD_ATAPI is not set
> CONFIG_BLK_DEV_IDECD=y
> CONFIG_BLK_DEV_IDECD_VERBOSE_ERRORS=y
> # CONFIG_BLK_DEV_IDETAPE is not set
> # CONFIG_IDE_TASK_IOCTL is not set
> CONFIG_IDE_PROC_FS=y
>
> #
> # IDE chipset support/bugfixes
> #
> # CONFIG_BLK_DEV_PLATFORM is not set
> CONFIG_BLK_DEV_IDEDMA_SFF=y
>
> #
> # PCI IDE chipsets support
> #
> CONFIG_BLK_DEV_IDEPCI=y
> CONFIG_IDEPCI_PCIBUS_ORDER=y
> # CONFIG_BLK_DEV_OFFBOARD is not set
> # CONFIG_BLK_DEV_GENERIC is not set
> # CONFIG_BLK_DEV_OPTI621 is not set
> CONFIG_BLK_DEV_IDEDMA_PCI=y
> # CONFIG_BLK_DEV_AEC62XX is not set
> # CONFIG_BLK_DEV_ALI15X3 is not set
> CONFIG_BLK_DEV_AMD74XX=y
> # CONFIG_BLK_DEV_CMD64X is not set
> # CONFIG_BLK_DEV_TRIFLEX is not set
> # CONFIG_BLK_DEV_CS5520 is not set
> # CONFIG_BLK_DEV_CS5530 is not set
> # CONFIG_BLK_DEV_HPT366 is not set
> # CONFIG_BLK_DEV_JMICRON is not set
> # CONFIG_BLK_DEV_SC1200 is not set
> # CONFIG_BLK_DEV_PIIX is not set
> # CONFIG_BLK_DEV_IT8172 is not set
> # CONFIG_BLK_DEV_IT8213 is not set
> # CONFIG_BLK_DEV_IT821X is not set
> # CONFIG_BLK_DEV_NS87415 is not set
> CONFIG_BLK_DEV_PDC202XX_OLD=y
> CONFIG_BLK_DEV_PDC202XX_NEW=y
> # CONFIG_BLK_DEV_SVWKS is not set
> # CONFIG_BLK_DEV_SIIMAGE is not set
> # CONFIG_BLK_DEV_SL82C105 is not set
> # CONFIG_BLK_DEV_SLC90E66 is not set
> # CONFIG_BLK_DEV_TRM290 is not set
> # CONFIG_BLK_DEV_VIA82CXXX is not set
> # CONFIG_BLK_DEV_TC86C001 is not set
> # CONFIG_BLK_DEV_IDE_PMAC is not set
> CONFIG_BLK_DEV_IDEDMA=y
>
> #
> # SCSI device support
> #
> CONFIG_RAID_ATTRS=m
> CONFIG_SCSI=y
> CONFIG_SCSI_DMA=y
> # CONFIG_SCSI_TGT is not set
> CONFIG_SCSI_NETLINK=y
> CONFIG_SCSI_PROC_FS=y
>
> #
> # SCSI support type (disk, tape, CD-ROM)
> #
> CONFIG_BLK_DEV_SD=y
> CONFIG_CHR_DEV_ST=y
> # CONFIG_CHR_DEV_OSST is not set
> CONFIG_BLK_DEV_SR=y
> CONFIG_BLK_DEV_SR_VENDOR=y
> CONFIG_CHR_DEV_SG=m
> CONFIG_CHR_DEV_SCH=m
> CONFIG_SCSI_MULTI_LUN=y
> CONFIG_SCSI_CONSTANTS=y
> CONFIG_SCSI_LOGGING=y
> # CONFIG_SCSI_SCAN_ASYNC is not set
> CONFIG_SCSI_WAIT_SCAN=m
>
> #
> # SCSI Transports
> #
> CONFIG_SCSI_SPI_ATTRS=m
> CONFIG_SCSI_FC_ATTRS=m
> CONFIG_SCSI_ISCSI_ATTRS=m
> CONFIG_SCSI_SAS_ATTRS=m
> CONFIG_SCSI_SAS_LIBSAS=m
> # CONFIG_SCSI_SAS_ATA is not set
> CONFIG_SCSI_SAS_HOST_SMP=y
> CONFIG_SCSI_SAS_LIBSAS_DEBUG=y
> CONFIG_SCSI_SRP_ATTRS=y
> CONFIG_SCSI_LOWLEVEL=y
> CONFIG_ISCSI_TCP=m
> # CONFIG_SCSI_CXGB3_ISCSI is not set
> # CONFIG_SCSI_BNX2_ISCSI is not set
> # CONFIG_BE2ISCSI is not set
> # CONFIG_BLK_DEV_3W_XXXX_RAID is not set
> # CONFIG_SCSI_3W_9XXX is not set
> # CONFIG_SCSI_ACARD is not set
> # CONFIG_SCSI_AACRAID is not set
> # CONFIG_SCSI_AIC7XXX is not set
> # CONFIG_SCSI_AIC7XXX_OLD is not set
> # CONFIG_SCSI_AIC79XX is not set
> CONFIG_SCSI_AIC94XX=m
> CONFIG_AIC94XX_DEBUG=y
> # CONFIG_SCSI_MVSAS is not set
> # CONFIG_SCSI_ARCMSR is not set
> # CONFIG_MEGARAID_NEWGEN is not set
> # CONFIG_MEGARAID_LEGACY is not set
> CONFIG_MEGARAID_SAS=m
> # CONFIG_SCSI_MPT2SAS is not set
> # CONFIG_SCSI_HPTIOP is not set
> # CONFIG_LIBFC is not set
> # CONFIG_LIBFCOE is not set
> # CONFIG_FCOE is not set
> # CONFIG_SCSI_DMX3191D is not set
> # CONFIG_SCSI_EATA is not set
> # CONFIG_SCSI_FUTURE_DOMAIN is not set
> # CONFIG_SCSI_GDTH is not set
> CONFIG_SCSI_IPS=y
> CONFIG_SCSI_IBMVSCSI=y
> # CONFIG_SCSI_IBMVFC is not set
> # CONFIG_SCSI_INITIO is not set
> # CONFIG_SCSI_INIA100 is not set
> # CONFIG_SCSI_STEX is not set
> CONFIG_SCSI_SYM53C8XX_2=m
> CONFIG_SCSI_SYM53C8XX_DMA_ADDRESSING_MODE=0
> CONFIG_SCSI_SYM53C8XX_DEFAULT_TAGS=16
> CONFIG_SCSI_SYM53C8XX_MAX_TAGS=64
> CONFIG_SCSI_SYM53C8XX_MMIO=y
> CONFIG_SCSI_IPR=y
> CONFIG_SCSI_IPR_TRACE=y
> CONFIG_SCSI_IPR_DUMP=y
> # CONFIG_SCSI_QLOGIC_1280 is not set
> CONFIG_SCSI_QLA_FC=m
> # CONFIG_SCSI_QLA_ISCSI is not set
> CONFIG_SCSI_LPFC=m
> # CONFIG_SCSI_LPFC_DEBUG_FS is not set
> # CONFIG_SCSI_DC395x is not set
> # CONFIG_SCSI_DC390T is not set
> CONFIG_SCSI_DEBUG=m
> # CONFIG_SCSI_PMCRAID is not set
> # CONFIG_SCSI_SRP is not set
> # CONFIG_SCSI_BFA_FC is not set
> # CONFIG_SCSI_DH is not set
> # CONFIG_SCSI_OSD_INITIATOR is not set
> CONFIG_ATA=y
> CONFIG_ATA_NONSTANDARD=y
> CONFIG_ATA_VERBOSE_ERROR=y
> CONFIG_SATA_PMP=y
> # CONFIG_SATA_AHCI is not set
> # CONFIG_SATA_SIL24 is not set
> CONFIG_ATA_SFF=y
> # CONFIG_SATA_SVW is not set
> # CONFIG_ATA_PIIX is not set
> # CONFIG_SATA_MV is not set
> # CONFIG_SATA_NV is not set
> # CONFIG_PDC_ADMA is not set
> # CONFIG_SATA_QSTOR is not set
> CONFIG_SATA_PROMISE=y
> CONFIG_SATA_SX4=y
> # CONFIG_SATA_SIL is not set
> # CONFIG_SATA_SIS is not set
> # CONFIG_SATA_ULI is not set
> # CONFIG_SATA_VIA is not set
> # CONFIG_SATA_VITESSE is not set
> # CONFIG_SATA_INIC162X is not set
> # CONFIG_PATA_ALI is not set
> # CONFIG_PATA_AMD is not set
> # CONFIG_PATA_ARTOP is not set
> # CONFIG_PATA_ATP867X is not set
> # CONFIG_PATA_ATIIXP is not set
> # CONFIG_PATA_CMD640_PCI is not set
> # CONFIG_PATA_CMD64X is not set
> # CONFIG_PATA_CS5520 is not set
> # CONFIG_PATA_CS5530 is not set
> # CONFIG_PATA_CYPRESS is not set
> # CONFIG_PATA_EFAR is not set
> CONFIG_ATA_GENERIC=y
> CONFIG_PATA_HPT366=y
> # CONFIG_PATA_HPT37X is not set
> # CONFIG_PATA_HPT3X2N is not set
> # CONFIG_PATA_HPT3X3 is not set
> # CONFIG_PATA_IT821X is not set
> # CONFIG_PATA_IT8213 is not set
> # CONFIG_PATA_JMICRON is not set
> # CONFIG_PATA_TRIFLEX is not set
> # CONFIG_PATA_MARVELL is not set
> # CONFIG_PATA_MPIIX is not set
> # CONFIG_PATA_OLDPIIX is not set
> # CONFIG_PATA_NETCELL is not set
> # CONFIG_PATA_NINJA32 is not set
> # CONFIG_PATA_NS87410 is not set
> # CONFIG_PATA_NS87415 is not set
> # CONFIG_PATA_OPTI is not set
> # CONFIG_PATA_OPTIDMA is not set
> CONFIG_PATA_PDC_OLD=y
> # CONFIG_PATA_RADISYS is not set
> # CONFIG_PATA_RDC is not set
> # CONFIG_PATA_RZ1000 is not set
> # CONFIG_PATA_SC1200 is not set
> # CONFIG_PATA_SERVERWORKS is not set
> CONFIG_PATA_PDC2027X=y
> # CONFIG_PATA_SIL680 is not set
> # CONFIG_PATA_SIS is not set
> # CONFIG_PATA_VIA is not set
> # CONFIG_PATA_WINBOND is not set
> # CONFIG_PATA_PLATFORM is not set
> # CONFIG_PATA_SCH is not set
> CONFIG_MD=y
> CONFIG_BLK_DEV_MD=y
> CONFIG_MD_AUTODETECT=y
> CONFIG_MD_LINEAR=m
> CONFIG_MD_RAID0=m
> CONFIG_MD_RAID1=m
> CONFIG_MD_RAID10=m
> # CONFIG_MD_RAID456 is not set
> CONFIG_MD_MULTIPATH=m
> CONFIG_MD_FAULTY=m
> CONFIG_BLK_DEV_DM=m
> # CONFIG_DM_DEBUG is not set
> CONFIG_DM_CRYPT=m
> CONFIG_DM_SNAPSHOT=m
> CONFIG_DM_MIRROR=m
> # CONFIG_DM_LOG_USERSPACE is not set
> CONFIG_DM_ZERO=m
> CONFIG_DM_MULTIPATH=m
> # CONFIG_DM_MULTIPATH_QL is not set
> # CONFIG_DM_MULTIPATH_ST is not set
> # CONFIG_DM_DELAY is not set
> # CONFIG_DM_UEVENT is not set
> # CONFIG_FUSION is not set
>
> #
> # IEEE 1394 (FireWire) support
> #
>
> #
> # You can enable one or both FireWire driver stacks.
> #
>
> #
> # See the help texts for more information.
> #
> # CONFIG_FIREWIRE is not set
> # CONFIG_IEEE1394 is not set
> # CONFIG_I2O is not set
> # CONFIG_MACINTOSH_DRIVERS is not set
> CONFIG_NETDEVICES=y
> CONFIG_DUMMY=m
> CONFIG_BONDING=m
> # CONFIG_MACVLAN is not set
> CONFIG_EQUALIZER=m
> CONFIG_TUN=m
> CONFIG_VETH=y
> # CONFIG_ARCNET is not set
> CONFIG_PHYLIB=y
>
> #
> # MII PHY device drivers
> #
> CONFIG_MARVELL_PHY=m
> CONFIG_DAVICOM_PHY=m
> CONFIG_QSEMI_PHY=m
> CONFIG_LXT_PHY=m
> CONFIG_CICADA_PHY=m
> # CONFIG_VITESSE_PHY is not set
> # CONFIG_SMSC_PHY is not set
> # CONFIG_BROADCOM_PHY is not set
> # CONFIG_ICPLUS_PHY is not set
> # CONFIG_REALTEK_PHY is not set
> # CONFIG_NATIONAL_PHY is not set
> # CONFIG_STE10XP is not set
> # CONFIG_LSI_ET1011C_PHY is not set
> # CONFIG_FIXED_PHY is not set
> # CONFIG_MDIO_BITBANG is not set
> CONFIG_NET_ETHERNET=y
> CONFIG_MII=m
> # CONFIG_HAPPYMEAL is not set
> CONFIG_SUNGEM=m
> CONFIG_CASSINI=m
> CONFIG_NET_VENDOR_3COM=y
> CONFIG_VORTEX=m
> CONFIG_TYPHOON=m
> # CONFIG_ENC28J60 is not set
> # CONFIG_ETHOC is not set
> # CONFIG_DNET is not set
> CONFIG_NET_TULIP=y
> # CONFIG_DE2104X is not set
> CONFIG_TULIP=m
> CONFIG_TULIP_MWI=y
> CONFIG_TULIP_MMIO=y
> CONFIG_TULIP_NAPI=y
> CONFIG_TULIP_NAPI_HW_MITIGATION=y
> CONFIG_DE4X5=m
> CONFIG_WINBOND_840=m
> CONFIG_DM9102=m
> CONFIG_ULI526X=m
> # CONFIG_HP100 is not set
> CONFIG_IBMVETH=y
> # CONFIG_IBM_NEW_EMAC_ZMII is not set
> # CONFIG_IBM_NEW_EMAC_RGMII is not set
> # CONFIG_IBM_NEW_EMAC_TAH is not set
> # CONFIG_IBM_NEW_EMAC_EMAC4 is not set
> # CONFIG_IBM_NEW_EMAC_NO_FLOW_CTRL is not set
> # CONFIG_IBM_NEW_EMAC_MAL_CLR_ICINTSTAT is not set
> # CONFIG_IBM_NEW_EMAC_MAL_COMMON_ERR is not set
> CONFIG_NET_PCI=y
> CONFIG_PCNET32=m
> CONFIG_AMD8111_ETH=m
> # CONFIG_ADAPTEC_STARFIRE is not set
> # CONFIG_B44 is not set
> # CONFIG_FORCEDETH is not set
> CONFIG_E100=m
> # CONFIG_FEALNX is not set
> # CONFIG_NATSEMI is not set
> # CONFIG_NE2K_PCI is not set
> # CONFIG_8139CP is not set
> # CONFIG_8139TOO is not set
> # CONFIG_R6040 is not set
> # CONFIG_SIS900 is not set
> # CONFIG_EPIC100 is not set
> # CONFIG_SMSC9420 is not set
> # CONFIG_SUNDANCE is not set
> # CONFIG_TLAN is not set
> # CONFIG_KS8842 is not set
> # CONFIG_KS8851 is not set
> # CONFIG_KS8851_MLL is not set
> # CONFIG_VIA_RHINE is not set
> # CONFIG_SC92031 is not set
> # CONFIG_ATL2 is not set
> CONFIG_NETDEV_1000=y
> CONFIG_ACENIC=m
> CONFIG_ACENIC_OMIT_TIGON_I=y
> # CONFIG_DL2K is not set
> CONFIG_E1000=m
> # CONFIG_E1000E is not set
> # CONFIG_IP1000 is not set
> # CONFIG_IGB is not set
> # CONFIG_IGBVF is not set
> # CONFIG_NS83820 is not set
> # CONFIG_HAMACHI is not set
> # CONFIG_YELLOWFIN is not set
> CONFIG_R8169=m
> CONFIG_R8169_VLAN=y
> CONFIG_SIS190=m
> # CONFIG_SKGE is not set
> CONFIG_SKY2=m
> # CONFIG_SKY2_DEBUG is not set
> # CONFIG_VIA_VELOCITY is not set
> CONFIG_TIGON3=y
> # CONFIG_BNX2 is not set
> # CONFIG_CNIC is not set
> # CONFIG_QLA3XXX is not set
> # CONFIG_ATL1 is not set
> # CONFIG_ATL1E is not set
> # CONFIG_ATL1C is not set
> # CONFIG_JME is not set
> CONFIG_NETDEV_10000=y
> CONFIG_MDIO=m
> CONFIG_CHELSIO_T1=m
> # CONFIG_CHELSIO_T1_1G is not set
> CONFIG_CHELSIO_T3_DEPENDS=y
> # CONFIG_CHELSIO_T3 is not set
> # CONFIG_EHEA is not set
> # CONFIG_ENIC is not set
> # CONFIG_IXGBE is not set
> CONFIG_IXGB=m
> CONFIG_S2IO=m
> # CONFIG_VXGE is not set
> # CONFIG_MYRI10GE is not set
> # CONFIG_NETXEN_NIC is not set
> # CONFIG_NIU is not set
> # CONFIG_MLX4_EN is not set
> # CONFIG_MLX4_CORE is not set
> # CONFIG_TEHUTI is not set
> # CONFIG_BNX2X is not set
> # CONFIG_QLGE is not set
> # CONFIG_SFC is not set
> # CONFIG_BE2NET is not set
> CONFIG_TR=y
> CONFIG_IBMOL=m
> # CONFIG_3C359 is not set
> # CONFIG_TMS380TR is not set
> CONFIG_WLAN=y
> # CONFIG_WLAN_PRE80211 is not set
> # CONFIG_WLAN_80211 is not set
>
> #
> # Enable WiMAX (Networking options) to see the WiMAX drivers
> #
> # CONFIG_WAN is not set
> # CONFIG_FDDI is not set
> # CONFIG_HIPPI is not set
> CONFIG_PPP=m
> CONFIG_PPP_MULTILINK=y
> CONFIG_PPP_FILTER=y
> CONFIG_PPP_ASYNC=m
> CONFIG_PPP_SYNC_TTY=m
> CONFIG_PPP_DEFLATE=m
> CONFIG_PPP_BSDCOMP=m
> CONFIG_PPP_MPPE=m
> CONFIG_PPPOE=m
> # CONFIG_PPPOL2TP is not set
> CONFIG_SLIP=m
> CONFIG_SLIP_COMPRESSED=y
> CONFIG_SLHC=m
> CONFIG_SLIP_SMART=y
> # CONFIG_SLIP_MODE_SLIP6 is not set
> CONFIG_NET_FC=y
> CONFIG_NETCONSOLE=m
> # CONFIG_NETCONSOLE_DYNAMIC is not set
> CONFIG_NETPOLL=y
> CONFIG_NETPOLL_TRAP=y
> CONFIG_NET_POLL_CONTROLLER=y
> # CONFIG_ISDN is not set
> # CONFIG_PHONE is not set
>
> #
> # Input device support
> #
> CONFIG_INPUT=y
> # CONFIG_INPUT_FF_MEMLESS is not set
> # CONFIG_INPUT_POLLDEV is not set
>
> #
> # Userland interfaces
> #
> CONFIG_INPUT_MOUSEDEV=y
> # CONFIG_INPUT_MOUSEDEV_PSAUX is not set
> CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024
> CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768
> CONFIG_INPUT_JOYDEV=m
> CONFIG_INPUT_EVDEV=y
> CONFIG_INPUT_EVBUG=m
>
> #
> # Input Device Drivers
> #
> CONFIG_INPUT_KEYBOARD=y
> # CONFIG_KEYBOARD_ADP5588 is not set
> CONFIG_KEYBOARD_ATKBD=y
> # CONFIG_QT2160 is not set
> # CONFIG_KEYBOARD_LKKBD is not set
> # CONFIG_KEYBOARD_MAX7359 is not set
> # CONFIG_KEYBOARD_NEWTON is not set
> # CONFIG_KEYBOARD_OPENCORES is not set
> # CONFIG_KEYBOARD_STOWAWAY is not set
> # CONFIG_KEYBOARD_SUNKBD is not set
> # CONFIG_KEYBOARD_XTKBD is not set
> CONFIG_INPUT_MOUSE=y
> CONFIG_MOUSE_PS2=y
> CONFIG_MOUSE_PS2_ALPS=y
> CONFIG_MOUSE_PS2_LOGIPS2PP=y
> CONFIG_MOUSE_PS2_SYNAPTICS=y
> CONFIG_MOUSE_PS2_TRACKPOINT=y
> # CONFIG_MOUSE_PS2_ELANTECH is not set
> # CONFIG_MOUSE_PS2_SENTELIC is not set
> # CONFIG_MOUSE_PS2_TOUCHKIT is not set
> CONFIG_MOUSE_SERIAL=m
> # CONFIG_MOUSE_VSXXXAA is not set
> # CONFIG_MOUSE_SYNAPTICS_I2C is not set
> CONFIG_INPUT_JOYSTICK=y
> # CONFIG_JOYSTICK_ANALOG is not set
> # CONFIG_JOYSTICK_A3D is not set
> # CONFIG_JOYSTICK_ADI is not set
> # CONFIG_JOYSTICK_COBRA is not set
> # CONFIG_JOYSTICK_GF2K is not set
> # CONFIG_JOYSTICK_GRIP is not set
> # CONFIG_JOYSTICK_GRIP_MP is not set
> # CONFIG_JOYSTICK_GUILLEMOT is not set
> # CONFIG_JOYSTICK_INTERACT is not set
> # CONFIG_JOYSTICK_SIDEWINDER is not set
> # CONFIG_JOYSTICK_TMDC is not set
> CONFIG_JOYSTICK_IFORCE=m
> CONFIG_JOYSTICK_IFORCE_232=y
> CONFIG_JOYSTICK_WARRIOR=m
> CONFIG_JOYSTICK_MAGELLAN=m
> CONFIG_JOYSTICK_SPACEORB=m
> CONFIG_JOYSTICK_SPACEBALL=m
> CONFIG_JOYSTICK_STINGER=m
> CONFIG_JOYSTICK_TWIDJOY=m
> # CONFIG_JOYSTICK_ZHENHUA is not set
> CONFIG_JOYSTICK_JOYDUMP=m
> # CONFIG_INPUT_TABLET is not set
> CONFIG_INPUT_TOUCHSCREEN=y
> CONFIG_TOUCHSCREEN_ADS7846=m
> # CONFIG_TOUCHSCREEN_AD7877 is not set
> # CONFIG_TOUCHSCREEN_AD7879_I2C is not set
> # CONFIG_TOUCHSCREEN_AD7879_SPI is not set
> # CONFIG_TOUCHSCREEN_AD7879 is not set
> # CONFIG_TOUCHSCREEN_EETI is not set
> # CONFIG_TOUCHSCREEN_FUJITSU is not set
> # CONFIG_TOUCHSCREEN_GUNZE is not set
> # CONFIG_TOUCHSCREEN_ELO is not set
> # CONFIG_TOUCHSCREEN_WACOM_W8001 is not set
> # CONFIG_TOUCHSCREEN_MCS5000 is not set
> # CONFIG_TOUCHSCREEN_MTOUCH is not set
> # CONFIG_TOUCHSCREEN_INEXIO is not set
> # CONFIG_TOUCHSCREEN_MK712 is not set
> # CONFIG_TOUCHSCREEN_PENMOUNT is not set
> # CONFIG_TOUCHSCREEN_TOUCHRIGHT is not set
> # CONFIG_TOUCHSCREEN_TOUCHWIN is not set
> # CONFIG_TOUCHSCREEN_TOUCHIT213 is not set
> # CONFIG_TOUCHSCREEN_TSC2007 is not set
> CONFIG_INPUT_MISC=y
> CONFIG_INPUT_PCSPKR=m
> CONFIG_INPUT_UINPUT=m
>
> #
> # Hardware I/O ports
> #
> CONFIG_SERIO=y
> CONFIG_SERIO_I8042=y
> CONFIG_SERIO_SERPORT=m
> # CONFIG_SERIO_PCIPS2 is not set
> CONFIG_SERIO_LIBPS2=y
> CONFIG_SERIO_RAW=m
> # CONFIG_SERIO_XILINX_XPS_PS2 is not set
> CONFIG_GAMEPORT=m
> # CONFIG_GAMEPORT_NS558 is not set
> # CONFIG_GAMEPORT_L4 is not set
> # CONFIG_GAMEPORT_EMU10K1 is not set
> # CONFIG_GAMEPORT_FM801 is not set
>
> #
> # Character devices
> #
> CONFIG_VT=y
> CONFIG_CONSOLE_TRANSLATIONS=y
> CONFIG_VT_CONSOLE=y
> CONFIG_HW_CONSOLE=y
> # CONFIG_VT_HW_CONSOLE_BINDING is not set
> CONFIG_DEVKMEM=y
> # CONFIG_SERIAL_NONSTANDARD is not set
> # CONFIG_NOZOMI is not set
>
> #
> # Serial drivers
> #
> CONFIG_SERIAL_8250=y
> CONFIG_SERIAL_8250_CONSOLE=y
> CONFIG_SERIAL_8250_PCI=y
> CONFIG_SERIAL_8250_NR_UARTS=4
> CONFIG_SERIAL_8250_RUNTIME_UARTS=4
> # CONFIG_SERIAL_8250_EXTENDED is not set
>
> #
> # Non-8250 serial port support
> #
> # CONFIG_SERIAL_MAX3100 is not set
> CONFIG_SERIAL_CORE=y
> CONFIG_SERIAL_CORE_CONSOLE=y
> CONFIG_SERIAL_PMACZILOG=y
> # CONFIG_SERIAL_PMACZILOG_TTYS is not set
> CONFIG_SERIAL_PMACZILOG_CONSOLE=y
> CONFIG_SERIAL_ICOM=m
> CONFIG_SERIAL_JSM=m
> # CONFIG_SERIAL_OF_PLATFORM is not set
> CONFIG_UNIX98_PTYS=y
> # CONFIG_DEVPTS_MULTIPLE_INSTANCES is not set
> CONFIG_LEGACY_PTYS=y
> CONFIG_LEGACY_PTY_COUNT=64
> CONFIG_HVC_DRIVER=y
> CONFIG_HVC_IRQ=y
> CONFIG_HVC_CONSOLE=y
> CONFIG_HVC_RTAS=y
> # CONFIG_HVC_UDBG is not set
> CONFIG_HVCS=m
> # CONFIG_IBM_BSR is not set
> # CONFIG_IPMI_HANDLER is not set
> CONFIG_HW_RANDOM=m
> # CONFIG_HW_RANDOM_TIMERIOMEM is not set
> CONFIG_GEN_RTC=y
> # CONFIG_GEN_RTC_X is not set
> # CONFIG_R3964 is not set
> # CONFIG_APPLICOM is not set
> CONFIG_RAW_DRIVER=m
> CONFIG_MAX_RAW_DEVS=4096
> # CONFIG_HANGCHECK_TIMER is not set
> CONFIG_TCG_TPM=m
> # CONFIG_TCG_NSC is not set
> CONFIG_TCG_ATMEL=m
> CONFIG_DEVPORT=y
> CONFIG_I2C=y
> CONFIG_I2C_BOARDINFO=y
> CONFIG_I2C_COMPAT=y
> CONFIG_I2C_CHARDEV=m
> CONFIG_I2C_HELPER_AUTO=y
>
> #
> # I2C Hardware Bus support
> #
>
> #
> # PC SMBus host controller drivers
> #
> # CONFIG_I2C_ALI1535 is not set
> # CONFIG_I2C_ALI1563 is not set
> # CONFIG_I2C_ALI15X3 is not set
> # CONFIG_I2C_AMD756 is not set
> CONFIG_I2C_AMD8111=m
> # CONFIG_I2C_I801 is not set
> # CONFIG_I2C_ISCH is not set
> # CONFIG_I2C_PIIX4 is not set
> # CONFIG_I2C_NFORCE2 is not set
> # CONFIG_I2C_SIS5595 is not set
> # CONFIG_I2C_SIS630 is not set
> # CONFIG_I2C_SIS96X is not set
> # CONFIG_I2C_VIA is not set
> # CONFIG_I2C_VIAPRO is not set
>
> #
> # Mac SMBus host controller drivers
> #
> CONFIG_I2C_POWERMAC=y
>
> #
> # I2C system bus drivers (mostly embedded / system-on-chip)
> #
> # CONFIG_I2C_OCORES is not set
> # CONFIG_I2C_SIMTEC is not set
>
> #
> # External I2C/SMBus adapter drivers
> #
> # CONFIG_I2C_PARPORT_LIGHT is not set
> # CONFIG_I2C_TAOS_EVM is not set
>
> #
> # Graphics adapter I2C/DDC channel drivers
> #
> # CONFIG_I2C_VOODOO3 is not set
>
> #
> # Other I2C/SMBus bus drivers
> #
> # CONFIG_I2C_PCA_PLATFORM is not set
> # CONFIG_I2C_STUB is not set
>
> #
> # Miscellaneous I2C Chip support
> #
> # CONFIG_DS1682 is not set
> # CONFIG_SENSORS_TSL2550 is not set
> # CONFIG_I2C_DEBUG_CORE is not set
> # CONFIG_I2C_DEBUG_ALGO is not set
> # CONFIG_I2C_DEBUG_BUS is not set
> # CONFIG_I2C_DEBUG_CHIP is not set
> CONFIG_SPI=y
> CONFIG_SPI_DEBUG=y
> CONFIG_SPI_MASTER=y
>
> #
> # SPI Master Controller Drivers
> #
> CONFIG_SPI_BITBANG=m
>
> #
> # SPI Protocol Masters
> #
> # CONFIG_SPI_SPIDEV is not set
> # CONFIG_SPI_TLE62X0 is not set
>
> #
> # PPS support
> #
> # CONFIG_PPS is not set
> CONFIG_ARCH_WANT_OPTIONAL_GPIOLIB=y
> # CONFIG_GPIOLIB is not set
> # CONFIG_W1 is not set
> # CONFIG_POWER_SUPPLY is not set
> # CONFIG_HWMON is not set
> # CONFIG_THERMAL is not set
> CONFIG_WATCHDOG=y
> # CONFIG_WATCHDOG_NOWAYOUT is not set
>
> #
> # Watchdog Device Drivers
> #
> CONFIG_SOFT_WATCHDOG=m
> # CONFIG_ALIM7101_WDT is not set
> CONFIG_WATCHDOG_RTAS=m
>
> #
> # PCI-based Watchdog Cards
> #
> # CONFIG_PCIPCWATCHDOG is not set
> # CONFIG_WDTPCI is not set
> CONFIG_SSB_POSSIBLE=y
>
> #
> # Sonics Silicon Backplane
> #
> # CONFIG_SSB is not set
>
> #
> # Multifunction device drivers
> #
> # CONFIG_MFD_CORE is not set
> # CONFIG_MFD_SM501 is not set
> # CONFIG_HTC_PASIC3 is not set
> # CONFIG_TWL4030_CORE is not set
> # CONFIG_MFD_TMIO is not set
> # CONFIG_PMIC_DA903X is not set
> # CONFIG_MFD_WM8400 is not set
> # CONFIG_MFD_WM831X is not set
> # CONFIG_MFD_WM8350_I2C is not set
> # CONFIG_MFD_PCF50633 is not set
> # CONFIG_MFD_MC13783 is not set
> # CONFIG_AB3100_CORE is not set
> # CONFIG_EZX_PCAP is not set
> # CONFIG_REGULATOR is not set
> # CONFIG_MEDIA_SUPPORT is not set
>
> #
> # Graphics support
> #
> # CONFIG_AGP is not set
> CONFIG_VGA_ARB=y
> # CONFIG_DRM is not set
> # CONFIG_VGASTATE is not set
> # CONFIG_VIDEO_OUTPUT_CONTROL is not set
> # CONFIG_FB is not set
> # CONFIG_BACKLIGHT_LCD_SUPPORT is not set
>
> #
> # Display device support
> #
> # CONFIG_DISPLAY_SUPPORT is not set
>
> #
> # Console display driver support
> #
> # CONFIG_VGA_CONSOLE is not set
> CONFIG_DUMMY_CONSOLE=y
> # CONFIG_SOUND is not set
> # CONFIG_HID_SUPPORT is not set
> # CONFIG_USB_SUPPORT is not set
> # CONFIG_UWB is not set
> # CONFIG_MMC is not set
> # CONFIG_MEMSTICK is not set
> # CONFIG_NEW_LEDS is not set
> # CONFIG_ACCESSIBILITY is not set
> # CONFIG_INFINIBAND is not set
> # CONFIG_EDAC is not set
> # CONFIG_RTC_CLASS is not set
> # CONFIG_DMADEVICES is not set
> # CONFIG_AUXDISPLAY is not set
> # CONFIG_UIO is not set
>
> #
> # TI VLYNQ
> #
> # CONFIG_STAGING is not set
>
> #
> # File systems
> #
> CONFIG_EXT2_FS=y
> CONFIG_EXT2_FS_XATTR=y
> CONFIG_EXT2_FS_POSIX_ACL=y
> CONFIG_EXT2_FS_SECURITY=y
> # CONFIG_EXT2_FS_XIP is not set
> CONFIG_EXT3_FS=y
> # CONFIG_EXT3_DEFAULTS_TO_ORDERED is not set
> CONFIG_EXT3_FS_XATTR=y
> CONFIG_EXT3_FS_POSIX_ACL=y
> CONFIG_EXT3_FS_SECURITY=y
> # CONFIG_EXT4_FS is not set
> CONFIG_JBD=y
> # CONFIG_JBD_DEBUG is not set
> CONFIG_JBD2=m
> # CONFIG_JBD2_DEBUG is not set
> CONFIG_FS_MBCACHE=y
> CONFIG_REISERFS_FS=y
> # CONFIG_REISERFS_CHECK is not set
> CONFIG_REISERFS_PROC_INFO=y
> CONFIG_REISERFS_FS_XATTR=y
> CONFIG_REISERFS_FS_POSIX_ACL=y
> CONFIG_REISERFS_FS_SECURITY=y
> CONFIG_JFS_FS=m
> CONFIG_JFS_POSIX_ACL=y
> CONFIG_JFS_SECURITY=y
> # CONFIG_JFS_DEBUG is not set
> CONFIG_JFS_STATISTICS=y
> CONFIG_FS_POSIX_ACL=y
> CONFIG_XFS_FS=m
> # CONFIG_XFS_QUOTA is not set
> CONFIG_XFS_POSIX_ACL=y
> CONFIG_XFS_RT=y
> # CONFIG_XFS_DEBUG is not set
> # CONFIG_GFS2_FS is not set
> CONFIG_OCFS2_FS=m
> CONFIG_OCFS2_FS_O2CB=m
> CONFIG_OCFS2_FS_STATS=y
> CONFIG_OCFS2_DEBUG_MASKLOG=y
> # CONFIG_OCFS2_DEBUG_FS is not set
> # CONFIG_OCFS2_FS_POSIX_ACL is not set
> # CONFIG_BTRFS_FS is not set
> # CONFIG_NILFS2_FS is not set
> CONFIG_FILE_LOCKING=y
> CONFIG_FSNOTIFY=y
> CONFIG_DNOTIFY=y
> CONFIG_INOTIFY=y
> CONFIG_INOTIFY_USER=y
> CONFIG_QUOTA=y
> # CONFIG_QUOTA_NETLINK_INTERFACE is not set
> CONFIG_PRINT_QUOTA_WARNING=y
> CONFIG_QUOTA_TREE=m
> CONFIG_QFMT_V1=m
> CONFIG_QFMT_V2=m
> CONFIG_QUOTACTL=y
> CONFIG_AUTOFS_FS=m
> CONFIG_AUTOFS4_FS=m
> CONFIG_FUSE_FS=m
> # CONFIG_CUSE is not set
> CONFIG_GENERIC_ACL=y
>
> #
> # Caches
> #
> # CONFIG_FSCACHE is not set
>
> #
> # CD-ROM/DVD Filesystems
> #
> # CONFIG_ISO9660_FS is not set
> # CONFIG_UDF_FS is not set
>
> #
> # DOS/FAT/NT Filesystems
> #
> # CONFIG_MSDOS_FS is not set
> # CONFIG_VFAT_FS is not set
> # CONFIG_NTFS_FS is not set
>
> #
> # Pseudo filesystems
> #
> CONFIG_PROC_FS=y
> CONFIG_PROC_KCORE=y
> CONFIG_PROC_SYSCTL=y
> CONFIG_PROC_PAGE_MONITOR=y
> CONFIG_SYSFS=y
> CONFIG_TMPFS=y
> CONFIG_TMPFS_POSIX_ACL=y
> CONFIG_HUGETLBFS=y
> CONFIG_HUGETLB_PAGE=y
> CONFIG_CONFIGFS_FS=m
> CONFIG_MISC_FILESYSTEMS=y
> # CONFIG_ADFS_FS is not set
> # CONFIG_AFFS_FS is not set
> CONFIG_HFS_FS=m
> CONFIG_HFSPLUS_FS=m
> # CONFIG_BEFS_FS is not set
> # CONFIG_BFS_FS is not set
> # CONFIG_EFS_FS is not set
> CONFIG_CRAMFS=y
> # CONFIG_SQUASHFS is not set
> # CONFIG_VXFS_FS is not set
> # CONFIG_MINIX_FS is not set
> # CONFIG_OMFS_FS is not set
> # CONFIG_HPFS_FS is not set
> # CONFIG_QNX4FS_FS is not set
> CONFIG_ROMFS_FS=m
> CONFIG_ROMFS_BACKED_BY_BLOCK=y
> # CONFIG_ROMFS_BACKED_BY_MTD is not set
> # CONFIG_ROMFS_BACKED_BY_BOTH is not set
> CONFIG_ROMFS_ON_BLOCK=y
> # CONFIG_SYSV_FS is not set
> CONFIG_UFS_FS=m
> # CONFIG_UFS_FS_WRITE is not set
> # CONFIG_UFS_DEBUG is not set
> # CONFIG_NETWORK_FILESYSTEMS is not set
> CONFIG_EXPORTFS=m
>
> #
> # Partition Types
> #
> CONFIG_PARTITION_ADVANCED=y
> # CONFIG_ACORN_PARTITION is not set
> CONFIG_OSF_PARTITION=y
> CONFIG_AMIGA_PARTITION=y
> CONFIG_ATARI_PARTITION=y
> CONFIG_MAC_PARTITION=y
> CONFIG_MSDOS_PARTITION=y
> CONFIG_BSD_DISKLABEL=y
> CONFIG_MINIX_SUBPARTITION=y
> CONFIG_SOLARIS_X86_PARTITION=y
> CONFIG_UNIXWARE_DISKLABEL=y
> CONFIG_LDM_PARTITION=y
> # CONFIG_LDM_DEBUG is not set
> CONFIG_SGI_PARTITION=y
> CONFIG_ULTRIX_PARTITION=y
> CONFIG_SUN_PARTITION=y
> CONFIG_KARMA_PARTITION=y
> CONFIG_EFI_PARTITION=y
> # CONFIG_SYSV68_PARTITION is not set
> CONFIG_NLS=y
> CONFIG_NLS_DEFAULT="utf8"
> CONFIG_NLS_CODEPAGE_437=y
> CONFIG_NLS_CODEPAGE_737=m
> CONFIG_NLS_CODEPAGE_775=m
> CONFIG_NLS_CODEPAGE_850=m
> CONFIG_NLS_CODEPAGE_852=m
> CONFIG_NLS_CODEPAGE_855=m
> CONFIG_NLS_CODEPAGE_857=m
> CONFIG_NLS_CODEPAGE_860=m
> CONFIG_NLS_CODEPAGE_861=m
> CONFIG_NLS_CODEPAGE_862=m
> CONFIG_NLS_CODEPAGE_863=m
> CONFIG_NLS_CODEPAGE_864=m
> CONFIG_NLS_CODEPAGE_865=m
> CONFIG_NLS_CODEPAGE_866=m
> CONFIG_NLS_CODEPAGE_869=m
> CONFIG_NLS_CODEPAGE_936=m
> CONFIG_NLS_CODEPAGE_950=m
> CONFIG_NLS_CODEPAGE_932=m
> CONFIG_NLS_CODEPAGE_949=m
> CONFIG_NLS_CODEPAGE_874=m
> CONFIG_NLS_ISO8859_8=m
> CONFIG_NLS_CODEPAGE_1250=m
> CONFIG_NLS_CODEPAGE_1251=m
> CONFIG_NLS_ASCII=m
> CONFIG_NLS_ISO8859_1=y
> CONFIG_NLS_ISO8859_2=m
> CONFIG_NLS_ISO8859_3=m
> CONFIG_NLS_ISO8859_4=m
> CONFIG_NLS_ISO8859_5=m
> CONFIG_NLS_ISO8859_6=m
> CONFIG_NLS_ISO8859_7=m
> CONFIG_NLS_ISO8859_9=m
> CONFIG_NLS_ISO8859_13=m
> CONFIG_NLS_ISO8859_14=m
> CONFIG_NLS_ISO8859_15=m
> CONFIG_NLS_KOI8_R=m
> CONFIG_NLS_KOI8_U=m
> CONFIG_NLS_UTF8=m
> # CONFIG_DLM is not set
> CONFIG_BINARY_PRINTF=y
>
> #
> # Library routines
> #
> CONFIG_BITREVERSE=y
> CONFIG_GENERIC_FIND_LAST_BIT=y
> CONFIG_CRC_CCITT=m
> CONFIG_CRC16=m
> CONFIG_CRC_T10DIF=m
> # CONFIG_CRC_ITU_T is not set
> CONFIG_CRC32=y
> # CONFIG_CRC7 is not set
> CONFIG_LIBCRC32C=m
> CONFIG_ZLIB_INFLATE=y
> CONFIG_ZLIB_DEFLATE=m
> CONFIG_DECOMPRESS_GZIP=y
> CONFIG_DECOMPRESS_BZIP2=y
> CONFIG_DECOMPRESS_LZMA=y
> CONFIG_HAS_IOMEM=y
> CONFIG_HAS_IOPORT=y
> CONFIG_HAS_DMA=y
> CONFIG_HAVE_LMB=y
> CONFIG_NLATTR=y
>
> #
> # Kernel hacking
> #
> # CONFIG_PRINTK_TIME is not set
> CONFIG_ENABLE_WARN_DEPRECATED=y
> CONFIG_ENABLE_MUST_CHECK=y
> CONFIG_FRAME_WARN=2048
> CONFIG_MAGIC_SYSRQ=y
> # CONFIG_STRIP_ASM_SYMS is not set
> # CONFIG_UNUSED_SYMBOLS is not set
> CONFIG_DEBUG_FS=y
> # CONFIG_HEADERS_CHECK is not set
> CONFIG_DEBUG_KERNEL=y
> # CONFIG_DEBUG_SHIRQ is not set
> # CONFIG_DETECT_SOFTLOCKUP is not set
> # CONFIG_DETECT_HUNG_TASK is not set
> CONFIG_SCHED_DEBUG=y
> CONFIG_SCHEDSTATS=y
> CONFIG_TIMER_STATS=y
> # CONFIG_DEBUG_OBJECTS is not set
> # CONFIG_DEBUG_SLAB is not set
> CONFIG_DEBUG_PREEMPT=y
> # CONFIG_DEBUG_RT_MUTEXES is not set
> # CONFIG_RT_MUTEX_TESTER is not set
> CONFIG_DEBUG_SPINLOCK=y
> CONFIG_DEBUG_MUTEXES=y
> CONFIG_DEBUG_LOCK_ALLOC=y
> CONFIG_PROVE_LOCKING=y
> CONFIG_LOCKDEP=y
> # CONFIG_LOCK_STAT is not set
> # CONFIG_DEBUG_LOCKDEP is not set
> CONFIG_TRACE_IRQFLAGS=y
> # CONFIG_DEBUG_SPINLOCK_SLEEP is not set
> CONFIG_DEBUG_LOCKING_API_SELFTESTS=y
> CONFIG_STACKTRACE=y
> # CONFIG_DEBUG_KOBJECT is not set
> CONFIG_DEBUG_BUGVERBOSE=y
> CONFIG_DEBUG_INFO=y
> CONFIG_DEBUG_VM=y
> # CONFIG_DEBUG_WRITECOUNT is not set
> CONFIG_DEBUG_MEMORY_INIT=y
> # CONFIG_DEBUG_LIST is not set
> # CONFIG_DEBUG_SG is not set
> # CONFIG_DEBUG_NOTIFIERS is not set
> # CONFIG_DEBUG_CREDENTIALS is not set
> CONFIG_RCU_TORTURE_TEST=m
> CONFIG_RCU_CPU_STALL_DETECTOR=y
> # CONFIG_KPROBES_SANITY_TEST is not set
> # CONFIG_BACKTRACE_SELF_TEST is not set
> # CONFIG_DEBUG_BLOCK_EXT_DEVT is not set
> # CONFIG_DEBUG_FORCE_WEAK_PER_CPU is not set
> # CONFIG_LKDTM is not set
> # CONFIG_FAULT_INJECTION is not set
> # CONFIG_LATENCYTOP is not set
> # CONFIG_SYSCTL_SYSCALL_CHECK is not set
> # CONFIG_DEBUG_PAGEALLOC is not set
> CONFIG_NOP_TRACER=y
> CONFIG_HAVE_FUNCTION_TRACER=y
> CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y
> CONFIG_HAVE_DYNAMIC_FTRACE=y
> CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
> CONFIG_RING_BUFFER=y
> CONFIG_EVENT_TRACING=y
> CONFIG_CONTEXT_SWITCH_TRACER=y
> CONFIG_RING_BUFFER_ALLOW_SWAP=y
> CONFIG_TRACING=y
> CONFIG_TRACING_SUPPORT=y
> CONFIG_FTRACE=y
> # CONFIG_FUNCTION_TRACER is not set
> # CONFIG_IRQSOFF_TRACER is not set
> # CONFIG_PREEMPT_TRACER is not set
> # CONFIG_SCHED_TRACER is not set
> # CONFIG_ENABLE_DEFAULT_TRACERS is not set
> # CONFIG_BOOT_TRACER is not set
> CONFIG_BRANCH_PROFILE_NONE=y
> # CONFIG_PROFILE_ANNOTATED_BRANCHES is not set
> # CONFIG_PROFILE_ALL_BRANCHES is not set
> # CONFIG_STACK_TRACER is not set
> # CONFIG_KMEMTRACE is not set
> # CONFIG_WORKQUEUE_TRACER is not set
> # CONFIG_BLK_DEV_IO_TRACE is not set
> # CONFIG_RING_BUFFER_BENCHMARK is not set
> # CONFIG_DYNAMIC_DEBUG is not set
> # CONFIG_DMA_API_DEBUG is not set
> # CONFIG_SAMPLES is not set
> CONFIG_HAVE_ARCH_KGDB=y
> # CONFIG_KGDB is not set
> # CONFIG_PPC_DISABLE_WERROR is not set
> CONFIG_PPC_WERROR=y
> CONFIG_PRINT_STACK_DEPTH=64
> CONFIG_DEBUG_STACKOVERFLOW=y
> CONFIG_DEBUG_STACK_USAGE=y
> CONFIG_HCALL_STATS=y
> # CONFIG_PPC_EMULATED_STATS is not set
> # CONFIG_CODE_PATCHING_SELFTEST is not set
> # CONFIG_FTR_FIXUP_SELFTEST is not set
> # CONFIG_MSI_BITMAP_SELFTEST is not set
> CONFIG_XMON=y
> # CONFIG_XMON_DEFAULT is not set
> CONFIG_XMON_DISASSEMBLY=y
> CONFIG_DEBUGGER=y
> CONFIG_IRQSTACKS=y
> # CONFIG_VIRQ_DEBUG is not set
> CONFIG_BOOTX_TEXT=y
> # CONFIG_PPC_EARLY_DEBUG is not set
>
> #
> # Security options
> #
> # CONFIG_KEYS is not set
> CONFIG_SECURITY=y
> CONFIG_SECURITYFS=y
> CONFIG_SECURITY_NETWORK=y
> # CONFIG_SECURITY_NETWORK_XFRM is not set
> # CONFIG_SECURITY_PATH is not set
> # CONFIG_SECURITY_FILE_CAPABILITIES is not set
> # CONFIG_SECURITY_SELINUX is not set
> # CONFIG_SECURITY_TOMOYO is not set
> CONFIG_CRYPTO=y
>
> #
> # Crypto core or helper
> #
> # CONFIG_CRYPTO_FIPS is not set
> CONFIG_CRYPTO_ALGAPI=y
> CONFIG_CRYPTO_ALGAPI2=y
> CONFIG_CRYPTO_AEAD=m
> CONFIG_CRYPTO_AEAD2=y
> CONFIG_CRYPTO_BLKCIPHER=m
> CONFIG_CRYPTO_BLKCIPHER2=y
> CONFIG_CRYPTO_HASH=y
> CONFIG_CRYPTO_HASH2=y
> CONFIG_CRYPTO_RNG=m
> CONFIG_CRYPTO_RNG2=y
> CONFIG_CRYPTO_PCOMP=y
> CONFIG_CRYPTO_MANAGER=y
> CONFIG_CRYPTO_MANAGER2=y
> # CONFIG_CRYPTO_GF128MUL is not set
> CONFIG_CRYPTO_NULL=m
> CONFIG_CRYPTO_WORKQUEUE=y
> # CONFIG_CRYPTO_CRYPTD is not set
> CONFIG_CRYPTO_AUTHENC=m
> CONFIG_CRYPTO_TEST=m
>
> #
> # Authenticated Encryption with Associated Data
> #
> # CONFIG_CRYPTO_CCM is not set
> # CONFIG_CRYPTO_GCM is not set
> # CONFIG_CRYPTO_SEQIV is not set
>
> #
> # Block modes
> #
> CONFIG_CRYPTO_CBC=m
> # CONFIG_CRYPTO_CTR is not set
> # CONFIG_CRYPTO_CTS is not set
> CONFIG_CRYPTO_ECB=m
> # CONFIG_CRYPTO_LRW is not set
> # CONFIG_CRYPTO_PCBC is not set
> # CONFIG_CRYPTO_XTS is not set
>
> #
> # Hash modes
> #
> CONFIG_CRYPTO_HMAC=y
> # CONFIG_CRYPTO_XCBC is not set
> # CONFIG_CRYPTO_VMAC is not set
>
> #
> # Digest
> #
> CONFIG_CRYPTO_CRC32C=m
> # CONFIG_CRYPTO_GHASH is not set
> CONFIG_CRYPTO_MD4=m
> CONFIG_CRYPTO_MD5=y
> CONFIG_CRYPTO_MICHAEL_MIC=m
> # CONFIG_CRYPTO_RMD128 is not set
> # CONFIG_CRYPTO_RMD160 is not set
> # CONFIG_CRYPTO_RMD256 is not set
> # CONFIG_CRYPTO_RMD320 is not set
> CONFIG_CRYPTO_SHA1=m
> CONFIG_CRYPTO_SHA256=m
> CONFIG_CRYPTO_SHA512=m
> CONFIG_CRYPTO_TGR192=m
> CONFIG_CRYPTO_WP512=m
>
> #
> # Ciphers
> #
> CONFIG_CRYPTO_AES=m
> CONFIG_CRYPTO_ANUBIS=m
> CONFIG_CRYPTO_ARC4=m
> CONFIG_CRYPTO_BLOWFISH=m
> # CONFIG_CRYPTO_CAMELLIA is not set
> CONFIG_CRYPTO_CAST5=m
> CONFIG_CRYPTO_CAST6=m
> CONFIG_CRYPTO_DES=y
> # CONFIG_CRYPTO_FCRYPT is not set
> CONFIG_CRYPTO_KHAZAD=m
> # CONFIG_CRYPTO_SALSA20 is not set
> # CONFIG_CRYPTO_SEED is not set
> CONFIG_CRYPTO_SERPENT=m
> CONFIG_CRYPTO_TEA=m
> CONFIG_CRYPTO_TWOFISH=m
> CONFIG_CRYPTO_TWOFISH_COMMON=m
>
> #
> # Compression
> #
> CONFIG_CRYPTO_DEFLATE=m
> # CONFIG_CRYPTO_ZLIB is not set
> # CONFIG_CRYPTO_LZO is not set
>
> #
> # Random Number Generation
> #
> CONFIG_CRYPTO_ANSI_CPRNG=m
> CONFIG_CRYPTO_HW=y
> # CONFIG_CRYPTO_DEV_HIFN_795X is not set
> # CONFIG_PPC_CLOCK is not set
> # CONFIG_VIRTUALIZATION is not set
>
On Fri, 2009-11-20 at 08:49 +0200, Pekka Enberg wrote:
> Hi Paul,
>
> On Wed, Nov 18, 2009 at 8:12 PM, Paul E. McKenney
> <[email protected]> wrote:
> > I am seeing some lockdep complaints in rcutorture runs that include
> > frequent CPU-hotplug operations. The tests are otherwise successful.
> > My first thought was to send a patch that gave each array_cache
> > structure's ->lock field its own struct lock_class_key, but you already
> > have a init_lock_keys() that seems to be intended to deal with this.
> >
> > So, please see below for the lockdep complaint and the .config file.
> >
> > Thanx, Paul
> >
> > ------------------------------------------------------------------------
> >
> > =============================================
> > [ INFO: possible recursive locking detected ]
> > 2.6.32-rc4-autokern1 #1
> > ---------------------------------------------
> > syslogd/2908 is trying to acquire lock:
> > (&nc->lock){..-...}, at: [<c0000000001407f4>] .kmem_cache_free+0x118/0x2d4
> >
> > but task is already holding lock:
> > (&nc->lock){..-...}, at: [<c0000000001411bc>] .kfree+0x1f0/0x324
> >
> > other info that might help us debug this:
> > 3 locks held by syslogd/2908:
> > #0: (&u->readlock){+.+.+.}, at: [<c0000000004556f8>] .unix_dgram_recvmsg+0x70/0x338
> > #1: (&nc->lock){..-...}, at: [<c0000000001411bc>] .kfree+0x1f0/0x324
> > #2: (&parent->list_lock){-.-...}, at: [<c000000000140f64>] .__drain_alien_cache+0x50/0xb8
>
> I *think* this is a false positive. The nc->lock in slab_destroy()
> should always be different from the one we took in kfree() because
> it's a per-struct kmem_cache "slab cache". Peter, what do you think?
> If my analysis is correct, any suggestions how to fix lockdep
> annotations in slab?
Did anything change recently? git-log mm/slab.c doesn't show anything
obvious, although ec5a36f94e7ca4b1f28ae4dd135cd415a704e772 has the exact
same lock recursion msg ;-)
So basically it's this stupid recursion issue where you allocate the slab
meta structure using the slab allocator, and now have to free while
freeing, right?
/me gets lost in slab, tries again..
The code in kmem_cache_create() suggests it's not even fixed size, so
there is no single cache backing all this OFF_SLAB muck :-(
It does appear to be limited to the kmalloc slabs..
There's a few possible solutions -- in order of preference:
1) do the great slab cleanup now and remove slab.c, this will avoid any
further waste of manhours and braincells trying to make slab limp along.
2) propagate the nesting information and use spin_lock_nested(), given
that slab is already a rat's nest, this won't make it any less obvious.
3) Give each kmalloc cache its own lock class.
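For option 3, a minimal sketch of what that could look like, assuming the 2.6.32-era kmem_list3/array_cache layout; the array size and the helper name are invented for illustration, and this is roughly the same trick init_lock_keys() already plays, just with a distinct key per cache rather than one shared key:

#include <linux/lockdep.h>
#include <linux/spinlock.h>

/*
 * Sketch only: one lockdep class per kmalloc cache, so the array_cache
 * locks of different caches stop sharing a single class.
 * NR_KMALLOC_CACHES and set_kmalloc_lock_classes() are made-up names;
 * l3->list_lock and l3->alien[] follow 2.6.32-era mm/slab.c.
 */
#define NR_KMALLOC_CACHES	32	/* assumed bound on kmalloc sizes */

static struct lock_class_key kmalloc_l3_keys[NR_KMALLOC_CACHES];
static struct lock_class_key kmalloc_alc_keys[NR_KMALLOC_CACHES];

static void set_kmalloc_lock_classes(struct kmem_list3 *l3, int index)
{
	int node;

	/* Re-key the per-node list lock for this kmalloc cache... */
	lockdep_set_class(&l3->list_lock, &kmalloc_l3_keys[index]);

	/* ...and every alien array_cache lock hanging off it. */
	if (!l3->alien)
		return;
	for_each_node(node)
		if (l3->alien[node])
			lockdep_set_class(&l3->alien[node]->lock,
					  &kmalloc_alc_keys[index]);
}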
On Fri, Nov 20, 2009 at 11:25 AM, Peter Zijlstra <[email protected]> wrote:
> Did anything change recently? git-log mm/slab.c doesn't show anything
> obvious, although ec5a36f94e7ca4b1f28ae4dd135cd415a704e772 has the exact
> same lock recursion msg ;-)
No, SLAB hasn't changed for a while.
On Fri, Nov 20, 2009 at 11:25 AM, Peter Zijlstra <[email protected]> wrote:
> So basically it's this stupid recursion issue where you allocate the slab
> meta structure using the slab allocator, and now have to free while
> freeing, right?
Yes.
On Fri, Nov 20, 2009 at 11:25 AM, Peter Zijlstra <[email protected]> wrote:
> The code in kmem_cache_create() suggests it's not even fixed size, so
> there is no single cache backing all this OFF_SLAB muck :-(
Oh, crap, I missed that. It's variable-length because we allocate the
freelists (bufctls in slab-speak) in the slab management structure. So
this is a genuine bug.
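For anyone following along without the source open: the descriptor is a struct slab followed by one bufctl per object, so its size depends on the cache geometry, and the OFF_SLAB case allocates it from another, general-purpose cache. Loosely paraphrased (not verbatim) from 2.6.32's mm/slab.c:

/*
 * Loose paraphrase of alloc_slabmgmt(): the off-slab descriptor comes
 * from the general-purpose cache in cachep->slabp_cache, and freeing it
 * later from slab_destroy() is what takes a second array_cache lock
 * while one is already held.
 */
static struct slab *alloc_slabmgmt(struct kmem_cache *cachep, void *objp,
				   int colour_off, gfp_t local_flags,
				   int nodeid)
{
	struct slab *slabp;

	if (OFF_SLAB(cachep)) {
		/* Descriptor lives in a separate, kmalloc-style cache. */
		slabp = kmem_cache_alloc_node(cachep->slabp_cache,
					      local_flags, nodeid);
		if (!slabp)
			return NULL;
	} else {
		/* Descriptor lives at the head of the slab's own pages. */
		slabp = objp + colour_off;
		colour_off += cachep->slab_size;
	}
	slabp->inuse = 0;
	slabp->colouroff = colour_off;
	slabp->s_mem = objp + colour_off;
	slabp->nodeid = nodeid;
	return slabp;
}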
On Fri, Nov 20, 2009 at 11:25 AM, Peter Zijlstra <[email protected]> wrote:
> It does appear to be limited to the kmalloc slabs..
>
> There's a few possible solutions -- in order of preference:
>
> 1) do the great slab cleanup now and remove slab.c, this will avoid any
> further waste of manhours and braincells trying to make slab limp along.
:-) I don't think that's an option for 2.6.33.
On Fri, Nov 20, 2009 at 11:25 AM, Peter Zijlstra <[email protected]> wrote:
> 2) propagate the nesting information and use spin_lock_nested(), given
> that slab is already a rat's nest, this won't make it any less obvious.
spin_lock_nested() doesn't really help us here because there's a
_real_ possibility of a recursive spin lock here, right?
Pekka
On Fri, 2009-11-20 at 12:38 +0200, Pekka Enberg wrote:
>
>
> On Fri, Nov 20, 2009 at 11:25 AM, Peter Zijlstra <[email protected]> wrote:
> > 2) propagate the nesting information and use spin_lock_nested(), given
> > that slab is already a rat's nest, this won't make it any less obvious.
>
> spin_lock_nested() doesn't really help us here because there's a
> _real_ possibility of a recursive spin lock here, right?
Well, I was working under the assumption that your analysis of it being
a false positive was right ;-)
I briefly tried to verify that, but got lost and gave up, at which point
I started looking for ways to annotate.
If you're now saying it's a real deadlock waiting to happen, then the
quick fix is to always do the call_rcu() thing, or a slightly longer fix
might be to take that slab object and propagate it out up the callchain
and free it once we drop the nc->lock for the current __cache_free() or
something.
Peter Zijlstra wrote:
> On Fri, 2009-11-20 at 12:38 +0200, Pekka Enberg wrote:
>>
>> On Fri, Nov 20, 2009 at 11:25 AM, Peter Zijlstra <[email protected]> wrote:
>>> 2) propagate the nesting information and use spin_lock_nested(), given
>>> that slab is already a rat's nest, this won't make it any less obvious.
>> spin_lock_nested() doesn't really help us here because there's a
>> _real_ possibility of a recursive spin lock here, right?
>
> Well, I was working under the assumption that your analysis of it being
> a false positive was right ;-)
>
> I briefly tried to verify that, but got lost and gave up, at which point
> I started looking for ways to annotate.
Uh, ok, so apparently I was right after all. There's a comment in
free_block() above the slab_destroy() call that refers to the comment
above the alloc_slabmgmt() function definition, which explains it all.
Long story short: ->slab_cachep never points to the same kmalloc cache
we're allocating or freeing from. Where do we need to put the
spin_lock_nested() annotation? Would it be enough to just use it in
cache_free_alien() for alien->lock or do we need it in
cache_flusharray() as well?
Pekka
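If the annotation route is taken, the cache_free_alien() side might end up looking roughly like the sketch below; an illustration, not an actual patch, with an invented helper name and array_cache fields as described in this thread:

#include <linux/spinlock.h>

/*
 * Illustrative sketch: mark the alien array_cache lock as one level of
 * intended nesting.  The alien cache belongs to a different kmem_cache
 * instance than the nc->lock already held on the kfree() side, so
 * SINGLE_DEPTH_NESTING tells lockdep this is deliberate nesting rather
 * than the same lock taken recursively.
 */
static void free_to_alien_cache(struct kmem_cache *cachep,
				struct array_cache *alien,
				void *objp, int nodeid)
{
	spin_lock_nested(&alien->lock, SINGLE_DEPTH_NESTING);
	if (unlikely(alien->avail == alien->limit))
		__drain_alien_cache(cachep, alien, nodeid);
	alien->entry[alien->avail++] = objp;
	spin_unlock(&alien->lock);
}

Whether cache_flusharray() needs the same subclass depends on whether its lock can end up nested under another cache's lock of the same class, which is exactly the open question above.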
On Fri, Nov 20, 2009 at 01:05:58PM +0200, Pekka Enberg wrote:
> Peter Zijlstra wrote:
>> On Fri, 2009-11-20 at 12:38 +0200, Pekka Enberg wrote:
>>>
>>> On Fri, Nov 20, 2009 at 11:25 AM, Peter Zijlstra <[email protected]>
>>> wrote:
>>>> 2) propagate the nesting information and use spin_lock_nested(), given
>>>> that slab is already a rat's nest, this won't make it any less obvious.
>>> spin_lock_nested() doesn't really help us here because there's a
>>> _real_ possibility of a recursive spin lock here, right?
>> Well, I was working under the assumption that your analysis of it being
>> a false positive was right ;-)
>> I briefly tried to verify that, but got lost and gave up, at which point
>> I started looking for ways to annotate.
>
> Uh, ok, so apparently I was right after all. There's a comment in
> free_block() above the slab_destroy() call that refers to the comment above
> alloc_slabmgmt() function definition which explains it all.
>
> Long story short: ->slab_cachep never points to the same kmalloc cache
> we're allocating or freeing from. Where do we need to put the
> spin_lock_nested() annotation? Would it be enough to just use it in
> cache_free_alien() for alien->lock or do we need it in cache_flusharray()
> as well?
Hmmm... If the nc->lock spinlocks are always from different slabs
(as alloc_slabmgmt()'s block comment claims), why not just give each
array_cache structure's lock its own struct lock_class_key? They
are zero size unless you have lockdep enabled.
Thanx, Paul
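For reference, a minimal sketch of the lockdep_set_class() mechanism
such a scheme would rely on (not slab code; the names are invented, and
the sketch uses a single static key because lockdep expects
lock_class_key objects to live in static storage rather than inside
each dynamically allocated array_cache):

#include <linux/lockdep.h>
#include <linux/spinlock.h>

/* One dedicated lockdep class for the locks we want tracked separately. */
static struct lock_class_key demo_alien_lock_key;

static void demo_init_alien_lock(spinlock_t *lock)
{
	spin_lock_init(lock);
	/* Reclassify the lock out of its default (init-site) class. */
	lockdep_set_class(lock, &demo_alien_lock_key);
}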
On Fri, 2009-11-20 at 13:05 +0200, Pekka Enberg wrote:
> Peter Zijlstra kirjoitti:
> > On Fri, 2009-11-20 at 12:38 +0200, Pekka Enberg wrote:
> >>
> >> On Fri, Nov 20, 2009 at 11:25 AM, Peter Zijlstra <[email protected]> wrote:
> >>> 2) propagate the nesting information and user spin_lock_nested(), given
> >>> that slab is already a rat's nest, this won't make it any less obvious.
> >> spin_lock_nested() doesn't really help us here because there's a
> >> _real_ possibility of a recursive spin lock here, right?
> >
> > Well, I was working under the assumption that your analysis of it being
> > a false positive was right ;-)
> >
> > I briefly tried to verify that, but got lost and gave up, at which point
> > I started looking for ways to annotate.
>
> Uh, ok, so apparently I was right after all. There's a comment in
> free_block() above the slab_destroy() call that refers to the comment
> above alloc_slabmgmt() function definition which explains it all.
>
> Long story short: ->slab_cachep never points to the same kmalloc cache
> we're allocating or freeing from. Where do we need to put the
> spin_lock_nested() annotation? Would it be enough to just use it in
> cache_free_alien() for alien->lock or do we need it in
> cache_flusharray() as well?
You'd have to somehow push the nested state down from the
kmem_cache_free() call in slab_destroy() to all nc->lock sites below.
On Fri, 2009-11-20 at 06:48 -0800, Paul E. McKenney wrote:
> On Fri, Nov 20, 2009 at 01:05:58PM +0200, Pekka Enberg wrote:
> > Peter Zijlstra kirjoitti:
> >> On Fri, 2009-11-20 at 12:38 +0200, Pekka Enberg wrote:
> >>>
> >>> On Fri, Nov 20, 2009 at 11:25 AM, Peter Zijlstra <[email protected]>
> >>> wrote:
> >>>> 2) propagate the nesting information and user spin_lock_nested(), given
> >>>> that slab is already a rat's nest, this won't make it any less obvious.
> >>> spin_lock_nested() doesn't really help us here because there's a
> >>> _real_ possibility of a recursive spin lock here, right?
> >> Well, I was working under the assumption that your analysis of it being
> >> a false positive was right ;-)
> >> I briefly tried to verify that, but got lost and gave up, at which point
> >> I started looking for ways to annotate.
> >
> > Uh, ok, so apparently I was right after all. There's a comment in
> > free_block() above the slab_destroy() call that refers to the comment above
> > alloc_slabmgmt() function definition which explains it all.
> >
> > Long story short: ->slab_cachep never points to the same kmalloc cache
> > we're allocating or freeing from. Where do we need to put the
> > spin_lock_nested() annotation? Would it be enough to just use it in
> > cache_free_alien() for alien->lock or do we need it in cache_flusharray()
> > as well?
>
> Hmmm... If the nc->lock spinlocks are always from different slabs
> (as alloc_slabmgmt()'s block comment claims), why not just give each
> array_cache structure's lock its own struct lock_class_key? They
> are zero size unless you have lockdep enabled.
Because more classes:
- takes more (static/limited) lockdep resources
- make more chains, weakening lock dependency tracking
because it can no longer use the state observed in one branch
on state observed in another branch.
Suppose you have 3 locks and 2 classes, lock 1 and 2 part of class A and
lock 3 of class B
Then if we observe 1 -> 3, and 3 -> 2, we'd see A->B and B->A, and go
yell. Now if we split class A into two classes and these locks get into
separate classes, we lose that cycle.
Now in this case we want to break a cycle, so the above will be correct,
but all resulting chains will be equivalent for 99% (with the one
exception of this funny recursion case) wasting lots of resources and
state matching opportunity.
Therefore it would be much better to use the _nested annotation if
possible.
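As a rough illustration of what the _nested annotation does, here is a
minimal sketch (not slab code; the structure and names are invented) of
telling lockdep that taking two locks from the same class is
intentional:

#include <linux/lockdep.h>
#include <linux/spinlock.h>

struct demo_node {
	spinlock_t lock;
	struct demo_node *child;
};

static void demo_lock_parent_and_child(struct demo_node *n)
{
	spin_lock(&n->lock);
	/*
	 * n->lock and n->child->lock belong to the same lock class, so a
	 * plain spin_lock() here would be flagged as recursive locking.
	 * The subclass marks this acquisition as a distinct nesting level.
	 */
	spin_lock_nested(&n->child->lock, SINGLE_DEPTH_NESTING);

	/* ... move state between parent and child ... */

	spin_unlock(&n->child->lock);
	spin_unlock(&n->lock);
}

The subclass only affects lockdep bookkeeping; without
CONFIG_DEBUG_LOCK_ALLOC, spin_lock_nested() is just spin_lock().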
On Fri, Nov 20, 2009 at 04:17:40PM +0100, Peter Zijlstra wrote:
> On Fri, 2009-11-20 at 06:48 -0800, Paul E. McKenney wrote:
> > On Fri, Nov 20, 2009 at 01:05:58PM +0200, Pekka Enberg wrote:
> > > Peter Zijlstra kirjoitti:
> > >> On Fri, 2009-11-20 at 12:38 +0200, Pekka Enberg wrote:
> > >>>
> > >>> On Fri, Nov 20, 2009 at 11:25 AM, Peter Zijlstra <[email protected]>
> > >>> wrote:
> > >>>> 2) propagate the nesting information and user spin_lock_nested(), given
> > >>>> that slab is already a rat's nest, this won't make it any less obvious.
> > >>> spin_lock_nested() doesn't really help us here because there's a
> > >>> _real_ possibility of a recursive spin lock here, right?
> > >> Well, I was working under the assumption that your analysis of it being
> > >> a false positive was right ;-)
> > >> I briefly tried to verify that, but got lost and gave up, at which point
> > >> I started looking for ways to annotate.
> > >
> > > Uh, ok, so apparently I was right after all. There's a comment in
> > > free_block() above the slab_destroy() call that refers to the comment above
> > > alloc_slabmgmt() function definition which explains it all.
> > >
> > > Long story short: ->slab_cachep never points to the same kmalloc cache
> > > we're allocating or freeing from. Where do we need to put the
> > > spin_lock_nested() annotation? Would it be enough to just use it in
> > > cache_free_alien() for alien->lock or do we need it in cache_flusharray()
> > > as well?
> >
> > Hmmm... If the nc->lock spinlocks are always from different slabs
> > (as alloc_slabmgmt()'s block comment claims), why not just give each
> > array_cache structure's lock its own struct lock_class_key? They
> > are zero size unless you have lockdep enabled.
>
> Because more classes:
>
> - takes more (static/limited) lockdep resources
>
> - make more chains, weakening lock dependency tracking
> because it can no longer use the state observed in one branch
> on state observed in another branch.
>
> Suppose you have 3 locks and 2 classes, lock 1 and 2 part of class A and
> lock 3 of class B
>
> Then if we observe 1 -> 3, and 3 -> 2, we'd see A->B and B->A, and go
> yell. Now if we split class A into two classes and these locks get into
> separate classes, we lose that cycle.
>
> Now in this case we want to break a cycle, so the above will be correct,
> but all resulting chains will be equivalent for 99% (with the one
> exception of this funny recursion case) wasting lots of resources and
> state matching opportunity.
>
> Therefore it would be much better to use the _nested annotation if
> possible.
Got it, thank you for the explanation!!!
I will keep this in mind when reconsidering the RCU lockdep interactions.
Thanx, Paul
Hi Peter,
On Fri, 2009-11-20 at 16:09 +0100, Peter Zijlstra wrote:
> > Uh, ok, so apparently I was right after all. There's a comment in
> > free_block() above the slab_destroy() call that refers to the comment
> > above alloc_slabmgmt() function definition which explains it all.
> >
> > Long story short: ->slab_cachep never points to the same kmalloc cache
> > we're allocating or freeing from. Where do we need to put the
> > spin_lock_nested() annotation? Would it be enough to just use it in
> > cache_free_alien() for alien->lock or do we need it in
> > cache_flusharray() as well?
>
> You'd have to somehow push the nested state down from the
> kmem_cache_free() call in slab_destroy() to all nc->lock sites below.
That turns out to be _very_ hard. How about something like the following
untested patch which delays slab_destroy() while we're under nc->lock.
Pekka
diff --git a/mm/slab.c b/mm/slab.c
index 7dfa481..6f522e3 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -316,7 +316,7 @@ struct kmem_list3 __initdata initkmem_list3[NUM_INIT_LISTS];
static int drain_freelist(struct kmem_cache *cache,
struct kmem_list3 *l3, int tofree);
static void free_block(struct kmem_cache *cachep, void **objpp, int len,
- int node);
+ int node, struct list_head *to_destroy);
static int enable_cpucache(struct kmem_cache *cachep, gfp_t gfp);
static void cache_reap(struct work_struct *unused);
@@ -1002,7 +1002,8 @@ static void free_alien_cache(struct array_cache **ac_ptr)
}
static void __drain_alien_cache(struct kmem_cache *cachep,
- struct array_cache *ac, int node)
+ struct array_cache *ac, int node,
+ struct list_head *to_destroy)
{
struct kmem_list3 *rl3 = cachep->nodelists[node];
@@ -1016,12 +1017,22 @@ static void __drain_alien_cache(struct kmem_cache *cachep,
if (rl3->shared)
transfer_objects(rl3->shared, ac, ac->limit);
- free_block(cachep, ac->entry, ac->avail, node);
+ free_block(cachep, ac->entry, ac->avail, node, to_destroy);
ac->avail = 0;
spin_unlock(&rl3->list_lock);
}
}
+static void slab_destroy(struct kmem_cache *, struct slab *);
+
+static void destroy_slabs(struct kmem_cache *cache, struct list_head *to_destroy)
+{
+ struct slab *slab, *tmp;
+
+ list_for_each_entry_safe(slab, tmp, to_destroy, list)
+ slab_destroy(cache, slab);
+}
+
/*
* Called from cache_reap() to regularly drain alien caches round robin.
*/
@@ -1033,8 +1044,11 @@ static void reap_alien(struct kmem_cache *cachep, struct kmem_list3 *l3)
struct array_cache *ac = l3->alien[node];
if (ac && ac->avail && spin_trylock_irq(&ac->lock)) {
- __drain_alien_cache(cachep, ac, node);
+ LIST_HEAD(to_destroy);
+
+ __drain_alien_cache(cachep, ac, node, &to_destroy);
spin_unlock_irq(&ac->lock);
+ destroy_slabs(cachep, &to_destroy);
}
}
}
@@ -1049,9 +1063,12 @@ static void drain_alien_cache(struct kmem_cache *cachep,
for_each_online_node(i) {
ac = alien[i];
if (ac) {
+ LIST_HEAD(to_destroy);
+
spin_lock_irqsave(&ac->lock, flags);
- __drain_alien_cache(cachep, ac, i);
+ __drain_alien_cache(cachep, ac, i, &to_destroy);
spin_unlock_irqrestore(&ac->lock, flags);
+ destroy_slabs(cachep, &to_destroy);
}
}
}
@@ -1076,17 +1093,20 @@ static inline int cache_free_alien(struct kmem_cache *cachep, void *objp)
l3 = cachep->nodelists[node];
STATS_INC_NODEFREES(cachep);
if (l3->alien && l3->alien[nodeid]) {
+ LIST_HEAD(to_destroy);
+
alien = l3->alien[nodeid];
spin_lock(&alien->lock);
if (unlikely(alien->avail == alien->limit)) {
STATS_INC_ACOVERFLOW(cachep);
- __drain_alien_cache(cachep, alien, nodeid);
+ __drain_alien_cache(cachep, alien, nodeid, &to_destroy);
}
alien->entry[alien->avail++] = objp;
spin_unlock(&alien->lock);
+ destroy_slabs(cachep, &to_destroy);
} else {
spin_lock(&(cachep->nodelists[nodeid])->list_lock);
- free_block(cachep, &objp, 1, nodeid);
+ free_block(cachep, &objp, 1, nodeid, NULL);
spin_unlock(&(cachep->nodelists[nodeid])->list_lock);
}
return 1;
@@ -1118,7 +1138,7 @@ static void __cpuinit cpuup_canceled(long cpu)
/* Free limit for this kmem_list3 */
l3->free_limit -= cachep->batchcount;
if (nc)
- free_block(cachep, nc->entry, nc->avail, node);
+ free_block(cachep, nc->entry, nc->avail, node, NULL);
if (!cpus_empty(*mask)) {
spin_unlock_irq(&l3->list_lock);
@@ -1128,7 +1148,7 @@ static void __cpuinit cpuup_canceled(long cpu)
shared = l3->shared;
if (shared) {
free_block(cachep, shared->entry,
- shared->avail, node);
+ shared->avail, node, NULL);
l3->shared = NULL;
}
@@ -2402,7 +2422,7 @@ static void do_drain(void *arg)
check_irq_off();
ac = cpu_cache_get(cachep);
spin_lock(&cachep->nodelists[node]->list_lock);
- free_block(cachep, ac->entry, ac->avail, node);
+ free_block(cachep, ac->entry, ac->avail, node, NULL);
spin_unlock(&cachep->nodelists[node]->list_lock);
ac->avail = 0;
}
@@ -3410,7 +3430,7 @@ __cache_alloc(struct kmem_cache *cachep, gfp_t flags, void *caller)
* Caller needs to acquire correct kmem_list's list_lock
*/
static void free_block(struct kmem_cache *cachep, void **objpp, int nr_objects,
- int node)
+ int node, struct list_head *to_destroy)
{
int i;
struct kmem_list3 *l3;
@@ -3439,7 +3459,10 @@ static void free_block(struct kmem_cache *cachep, void **objpp, int nr_objects,
* a different cache, refer to comments before
* alloc_slabmgmt.
*/
- slab_destroy(cachep, slabp);
+ if (to_destroy)
+ list_add(&slabp->list, to_destroy);
+ else
+ slab_destroy(cachep, slabp);
} else {
list_add(&slabp->list, &l3->slabs_free);
}
@@ -3479,7 +3502,7 @@ static void cache_flusharray(struct kmem_cache *cachep, struct array_cache *ac)
}
}
- free_block(cachep, ac->entry, batchcount, node);
+ free_block(cachep, ac->entry, batchcount, node, NULL);
free_done:
#if STATS
{
@@ -3822,7 +3845,7 @@ static int alloc_kmemlist(struct kmem_cache *cachep, gfp_t gfp)
if (shared)
free_block(cachep, shared->entry,
- shared->avail, node);
+ shared->avail, node, NULL);
l3->shared = new_shared;
if (!l3->alien) {
@@ -3925,7 +3948,7 @@ static int do_tune_cpucache(struct kmem_cache *cachep, int limit,
if (!ccold)
continue;
spin_lock_irq(&cachep->nodelists[cpu_to_node(i)]->list_lock);
- free_block(cachep, ccold->entry, ccold->avail, cpu_to_node(i));
+ free_block(cachep, ccold->entry, ccold->avail, cpu_to_node(i), NULL);
spin_unlock_irq(&cachep->nodelists[cpu_to_node(i)]->list_lock);
kfree(ccold);
}
@@ -4007,7 +4030,7 @@ void drain_array(struct kmem_cache *cachep, struct kmem_list3 *l3,
tofree = force ? ac->avail : (ac->limit + 4) / 5;
if (tofree > ac->avail)
tofree = (ac->avail + 1) / 2;
- free_block(cachep, ac->entry, tofree, node);
+ free_block(cachep, ac->entry, tofree, node, NULL);
ac->avail -= tofree;
memmove(ac->entry, &(ac->entry[tofree]),
sizeof(void *) * ac->avail);
On Mon, 2009-11-23 at 21:00 +0200, Pekka Enberg wrote:
> Hi Peter,
>
> On Fri, 2009-11-20 at 16:09 +0100, Peter Zijlstra wrote:
> > > Uh, ok, so apparently I was right after all. There's a comment in
> > > free_block() above the slab_destroy() call that refers to the comment
> > > above alloc_slabmgmt() function definition which explains it all.
> > >
> > > Long story short: ->slab_cachep never points to the same kmalloc cache
> > > we're allocating or freeing from. Where do we need to put the
> > > spin_lock_nested() annotation? Would it be enough to just use it in
> > > cache_free_alien() for alien->lock or do we need it in
> > > cache_flusharray() as well?
> >
> > You'd have to somehow push the nested state down from the
> > kmem_cache_free() call in slab_destroy() to all nc->lock sites below.
>
> That turns out to be _very_ hard. How about something like the following
> untested patch which delays slab_destroy() while we're under nc->lock.
>
> Pekka
This seems like a lot of work to paper over a lockdep false positive in
code that should be firmly in the maintenance end of its lifecycle? I'd
rather the fix or papering over happen in lockdep.
Introducing extra cacheline pressure by passing to_destroy around also
seems like a good way to trickle away SLAB's narrow remaining
performance advantages.
>
> diff --git a/mm/slab.c b/mm/slab.c
> index 7dfa481..6f522e3 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -316,7 +316,7 @@ struct kmem_list3 __initdata initkmem_list3[NUM_INIT_LISTS];
> static int drain_freelist(struct kmem_cache *cache,
> struct kmem_list3 *l3, int tofree);
> static void free_block(struct kmem_cache *cachep, void **objpp, int len,
> - int node);
> + int node, struct list_head *to_destroy);
> static int enable_cpucache(struct kmem_cache *cachep, gfp_t gfp);
> static void cache_reap(struct work_struct *unused);
>
> @@ -1002,7 +1002,8 @@ static void free_alien_cache(struct array_cache **ac_ptr)
> }
>
> static void __drain_alien_cache(struct kmem_cache *cachep,
> - struct array_cache *ac, int node)
> + struct array_cache *ac, int node,
> + struct list_head *to_destroy)
> {
> struct kmem_list3 *rl3 = cachep->nodelists[node];
>
> @@ -1016,12 +1017,22 @@ static void __drain_alien_cache(struct kmem_cache *cachep,
> if (rl3->shared)
> transfer_objects(rl3->shared, ac, ac->limit);
>
> - free_block(cachep, ac->entry, ac->avail, node);
> + free_block(cachep, ac->entry, ac->avail, node, to_destroy);
> ac->avail = 0;
> spin_unlock(&rl3->list_lock);
> }
> }
>
> +static void slab_destroy(struct kmem_cache *, struct slab *);
> +
> +static void destroy_slabs(struct kmem_cache *cache, struct list_head *to_destroy)
> +{
> + struct slab *slab, *tmp;
> +
> + list_for_each_entry_safe(slab, tmp, to_destroy, list)
> + slab_destroy(cache, slab);
> +}
> +
> /*
> * Called from cache_reap() to regularly drain alien caches round robin.
> */
> @@ -1033,8 +1044,11 @@ static void reap_alien(struct kmem_cache *cachep, struct kmem_list3 *l3)
> struct array_cache *ac = l3->alien[node];
>
> if (ac && ac->avail && spin_trylock_irq(&ac->lock)) {
> - __drain_alien_cache(cachep, ac, node);
> + LIST_HEAD(to_destroy);
> +
> + __drain_alien_cache(cachep, ac, node, &to_destroy);
> spin_unlock_irq(&ac->lock);
> + destroy_slabs(cachep, &to_destroy);
> }
> }
> }
> @@ -1049,9 +1063,12 @@ static void drain_alien_cache(struct kmem_cache *cachep,
> for_each_online_node(i) {
> ac = alien[i];
> if (ac) {
> + LIST_HEAD(to_destroy);
> +
> spin_lock_irqsave(&ac->lock, flags);
> - __drain_alien_cache(cachep, ac, i);
> + __drain_alien_cache(cachep, ac, i, &to_destroy);
> spin_unlock_irqrestore(&ac->lock, flags);
> + destroy_slabs(cachep, &to_destroy);
> }
> }
> }
> @@ -1076,17 +1093,20 @@ static inline int cache_free_alien(struct kmem_cache *cachep, void *objp)
> l3 = cachep->nodelists[node];
> STATS_INC_NODEFREES(cachep);
> if (l3->alien && l3->alien[nodeid]) {
> + LIST_HEAD(to_destroy);
> +
> alien = l3->alien[nodeid];
> spin_lock(&alien->lock);
> if (unlikely(alien->avail == alien->limit)) {
> STATS_INC_ACOVERFLOW(cachep);
> - __drain_alien_cache(cachep, alien, nodeid);
> + __drain_alien_cache(cachep, alien, nodeid, &to_destroy);
> }
> alien->entry[alien->avail++] = objp;
> spin_unlock(&alien->lock);
> + destroy_slabs(cachep, &to_destroy);
> } else {
> spin_lock(&(cachep->nodelists[nodeid])->list_lock);
> - free_block(cachep, &objp, 1, nodeid);
> + free_block(cachep, &objp, 1, nodeid, NULL);
> spin_unlock(&(cachep->nodelists[nodeid])->list_lock);
> }
> return 1;
> @@ -1118,7 +1138,7 @@ static void __cpuinit cpuup_canceled(long cpu)
> /* Free limit for this kmem_list3 */
> l3->free_limit -= cachep->batchcount;
> if (nc)
> - free_block(cachep, nc->entry, nc->avail, node);
> + free_block(cachep, nc->entry, nc->avail, node, NULL);
>
> if (!cpus_empty(*mask)) {
> spin_unlock_irq(&l3->list_lock);
> @@ -1128,7 +1148,7 @@ static void __cpuinit cpuup_canceled(long cpu)
> shared = l3->shared;
> if (shared) {
> free_block(cachep, shared->entry,
> - shared->avail, node);
> + shared->avail, node, NULL);
> l3->shared = NULL;
> }
>
> @@ -2402,7 +2422,7 @@ static void do_drain(void *arg)
> check_irq_off();
> ac = cpu_cache_get(cachep);
> spin_lock(&cachep->nodelists[node]->list_lock);
> - free_block(cachep, ac->entry, ac->avail, node);
> + free_block(cachep, ac->entry, ac->avail, node, NULL);
> spin_unlock(&cachep->nodelists[node]->list_lock);
> ac->avail = 0;
> }
> @@ -3410,7 +3430,7 @@ __cache_alloc(struct kmem_cache *cachep, gfp_t flags, void *caller)
> * Caller needs to acquire correct kmem_list's list_lock
> */
> static void free_block(struct kmem_cache *cachep, void **objpp, int nr_objects,
> - int node)
> + int node, struct list_head *to_destroy)
> {
> int i;
> struct kmem_list3 *l3;
> @@ -3439,7 +3459,10 @@ static void free_block(struct kmem_cache *cachep, void **objpp, int nr_objects,
> * a different cache, refer to comments before
> * alloc_slabmgmt.
> */
> - slab_destroy(cachep, slabp);
> + if (to_destroy)
> + list_add(&slabp->list, to_destroy);
> + else
> + slab_destroy(cachep, slabp);
> } else {
> list_add(&slabp->list, &l3->slabs_free);
> }
> @@ -3479,7 +3502,7 @@ static void cache_flusharray(struct kmem_cache *cachep, struct array_cache *ac)
> }
> }
>
> - free_block(cachep, ac->entry, batchcount, node);
> + free_block(cachep, ac->entry, batchcount, node, NULL);
> free_done:
> #if STATS
> {
> @@ -3822,7 +3845,7 @@ static int alloc_kmemlist(struct kmem_cache *cachep, gfp_t gfp)
>
> if (shared)
> free_block(cachep, shared->entry,
> - shared->avail, node);
> + shared->avail, node, NULL);
>
> l3->shared = new_shared;
> if (!l3->alien) {
> @@ -3925,7 +3948,7 @@ static int do_tune_cpucache(struct kmem_cache *cachep, int limit,
> if (!ccold)
> continue;
> spin_lock_irq(&cachep->nodelists[cpu_to_node(i)]->list_lock);
> - free_block(cachep, ccold->entry, ccold->avail, cpu_to_node(i));
> + free_block(cachep, ccold->entry, ccold->avail, cpu_to_node(i), NULL);
> spin_unlock_irq(&cachep->nodelists[cpu_to_node(i)]->list_lock);
> kfree(ccold);
> }
> @@ -4007,7 +4030,7 @@ void drain_array(struct kmem_cache *cachep, struct kmem_list3 *l3,
> tofree = force ? ac->avail : (ac->limit + 4) / 5;
> if (tofree > ac->avail)
> tofree = (ac->avail + 1) / 2;
> - free_block(cachep, ac->entry, tofree, node);
> + free_block(cachep, ac->entry, tofree, node, NULL);
> ac->avail -= tofree;
> memmove(ac->entry, &(ac->entry[tofree]),
> sizeof(void *) * ac->avail);
>
--
http://selenic.com : development and support for Mercurial and Linux
Matt Mackall wrote:
> This seems like a lot of work to paper over a lockdep false positive in
> code that should be firmly in the maintenance end of its lifecycle? I'd
> rather the fix or papering over happen in lockdep.
True that. Is __raw_spin_lock() out of question, Peter?-) Passing the
state is pretty invasive because of the kmem_cache_free() call in
slab_destroy(). We re-enter the slab allocator from the outer edges
which makes spin_lock_nested() very inconvenient.
> Introducing extra cacheline pressure by passing to_destroy around also
> seems like a good way to trickle away SLAB's narrow remaining
> performance advantages.
We can probably fix that to affect CONFIG_NUMA only, which sucks already.
Pekka
On Mon, 23 Nov 2009, Pekka Enberg wrote:
> That turns out to be _very_ hard. How about something like the following
> untested patch which delays slab_destroy() while we're under nc->lock.
Code changes to deal with a diagnostic issue?
On Mon, Nov 23, 2009 at 01:30:50PM -0600, Christoph Lameter wrote:
> On Mon, 23 Nov 2009, Pekka Enberg wrote:
>
> > That turns out to be _very_ hard. How about something like the following
> > untested patch which delays slab_destroy() while we're under nc->lock.
>
> Code changes to deal with a diagnostic issue?
Indeed! At least if we want the diagnostics to have any value, we do
need to avoid false alarms. Same reasoning as for gcc warnings, right?
Thanx, Paul
On Mon, 23 Nov 2009, Pekka Enberg wrote:
> > That turns out to be _very_ hard. How about something like the following
> > untested patch which delays slab_destroy() while we're under nc->lock.
On Mon, 2009-11-23 at 13:30 -0600, Christoph Lameter wrote:
> Code changes to deal with a diagnostic issue?
OK, fair enough. If I suffer permanent brain damage from staring at the
SLAB code for too long, I hope you and Matt will chip in to pay for my
medication.
I think I was looking at the wrong thing here. The problem is in
cache_free_alien() so the comment in slab_destroy() isn't relevant.
Looking at init_lock_keys() we already do special lockdep annotations
but there's a catch (as explained in a comment on top of
on_slab_alc_key):
* We set lock class for alien array caches which are up during init.
* The lock annotation will be lost if all cpus of a node goes down and
* then comes back up during hotplug
Paul said he was running CPU hotplug so maybe that explains the problem?
Pekka
On Mon, 2009-11-23 at 21:50 +0200, Pekka Enberg wrote:
> On Mon, 23 Nov 2009, Pekka Enberg wrote:
> > > That turns out to be _very_ hard. How about something like the following
> > > untested patch which delays slab_destroy() while we're under nc->lock.
>
> On Mon, 2009-11-23 at 13:30 -0600, Christoph Lameter wrote:
> > Code changes to deal with a diagnostic issue?
>
> OK, fair enough. If I suffer permanent brain damage from staring at the
> SLAB code for too long, I hope you and Matt will chip in to pay for my
> medication.
>
> I think I was looking at the wrong thing here. The problem is in
> cache_free_alien() so the comment in slab_destroy() isn't relevant.
> Looking at init_lock_keys() we already do special lockdep annotations
> but there's a catch (as explained in a comment on top of
> on_slab_alc_key):
>
> * We set lock class for alien array caches which are up during init.
> * The lock annotation will be lost if all cpus of a node goes down and
> * then comes back up during hotplug
>
> Paul said he was running CPU hotplug so maybe that explains the problem?
Maybe something like this untested patch fixes the issue...
Pekka
diff --git a/mm/slab.c b/mm/slab.c
index 7dfa481..84de47e 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -604,6 +604,26 @@ static struct kmem_cache cache_cache = {
#define BAD_ALIEN_MAGIC 0x01020304ul
+/*
+ * chicken and egg problem: delay the per-cpu array allocation
+ * until the general caches are up.
+ */
+static enum {
+ NONE,
+ PARTIAL_AC,
+ PARTIAL_L3,
+ EARLY,
+ FULL
+} g_cpucache_up;
+
+/*
+ * used by boot code to determine if it can use slab based allocator
+ */
+int slab_is_available(void)
+{
+ return g_cpucache_up >= EARLY;
+}
+
#ifdef CONFIG_LOCKDEP
/*
@@ -620,40 +640,52 @@ static struct kmem_cache cache_cache = {
static struct lock_class_key on_slab_l3_key;
static struct lock_class_key on_slab_alc_key;
-static inline void init_lock_keys(void)
-
+static void init_node_lock_keys(int q)
{
- int q;
struct cache_sizes *s = malloc_sizes;
- while (s->cs_size != ULONG_MAX) {
- for_each_node(q) {
- struct array_cache **alc;
- int r;
- struct kmem_list3 *l3 = s->cs_cachep->nodelists[q];
- if (!l3 || OFF_SLAB(s->cs_cachep))
- continue;
- lockdep_set_class(&l3->list_lock, &on_slab_l3_key);
- alc = l3->alien;
- /*
- * FIXME: This check for BAD_ALIEN_MAGIC
- * should go away when common slab code is taught to
- * work even without alien caches.
- * Currently, non NUMA code returns BAD_ALIEN_MAGIC
- * for alloc_alien_cache,
- */
- if (!alc || (unsigned long)alc == BAD_ALIEN_MAGIC)
- continue;
- for_each_node(r) {
- if (alc[r])
- lockdep_set_class(&alc[r]->lock,
- &on_slab_alc_key);
- }
+ if (g_cpucache_up != FULL)
+ return;
+
+ for (s = malloc_sizes; s->cs_size != ULONG_MAX; s++) {
+ struct array_cache **alc;
+ struct kmem_list3 *l3;
+ int r;
+
+ l3 = s->cs_cachep->nodelists[q];
+ if (!l3 || OFF_SLAB(s->cs_cachep))
+ return;
+ lockdep_set_class(&l3->list_lock, &on_slab_l3_key);
+ alc = l3->alien;
+ /*
+ * FIXME: This check for BAD_ALIEN_MAGIC
+ * should go away when common slab code is taught to
+ * work even without alien caches.
+ * Currently, non NUMA code returns BAD_ALIEN_MAGIC
+ * for alloc_alien_cache,
+ */
+ if (!alc || (unsigned long)alc == BAD_ALIEN_MAGIC)
+ return;
+ for_each_node(r) {
+ if (alc[r])
+ lockdep_set_class(&alc[r]->lock,
+ &on_slab_alc_key);
}
- s++;
}
}
+
+static inline void init_lock_keys(void)
+{
+ int node;
+
+ for_each_node(node)
+ init_node_lock_keys(node);
+}
#else
+static void init_node_lock_keys(int q)
+{
+}
+
static inline void init_lock_keys(void)
{
}
@@ -665,26 +697,6 @@ static inline void init_lock_keys(void)
static DEFINE_MUTEX(cache_chain_mutex);
static struct list_head cache_chain;
-/*
- * chicken and egg problem: delay the per-cpu array allocation
- * until the general caches are up.
- */
-static enum {
- NONE,
- PARTIAL_AC,
- PARTIAL_L3,
- EARLY,
- FULL
-} g_cpucache_up;
-
-/*
- * used by boot code to determine if it can use slab based allocator
- */
-int slab_is_available(void)
-{
- return g_cpucache_up >= EARLY;
-}
-
static DEFINE_PER_CPU(struct delayed_work, reap_work);
static inline struct array_cache *cpu_cache_get(struct kmem_cache *cachep)
@@ -1254,6 +1266,8 @@ static int __cpuinit cpuup_prepare(long cpu)
kfree(shared);
free_alien_cache(alien);
}
+ init_node_lock_keys(node);
+
return 0;
bad:
cpuup_canceled(cpu);
On Mon, Nov 23, 2009 at 10:01:15PM +0200, Pekka Enberg wrote:
> On Mon, 2009-11-23 at 21:50 +0200, Pekka Enberg wrote:
> > On Mon, 23 Nov 2009, Pekka Enberg wrote:
> > > > That turns out to be _very_ hard. How about something like the following
> > > > untested patch which delays slab_destroy() while we're under nc->lock.
> >
> > On Mon, 2009-11-23 at 13:30 -0600, Christoph Lameter wrote:
> > > Code changes to deal with a diagnostic issue?
> >
> > OK, fair enough. If I suffer permanent brain damage from staring at the
> > SLAB code for too long, I hope you and Matt will chip in to pay for my
> > medication.
> >
> > I think I was looking at the wrong thing here. The problem is in
> > cache_free_alien() so the comment in slab_destroy() isn't relevant.
> > Looking at init_lock_keys() we already do special lockdep annotations
> > but there's a catch (as explained in a comment on top of
> > on_slab_alc_key):
> >
> > * We set lock class for alien array caches which are up during init.
> > * The lock annotation will be lost if all cpus of a node goes down and
> > * then comes back up during hotplug
> >
> > Paul said he was running CPU hotplug so maybe that explains the problem?
>
> Maybe something like this untested patch fixes the issue...
I will give it a go!
Thanx, Paul
> Pekka
>
> diff --git a/mm/slab.c b/mm/slab.c
> index 7dfa481..84de47e 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -604,6 +604,26 @@ static struct kmem_cache cache_cache = {
>
> #define BAD_ALIEN_MAGIC 0x01020304ul
>
> +/*
> + * chicken and egg problem: delay the per-cpu array allocation
> + * until the general caches are up.
> + */
> +static enum {
> + NONE,
> + PARTIAL_AC,
> + PARTIAL_L3,
> + EARLY,
> + FULL
> +} g_cpucache_up;
> +
> +/*
> + * used by boot code to determine if it can use slab based allocator
> + */
> +int slab_is_available(void)
> +{
> + return g_cpucache_up >= EARLY;
> +}
> +
> #ifdef CONFIG_LOCKDEP
>
> /*
> @@ -620,40 +640,52 @@ static struct kmem_cache cache_cache = {
> static struct lock_class_key on_slab_l3_key;
> static struct lock_class_key on_slab_alc_key;
>
> -static inline void init_lock_keys(void)
> -
> +static void init_node_lock_keys(int q)
> {
> - int q;
> struct cache_sizes *s = malloc_sizes;
>
> - while (s->cs_size != ULONG_MAX) {
> - for_each_node(q) {
> - struct array_cache **alc;
> - int r;
> - struct kmem_list3 *l3 = s->cs_cachep->nodelists[q];
> - if (!l3 || OFF_SLAB(s->cs_cachep))
> - continue;
> - lockdep_set_class(&l3->list_lock, &on_slab_l3_key);
> - alc = l3->alien;
> - /*
> - * FIXME: This check for BAD_ALIEN_MAGIC
> - * should go away when common slab code is taught to
> - * work even without alien caches.
> - * Currently, non NUMA code returns BAD_ALIEN_MAGIC
> - * for alloc_alien_cache,
> - */
> - if (!alc || (unsigned long)alc == BAD_ALIEN_MAGIC)
> - continue;
> - for_each_node(r) {
> - if (alc[r])
> - lockdep_set_class(&alc[r]->lock,
> - &on_slab_alc_key);
> - }
> + if (g_cpucache_up != FULL)
> + return;
> +
> + for (s = malloc_sizes; s->cs_size != ULONG_MAX; s++) {
> + struct array_cache **alc;
> + struct kmem_list3 *l3;
> + int r;
> +
> + l3 = s->cs_cachep->nodelists[q];
> + if (!l3 || OFF_SLAB(s->cs_cachep))
> + return;
> + lockdep_set_class(&l3->list_lock, &on_slab_l3_key);
> + alc = l3->alien;
> + /*
> + * FIXME: This check for BAD_ALIEN_MAGIC
> + * should go away when common slab code is taught to
> + * work even without alien caches.
> + * Currently, non NUMA code returns BAD_ALIEN_MAGIC
> + * for alloc_alien_cache,
> + */
> + if (!alc || (unsigned long)alc == BAD_ALIEN_MAGIC)
> + return;
> + for_each_node(r) {
> + if (alc[r])
> + lockdep_set_class(&alc[r]->lock,
> + &on_slab_alc_key);
> }
> - s++;
> }
> }
> +
> +static inline void init_lock_keys(void)
> +{
> + int node;
> +
> + for_each_node(node)
> + init_node_lock_keys(node);
> +}
> #else
> +static void init_node_lock_keys(int q)
> +{
> +}
> +
> static inline void init_lock_keys(void)
> {
> }
> @@ -665,26 +697,6 @@ static inline void init_lock_keys(void)
> static DEFINE_MUTEX(cache_chain_mutex);
> static struct list_head cache_chain;
>
> -/*
> - * chicken and egg problem: delay the per-cpu array allocation
> - * until the general caches are up.
> - */
> -static enum {
> - NONE,
> - PARTIAL_AC,
> - PARTIAL_L3,
> - EARLY,
> - FULL
> -} g_cpucache_up;
> -
> -/*
> - * used by boot code to determine if it can use slab based allocator
> - */
> -int slab_is_available(void)
> -{
> - return g_cpucache_up >= EARLY;
> -}
> -
> static DEFINE_PER_CPU(struct delayed_work, reap_work);
>
> static inline struct array_cache *cpu_cache_get(struct kmem_cache *cachep)
> @@ -1254,6 +1266,8 @@ static int __cpuinit cpuup_prepare(long cpu)
> kfree(shared);
> free_alien_cache(alien);
> }
> + init_node_lock_keys(node);
> +
> return 0;
> bad:
> cpuup_canceled(cpu);
>
>
On Mon, 2009-11-23 at 22:01 +0200, Pekka Enberg wrote:
> On Mon, 2009-11-23 at 21:50 +0200, Pekka Enberg wrote:
> > On Mon, 23 Nov 2009, Pekka Enberg wrote:
> > > > That turns out to be _very_ hard. How about something like the following
> > > > untested patch which delays slab_destroy() while we're under nc->lock.
> >
> > On Mon, 2009-11-23 at 13:30 -0600, Christoph Lameter wrote:
> > > Code changes to deal with a diagnostic issue?
> >
> > OK, fair enough. If I suffer permanent brain damage from staring at the
> > SLAB code for too long, I hope you and Matt will chip in to pay for my
> > medication.
You Europeans and your droll health care jokes.
> Maybe something like this untested patch fixes the issue...
This looks like a much better approach.
--
http://selenic.com : development and support for Mercurial and Linux
On Mon, Nov 23, 2009 at 09:00:00PM +0200, Pekka Enberg wrote:
> Hi Peter,
>
> On Fri, 2009-11-20 at 16:09 +0100, Peter Zijlstra wrote:
> > > Uh, ok, so apparently I was right after all. There's a comment in
> > > free_block() above the slab_destroy() call that refers to the comment
> > > above alloc_slabmgmt() function definition which explains it all.
> > >
> > > Long story short: ->slab_cachep never points to the same kmalloc cache
> > > we're allocating or freeing from. Where do we need to put the
> > > spin_lock_nested() annotation? Would it be enough to just use it in
> > > cache_free_alien() for alien->lock or do we need it in
> > > cache_flusharray() as well?
> >
> > You'd have to somehow push the nested state down from the
> > kmem_cache_free() call in slab_destroy() to all nc->lock sites below.
>
> That turns out to be _very_ hard. How about something like the following
> untested patch which delays slab_destroy() while we're under nc->lock.
>
> Pekka
Preliminary tests look good! The test was a ten-hour rcutorture run on
an 8-CPU Power system with a half-second delay between randomly chosen
CPU-hotplug operations. No lockdep warnings. ;-)
Will keep hammering on it.
Thanx, Paul
> diff --git a/mm/slab.c b/mm/slab.c
> index 7dfa481..6f522e3 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -316,7 +316,7 @@ struct kmem_list3 __initdata initkmem_list3[NUM_INIT_LISTS];
> static int drain_freelist(struct kmem_cache *cache,
> struct kmem_list3 *l3, int tofree);
> static void free_block(struct kmem_cache *cachep, void **objpp, int len,
> - int node);
> + int node, struct list_head *to_destroy);
> static int enable_cpucache(struct kmem_cache *cachep, gfp_t gfp);
> static void cache_reap(struct work_struct *unused);
>
> @@ -1002,7 +1002,8 @@ static void free_alien_cache(struct array_cache **ac_ptr)
> }
>
> static void __drain_alien_cache(struct kmem_cache *cachep,
> - struct array_cache *ac, int node)
> + struct array_cache *ac, int node,
> + struct list_head *to_destroy)
> {
> struct kmem_list3 *rl3 = cachep->nodelists[node];
>
> @@ -1016,12 +1017,22 @@ static void __drain_alien_cache(struct kmem_cache *cachep,
> if (rl3->shared)
> transfer_objects(rl3->shared, ac, ac->limit);
>
> - free_block(cachep, ac->entry, ac->avail, node);
> + free_block(cachep, ac->entry, ac->avail, node, to_destroy);
> ac->avail = 0;
> spin_unlock(&rl3->list_lock);
> }
> }
>
> +static void slab_destroy(struct kmem_cache *, struct slab *);
> +
> +static void destroy_slabs(struct kmem_cache *cache, struct list_head *to_destroy)
> +{
> + struct slab *slab, *tmp;
> +
> + list_for_each_entry_safe(slab, tmp, to_destroy, list)
> + slab_destroy(cache, slab);
> +}
> +
> /*
> * Called from cache_reap() to regularly drain alien caches round robin.
> */
> @@ -1033,8 +1044,11 @@ static void reap_alien(struct kmem_cache *cachep, struct kmem_list3 *l3)
> struct array_cache *ac = l3->alien[node];
>
> if (ac && ac->avail && spin_trylock_irq(&ac->lock)) {
> - __drain_alien_cache(cachep, ac, node);
> + LIST_HEAD(to_destroy);
> +
> + __drain_alien_cache(cachep, ac, node, &to_destroy);
> spin_unlock_irq(&ac->lock);
> + destroy_slabs(cachep, &to_destroy);
> }
> }
> }
> @@ -1049,9 +1063,12 @@ static void drain_alien_cache(struct kmem_cache *cachep,
> for_each_online_node(i) {
> ac = alien[i];
> if (ac) {
> + LIST_HEAD(to_destroy);
> +
> spin_lock_irqsave(&ac->lock, flags);
> - __drain_alien_cache(cachep, ac, i);
> + __drain_alien_cache(cachep, ac, i, &to_destroy);
> spin_unlock_irqrestore(&ac->lock, flags);
> + destroy_slabs(cachep, &to_destroy);
> }
> }
> }
> @@ -1076,17 +1093,20 @@ static inline int cache_free_alien(struct kmem_cache *cachep, void *objp)
> l3 = cachep->nodelists[node];
> STATS_INC_NODEFREES(cachep);
> if (l3->alien && l3->alien[nodeid]) {
> + LIST_HEAD(to_destroy);
> +
> alien = l3->alien[nodeid];
> spin_lock(&alien->lock);
> if (unlikely(alien->avail == alien->limit)) {
> STATS_INC_ACOVERFLOW(cachep);
> - __drain_alien_cache(cachep, alien, nodeid);
> + __drain_alien_cache(cachep, alien, nodeid, &to_destroy);
> }
> alien->entry[alien->avail++] = objp;
> spin_unlock(&alien->lock);
> + destroy_slabs(cachep, &to_destroy);
> } else {
> spin_lock(&(cachep->nodelists[nodeid])->list_lock);
> - free_block(cachep, &objp, 1, nodeid);
> + free_block(cachep, &objp, 1, nodeid, NULL);
> spin_unlock(&(cachep->nodelists[nodeid])->list_lock);
> }
> return 1;
> @@ -1118,7 +1138,7 @@ static void __cpuinit cpuup_canceled(long cpu)
> /* Free limit for this kmem_list3 */
> l3->free_limit -= cachep->batchcount;
> if (nc)
> - free_block(cachep, nc->entry, nc->avail, node);
> + free_block(cachep, nc->entry, nc->avail, node, NULL);
>
> if (!cpus_empty(*mask)) {
> spin_unlock_irq(&l3->list_lock);
> @@ -1128,7 +1148,7 @@ static void __cpuinit cpuup_canceled(long cpu)
> shared = l3->shared;
> if (shared) {
> free_block(cachep, shared->entry,
> - shared->avail, node);
> + shared->avail, node, NULL);
> l3->shared = NULL;
> }
>
> @@ -2402,7 +2422,7 @@ static void do_drain(void *arg)
> check_irq_off();
> ac = cpu_cache_get(cachep);
> spin_lock(&cachep->nodelists[node]->list_lock);
> - free_block(cachep, ac->entry, ac->avail, node);
> + free_block(cachep, ac->entry, ac->avail, node, NULL);
> spin_unlock(&cachep->nodelists[node]->list_lock);
> ac->avail = 0;
> }
> @@ -3410,7 +3430,7 @@ __cache_alloc(struct kmem_cache *cachep, gfp_t flags, void *caller)
> * Caller needs to acquire correct kmem_list's list_lock
> */
> static void free_block(struct kmem_cache *cachep, void **objpp, int nr_objects,
> - int node)
> + int node, struct list_head *to_destroy)
> {
> int i;
> struct kmem_list3 *l3;
> @@ -3439,7 +3459,10 @@ static void free_block(struct kmem_cache *cachep, void **objpp, int nr_objects,
> * a different cache, refer to comments before
> * alloc_slabmgmt.
> */
> - slab_destroy(cachep, slabp);
> + if (to_destroy)
> + list_add(&slabp->list, to_destroy);
> + else
> + slab_destroy(cachep, slabp);
> } else {
> list_add(&slabp->list, &l3->slabs_free);
> }
> @@ -3479,7 +3502,7 @@ static void cache_flusharray(struct kmem_cache *cachep, struct array_cache *ac)
> }
> }
>
> - free_block(cachep, ac->entry, batchcount, node);
> + free_block(cachep, ac->entry, batchcount, node, NULL);
> free_done:
> #if STATS
> {
> @@ -3822,7 +3845,7 @@ static int alloc_kmemlist(struct kmem_cache *cachep, gfp_t gfp)
>
> if (shared)
> free_block(cachep, shared->entry,
> - shared->avail, node);
> + shared->avail, node, NULL);
>
> l3->shared = new_shared;
> if (!l3->alien) {
> @@ -3925,7 +3948,7 @@ static int do_tune_cpucache(struct kmem_cache *cachep, int limit,
> if (!ccold)
> continue;
> spin_lock_irq(&cachep->nodelists[cpu_to_node(i)]->list_lock);
> - free_block(cachep, ccold->entry, ccold->avail, cpu_to_node(i));
> + free_block(cachep, ccold->entry, ccold->avail, cpu_to_node(i), NULL);
> spin_unlock_irq(&cachep->nodelists[cpu_to_node(i)]->list_lock);
> kfree(ccold);
> }
> @@ -4007,7 +4030,7 @@ void drain_array(struct kmem_cache *cachep, struct kmem_list3 *l3,
> tofree = force ? ac->avail : (ac->limit + 4) / 5;
> if (tofree > ac->avail)
> tofree = (ac->avail + 1) / 2;
> - free_block(cachep, ac->entry, tofree, node);
> + free_block(cachep, ac->entry, tofree, node, NULL);
> ac->avail -= tofree;
> memmove(ac->entry, &(ac->entry[tofree]),
> sizeof(void *) * ac->avail);
>
>
On Mon, 2009-11-23 at 21:13 +0200, Pekka Enberg wrote:
> Matt Mackall wrote:
> > This seems like a lot of work to paper over a lockdep false positive in
> > code that should be firmly in the maintenance end of its lifecycle? I'd
> > rather the fix or papering over happen in lockdep.
>
> True that. Is __raw_spin_lock() out of question, Peter?-) Passing the
> state is pretty invasive because of the kmem_cache_free() call in
> slab_destroy(). We re-enter the slab allocator from the outer edges
> which makes spin_lock_nested() very inconvenient.
I'm perfectly fine with letting the thing be as it is, it's apparently
not something that triggers very often, and since slab will be killed
off soon, who cares.
On Tue, Nov 24, 2009 at 05:33:26PM +0100, Peter Zijlstra wrote:
> On Mon, 2009-11-23 at 21:13 +0200, Pekka Enberg wrote:
> > Matt Mackall wrote:
> > > This seems like a lot of work to paper over a lockdep false positive in
> > > code that should be firmly in the maintenance end of its lifecycle? I'd
> > > rather the fix or papering over happen in lockdep.
> >
> > True that. Is __raw_spin_lock() out of question, Peter?-) Passing the
> > state is pretty invasive because of the kmem_cache_free() call in
> > slab_destroy(). We re-enter the slab allocator from the outer edges
> > which makes spin_lock_nested() very inconvenient.
>
> I'm perfectly fine with letting the thing be as it is, its apparently
> not something that triggers very often, and since slab will be killed
> off soon, who cares.
Which of the alternatives to slab should I be testing with, then?
[Ducks, runs away.]
Thanx, Paul
On Tue, 2009-11-24 at 09:00 -0800, Paul E. McKenney wrote:
> On Tue, Nov 24, 2009 at 05:33:26PM +0100, Peter Zijlstra wrote:
> > On Mon, 2009-11-23 at 21:13 +0200, Pekka Enberg wrote:
> > > Matt Mackall wrote:
> > > > This seems like a lot of work to paper over a lockdep false positive in
> > > > code that should be firmly in the maintenance end of its lifecycle? I'd
> > > > rather the fix or papering over happen in lockdep.
> > >
> > > True that. Is __raw_spin_lock() out of question, Peter?-) Passing the
> > > state is pretty invasive because of the kmem_cache_free() call in
> > > slab_destroy(). We re-enter the slab allocator from the outer edges
> > > which makes spin_lock_nested() very inconvenient.
> >
> > I'm perfectly fine with letting the thing be as it is, its apparently
> > not something that triggers very often, and since slab will be killed
> > off soon, who cares.
>
> Which of the alternatives to slab should I be testing with, then?
I'm guessing your system is in the minority that has more than $10 worth
of RAM, which means you should probably be evaluating SLUB.
--
http://selenic.com : development and support for Mercurial and Linux
On Tue, Nov 24, 2009 at 11:12:36AM -0600, Matt Mackall wrote:
> On Tue, 2009-11-24 at 09:00 -0800, Paul E. McKenney wrote:
> > On Tue, Nov 24, 2009 at 05:33:26PM +0100, Peter Zijlstra wrote:
> > > On Mon, 2009-11-23 at 21:13 +0200, Pekka Enberg wrote:
> > > > Matt Mackall wrote:
> > > > > This seems like a lot of work to paper over a lockdep false positive in
> > > > > code that should be firmly in the maintenance end of its lifecycle? I'd
> > > > > rather the fix or papering over happen in lockdep.
> > > >
> > > > True that. Is __raw_spin_lock() out of question, Peter?-) Passing the
> > > > state is pretty invasive because of the kmem_cache_free() call in
> > > > slab_destroy(). We re-enter the slab allocator from the outer edges
> > > > which makes spin_lock_nested() very inconvenient.
> > >
> > > I'm perfectly fine with letting the thing be as it is, its apparently
> > > not something that triggers very often, and since slab will be killed
> > > off soon, who cares.
> >
> > Which of the alternatives to slab should I be testing with, then?
>
> I'm guessing your system is in the minority that has more than $10 worth
> of RAM, which means you should probably be evaluating SLUB.
I have one nomination for SLUB. I have started a short test run.
Thanx, Paul
On Tue, 2009-11-24 at 11:12 -0600, Matt Mackall wrote:
> On Tue, 2009-11-24 at 09:00 -0800, Paul E. McKenney wrote:
> > On Tue, Nov 24, 2009 at 05:33:26PM +0100, Peter Zijlstra wrote:
> > > On Mon, 2009-11-23 at 21:13 +0200, Pekka Enberg wrote:
> > > > Matt Mackall wrote:
> > > > > This seems like a lot of work to paper over a lockdep false positive in
> > > > > code that should be firmly in the maintenance end of its lifecycle? I'd
> > > > > rather the fix or papering over happen in lockdep.
> > > >
> > > > True that. Is __raw_spin_lock() out of question, Peter?-) Passing the
> > > > state is pretty invasive because of the kmem_cache_free() call in
> > > > slab_destroy(). We re-enter the slab allocator from the outer edges
> > > > which makes spin_lock_nested() very inconvenient.
> > >
> > > I'm perfectly fine with letting the thing be as it is, its apparently
> > > not something that triggers very often, and since slab will be killed
> > > off soon, who cares.
> >
> > Which of the alternatives to slab should I be testing with, then?
>
> I'm guessing your system is in the minority that has more than $10 worth
> of RAM, which means you should probably be evaluating SLUB.
Well, I was rather hoping that'd die too ;-)
Weren't we going to go with SLQB?
On Tue, Nov 24, 2009 at 07:14:19PM +0100, Peter Zijlstra wrote:
> On Tue, 2009-11-24 at 11:12 -0600, Matt Mackall wrote:
> > On Tue, 2009-11-24 at 09:00 -0800, Paul E. McKenney wrote:
> > > On Tue, Nov 24, 2009 at 05:33:26PM +0100, Peter Zijlstra wrote:
> > > > On Mon, 2009-11-23 at 21:13 +0200, Pekka Enberg wrote:
> > > > > Matt Mackall wrote:
> > > > > > This seems like a lot of work to paper over a lockdep false positive in
> > > > > > code that should be firmly in the maintenance end of its lifecycle? I'd
> > > > > > rather the fix or papering over happen in lockdep.
> > > > >
> > > > > True that. Is __raw_spin_lock() out of question, Peter?-) Passing the
> > > > > state is pretty invasive because of the kmem_cache_free() call in
> > > > > slab_destroy(). We re-enter the slab allocator from the outer edges
> > > > > which makes spin_lock_nested() very inconvenient.
> > > >
> > > > I'm perfectly fine with letting the thing be as it is, its apparently
> > > > not something that triggers very often, and since slab will be killed
> > > > off soon, who cares.
> > >
> > > Which of the alternatives to slab should I be testing with, then?
> >
> > I'm guessing your system is in the minority that has more than $10 worth
> > of RAM, which means you should probably be evaluating SLUB.
>
> Well, I was rather hoping that'd die too ;-)
>
> Weren't we going to go with SLQB?
Well, I suppose I could make my scripts randomly choose the memory
allocator, but I would rather not. ;-)
More seriously, I do have a number of configurations that I test, and I
suppose I can choose different allocators for the different configurations.
Thanx, Paul
On Tue, 2009-11-24 at 10:25 -0800, Paul E. McKenney wrote:
> Well, I suppose I could make my scripts randomly choose the memory
> allocator, but I would rather not. ;-)
Which is why I hope we'll soon be down to 2, SLOB for tiny systems and
SLQB for the rest of us, having 3 in-tree and 1 pending is pure and
simple insanity.
Preferably SLQB will be small enough to also be able to get rid of SLOB,
but I've not recently seen any data on that particular issue.
On Tue, 24 Nov 2009, Peter Zijlstra wrote:
> On Tue, 2009-11-24 at 10:25 -0800, Paul E. McKenney wrote:
>
> > Well, I suppose I could make my scripts randomly choose the memory
> > allocator, but I would rather not. ;-)
>
> Which is why I hope we'll soon be down to 2, SLOB for tiny systems and
> SLQB for the rest of us, having 3 in-tree and 1 pending is pure and
> simple insanity.
>
> Preferably SLQB will be small enough to also be able to get rid of SLOB,
> but I've not recently seen any data on that particular issue.
We have some issues with NUMA in SLQB. Memoryless node support needs to
get some work. The fixes of memoryless node support to SLAB by Lee
create another case where SLQB will be regressing against SLAB.
Multiple modifications of per-cpu variables in allocators other than SLUB
mean that an interrupt-less fastpath is going to be difficult to realize and
may continue to cause RT issues with preemption and per-cpu handling.
Memory consumption of SLQB is better??
On Tue, Nov 24, 2009 at 07:31:51PM +0100, Peter Zijlstra wrote:
> On Tue, 2009-11-24 at 10:25 -0800, Paul E. McKenney wrote:
>
> > Well, I suppose I could make my scripts randomly choose the memory
> > allocator, but I would rather not. ;-)
>
> Which is why I hope we'll soon be down to 2, SLOB for tiny systems and
> SLQB for the rest of us, having 3 in-tree and 1 pending is pure and
> simple insanity.
So I should start specifying SLOB for my TINY_RCU tests, then.
> Preferably SLQB will be small enough to also be able to get rid of SLOB,
> but I've not recently seen any data on that particular issue.
Given the existence of TINY_RCU, I would look pretty funny if I insisted
on but a single implementation of core subsystems. ;-)
Thanx, Paul
On Tue, 2009-11-24 at 19:14 +0100, Peter Zijlstra wrote:
> On Tue, 2009-11-24 at 11:12 -0600, Matt Mackall wrote:
> > On Tue, 2009-11-24 at 09:00 -0800, Paul E. McKenney wrote:
> > > On Tue, Nov 24, 2009 at 05:33:26PM +0100, Peter Zijlstra wrote:
> > > > On Mon, 2009-11-23 at 21:13 +0200, Pekka Enberg wrote:
> > > > > Matt Mackall wrote:
> > > > > > This seems like a lot of work to paper over a lockdep false positive in
> > > > > > code that should be firmly in the maintenance end of its lifecycle? I'd
> > > > > > rather the fix or papering over happen in lockdep.
> > > > >
> > > > > True that. Is __raw_spin_lock() out of question, Peter?-) Passing the
> > > > > state is pretty invasive because of the kmem_cache_free() call in
> > > > > slab_destroy(). We re-enter the slab allocator from the outer edges
> > > > > which makes spin_lock_nested() very inconvenient.
> > > >
> > > > I'm perfectly fine with letting the thing be as it is, its apparently
> > > > not something that triggers very often, and since slab will be killed
> > > > off soon, who cares.
> > >
> > > Which of the alternatives to slab should I be testing with, then?
> >
> > I'm guessing your system is in the minority that has more than $10 worth
> > of RAM, which means you should probably be evaluating SLUB.
>
> Well, I was rather hoping that'd die too ;-)
>
> Weren't we going to go with SLQB?
News to me. Perhaps it was discussed at KS.
My understanding of the current state of play is:
SLUB: default allocator
SLAB: deep maintenance, will be removed if SLUB ever covers remaining
performance regressions
SLOB: useful for low-end (but high-volume!) embedded
SLQB: sitting in slab.git#for-next for months, has some ground to cover
SLQB and SLUB have pretty similar target audiences, so I agree we should
eventually have only one of them. But I strongly expect performance
results to be mixed, just as they have been comparing SLUB/SLAB.
Similarly, SLQB still has room for tuning left compared to SLUB, as
SLUB did compared to SLAB when it first emerged. It might be a while
before a clear winner emerges.
--
http://selenic.com : development and support for Mercurial and Linux
On Tue, Nov 24, 2009 at 01:23:35PM -0600, Matt Mackall wrote:
> On Tue, 2009-11-24 at 19:14 +0100, Peter Zijlstra wrote:
> > On Tue, 2009-11-24 at 11:12 -0600, Matt Mackall wrote:
> > > On Tue, 2009-11-24 at 09:00 -0800, Paul E. McKenney wrote:
> > > > On Tue, Nov 24, 2009 at 05:33:26PM +0100, Peter Zijlstra wrote:
> > > > > On Mon, 2009-11-23 at 21:13 +0200, Pekka Enberg wrote:
> > > > > > Matt Mackall wrote:
> > > > > > > This seems like a lot of work to paper over a lockdep false positive in
> > > > > > > code that should be firmly in the maintenance end of its lifecycle? I'd
> > > > > > > rather the fix or papering over happen in lockdep.
> > > > > >
> > > > > > True that. Is __raw_spin_lock() out of question, Peter?-) Passing the
> > > > > > state is pretty invasive because of the kmem_cache_free() call in
> > > > > > slab_destroy(). We re-enter the slab allocator from the outer edges
> > > > > > which makes spin_lock_nested() very inconvenient.
> > > > >
> > > > > I'm perfectly fine with letting the thing be as it is, its apparently
> > > > > not something that triggers very often, and since slab will be killed
> > > > > off soon, who cares.
> > > >
> > > > Which of the alternatives to slab should I be testing with, then?
> > >
> > > I'm guessing your system is in the minority that has more than $10 worth
> > > of RAM, which means you should probably be evaluating SLUB.
> >
> > Well, I was rather hoping that'd die too ;-)
> >
> > Weren't we going to go with SLQB?
>
> News to me. Perhaps it was discussed at KS.
>
> My understanding of the current state of play is:
>
> SLUB: default allocator
Not on all architectures, it appears.
> SLAB: deep maintenance, will be removed if SLUB ever covers remaining
> performance regressions
;-)
> SLOB: useful for low-end (but high-volume!) embedded
And unfortunately also depends on CONFIG_EMBEDDED, making it difficult
for me to test on the available machines. My usual workaround is to
patch Kconfig to remove the dependency.
> SLQB: sitting in slab.git#for-next for months, has some ground to cover
I will hold off testing this until it hits mainline, especially if it is
where KS decided to go.
> SLQB and SLUB have pretty similar target audiences, so I agree we should
> eventually have only one of them. But I strongly expect performance
> results to be mixed, just as they have been comparing SLUB/SLAB.
> Similarly, SLQB still has of room for tuning left compared to SLUB, as
> SLUB did compared to SLAB when it first emerged. It might be a while
> before a clear winner emerges.
Those who live by the heuristic, die by the heuristic!!! ;-)
Thanx, Paul
On Tue, 2009-11-24 at 13:23 -0600, Matt Mackall wrote:
> My understanding of the current state of play is:
>
> SLUB: default allocator
> SLAB: deep maintenance, will be removed if SLUB ever covers remaining
> performance regressions
> SLOB: useful for low-end (but high-volume!) embedded
> SLQB: sitting in slab.git#for-next for months, has some ground to cover
>
> SLQB and SLUB have pretty similar target audiences, so I agree we should
> eventually have only one of them. But I strongly expect performance
> results to be mixed, just as they have been comparing SLUB/SLAB.
> Similarly, SLQB still has of room for tuning left compared to SLUB, as
> SLUB did compared to SLAB when it first emerged. It might be a while
> before a clear winner emerges.
And as long as we drag out this madness nothing will change I suspect.
On Tue, 2009-11-24 at 21:46 +0100, Peter Zijlstra wrote:
> On Tue, 2009-11-24 at 13:23 -0600, Matt Mackall wrote:
>
> > My understanding of the current state of play is:
> >
> > SLUB: default allocator
> > SLAB: deep maintenance, will be removed if SLUB ever covers remaining
> > performance regressions
> > SLOB: useful for low-end (but high-volume!) embedded
> > SLQB: sitting in slab.git#for-next for months, has some ground to cover
> >
> > SLQB and SLUB have pretty similar target audiences, so I agree we should
> > eventually have only one of them. But I strongly expect performance
> > results to be mixed, just as they have been comparing SLUB/SLAB.
> > Similarly, SLQB still has of room for tuning left compared to SLUB, as
> > SLUB did compared to SLAB when it first emerged. It might be a while
> > before a clear winner emerges.
>
> And as long as we drag out this madness nothing will change I suspect.
If there's a proposal here, it's not clear what it is.
--
http://selenic.com : development and support for Mercurial and Linux
On Tue, Nov 24, 2009 at 6:23 PM, Paul E. McKenney
<[email protected]> wrote:
> On Mon, Nov 23, 2009 at 09:00:00PM +0200, Pekka Enberg wrote:
>> Hi Peter,
>>
>> On Fri, 2009-11-20 at 16:09 +0100, Peter Zijlstra wrote:
>> > > Uh, ok, so apparently I was right after all. There's a comment in
>> > > free_block() above the slab_destroy() call that refers to the comment
>> > > above alloc_slabmgmt() function definition which explains it all.
>> > >
>> > > Long story short: ->slab_cachep never points to the same kmalloc cache
>> > > we're allocating or freeing from. Where do we need to put the
>> > > spin_lock_nested() annotation? Would it be enough to just use it in
>> > > cache_free_alien() for alien->lock or do we need it in
>> > > cache_flusharray() as well?
>> >
>> > You'd have to somehow push the nested state down from the
>> > kmem_cache_free() call in slab_destroy() to all nc->lock sites below.
>>
>> That turns out to be _very_ hard. How about something like the following
>> untested patch which delays slab_destroy() while we're under nc->lock.
>>
>>                     Pekka
>
> Preliminary tests look good!  The test was a ten-hour rcutorture run on
> an 8-CPU Power system with a half-second delay between randomly chosen
> CPU-hotplug operations.  No lockdep warnings.  ;-)
>
> Will keep hammering on it.
Thanks! Please let me know when you've hammered it enough :-). Peter,
may I have your ACK or NAK on the patch, please?
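For readers following along, the idea in the untested patch referenced above
is to stop calling slab_destroy() -- which re-enters kmem_cache_free() and so
takes another array_cache ->lock -- while nc->lock is still held, and instead
to defer the destruction until after the lock has been dropped. A minimal
sketch of that shape, using 2.6.32-era mm/slab.c names purely for
illustration (this is not the actual patch, and drain_and_destroy() is an
invented stand-in for callers such as __drain_alien_cache() and
cache_flusharray()):

/*
 * Illustrative only: collect fully free slabs on a caller-provided list
 * while the array_cache lock is held, and only call slab_destroy() after
 * the lock has been released, so the nested kmem_cache_free() never runs
 * under nc->lock.
 */
static void free_block(struct kmem_cache *cachep, void **objpp,
                       int nr_objects, int node, struct list_head *to_destroy)
{
        int i;

        for (i = 0; i < nr_objects; i++) {
                struct slab *slabp = virt_to_slab(objpp[i]);

                /* ... return objpp[i] to its slab as today ... */

                /* Defer destruction instead of calling slab_destroy()
                 * right here, under the lock. */
                if (slabp->inuse == 0)
                        list_add(&slabp->list, to_destroy);
        }
}

/* Invented caller name, standing in for __drain_alien_cache() or
 * cache_flusharray(). */
static void drain_and_destroy(struct kmem_cache *cachep,
                              struct array_cache *ac, int node)
{
        LIST_HEAD(to_destroy);
        struct slab *slabp, *tmp;

        spin_lock(&ac->lock);
        free_block(cachep, ac->entry, ac->avail, node, &to_destroy);
        ac->avail = 0;
        spin_unlock(&ac->lock);

        /* No array_cache lock is held any more, so slab_destroy() may
         * safely kmem_cache_free() the slab management structure. */
        list_for_each_entry_safe(slabp, tmp, &to_destroy, list)
                slab_destroy(cachep, slabp);
}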
On Tue, 2009-11-24 at 14:53 -0600, Matt Mackall wrote:
> On Tue, 2009-11-24 at 21:46 +0100, Peter Zijlstra wrote:
> > On Tue, 2009-11-24 at 13:23 -0600, Matt Mackall wrote:
> >
> > > My understanding of the current state of play is:
> > >
> > > SLUB: default allocator
> > > SLAB: deep maintenance, will be removed if SLUB ever covers remaining
> > > performance regressions
> > > SLOB: useful for low-end (but high-volume!) embedded
> > > SLQB: sitting in slab.git#for-next for months, has some ground to cover
> > >
> > > SLQB and SLUB have pretty similar target audiences, so I agree we should
> > > eventually have only one of them. But I strongly expect performance
> > > results to be mixed, just as they have been comparing SLUB/SLAB.
> > > Similarly, SLQB still has room for tuning left compared to SLUB, as
> > > SLUB did compared to SLAB when it first emerged. It might be a while
> > > before a clear winner emerges.
> >
> > And as long as we drag out this madness nothing will change I suspect.
>
> If there's a proposal here, it's not clear what it is.
Merge SLQB and rm mm/sl[ua]b.c include/linux/sl[ua]b.h for .33-rc1
As long as people have a choice they'll not even try new stuff and if
they do they'll change to the old one as soon as they find an issue, not
even bothering to report, let alone expend effort fixing it.
On Tue, 24 Nov 2009, Peter Zijlstra wrote:
> Merge SLQB and rm mm/sl[ua]b.c include/linux/sl[ua]b.h for .33-rc1
>
slqb still has a 5-10% performance regression compared to slab for
benchmarks such as netperf TCP_RR on machines with high cpu counts,
forcing that type of regression isn't acceptable.
On Tue, Nov 24, 2009 at 9:23 PM, Matt Mackall <[email protected]> wrote:
> On Tue, 2009-11-24 at 19:14 +0100, Peter Zijlstra wrote:
>> On Tue, 2009-11-24 at 11:12 -0600, Matt Mackall wrote:
>> > On Tue, 2009-11-24 at 09:00 -0800, Paul E. McKenney wrote:
>> > > On Tue, Nov 24, 2009 at 05:33:26PM +0100, Peter Zijlstra wrote:
>> > > > On Mon, 2009-11-23 at 21:13 +0200, Pekka Enberg wrote:
>> > > > > Matt Mackall wrote:
>> > > > > > This seems like a lot of work to paper over a lockdep false positive in
>> > > > > > code that should be firmly in the maintenance end of its lifecycle? I'd
>> > > > > > rather the fix or papering over happen in lockdep.
>> > > > >
>> > > > > True that. Is __raw_spin_lock() out of question, Peter?-) Passing the
>> > > > > state is pretty invasive because of the kmem_cache_free() call in
>> > > > > slab_destroy(). We re-enter the slab allocator from the outer edges
>> > > > > which makes spin_lock_nested() very inconvenient.
>> > > >
>> > > > I'm perfectly fine with letting the thing be as it is, its apparently
>> > > > not something that triggers very often, and since slab will be killed
>> > > > off soon, who cares.
>> > >
>> > > Which of the alternatives to slab should I be testing with, then?
>> >
>> > I'm guessing your system is in the minority that has more than $10 worth
>> > of RAM, which means you should probably be evaluating SLUB.
>>
>> Well, I was rather hoping that'd die too ;-)
>>
>> Weren't we going to go with SLQB?
>
> News to me. Perhaps it was discussed at KS.
Yes, we discussed this at KS. The plan was to merge SLQB to mainline
so people can test it more easily but unfortunately it hasn't gotten
any loving from Nick recently which makes me think it's going to miss
the merge window for .33 as well.
> My understanding of the current state of play is:
>
> SLUB: default allocator
> SLAB: deep maintenance, will be removed if SLUB ever covers remaining
> performance regressions
> SLOB: useful for low-end (but high-volume!) embedded
> SLQB: sitting in slab.git#for-next for months, has some ground to cover
>
> SLQB and SLUB have pretty similar target audiences, so I agree we should
> eventually have only one of them. But I strongly expect performance
> results to be mixed, just as they have been comparing SLUB/SLAB.
> Similarly, SLQB still has room for tuning left compared to SLUB, as
> SLUB did compared to SLAB when it first emerged. It might be a while
> before a clear winner emerges.
Yeah, something like that. I don't think we were really able to decide
anything at the KS. IIRC Christoph was in favor of having multiple
slab allocators in the tree whereas I, for example, would rather have
only one. The SLOB allocator is a bit special here because it's for
embedded. However, I also talked to some embedded folks at the summit
and none of them were using SLOB because the gains weren't big enough.
So I don't know if it's being used that widely.
I personally was hoping for SLUB or SLQB to emerge as a clear winner
so we could delete the rest but that hasn't really happened.
Pekka
On Tue, 2009-11-24 at 13:03 -0800, David Rientjes wrote:
> On Tue, 24 Nov 2009, Peter Zijlstra wrote:
>
> > Merge SLQB and rm mm/sl[ua]b.c include/linux/sl[ua]b.h for .33-rc1
> >
>
> slqb still has a 5-10% performance regression compared to slab for
> benchmarks such as netperf TCP_RR on machines with high cpu counts,
> forcing that type of regression isn't acceptable.
Having _4_ slab allocators is equally unacceptable.
On Tue, Nov 24, 2009 at 11:01 PM, Peter Zijlstra <[email protected]> wrote:
>> If there's a proposal here, it's not clear what it is.
>
> Merge SLQB and rm mm/sl[ua]b.c include/linux/sl[ua]b.h for .33-rc1
>
> As long as people have a choice they'll not even try new stuff and if
> they do they'll change to the old one as soon as they find an issue, not
> even bothering to report, let alone expend effort fixing it.
Oh, no, SLQB is by no means stable enough for the general public. And
it doesn't even have all the functionality SLAB and SLUB do (cpusets
come to mind).
If people want to really help us getting out of this mess, please take
a stab at fixing any of the outstanding performance regressions for
either SLQB or SLUB. David's a great source if you're interested in
knowing where to look. The only big regression for SLUB is the Intel
TPC benchmark thingy that nobody (except Intel folks) really has
access to. SLQB doesn't suffer from that because Nick had some
performance testing help from Intel IIRC.
On Tue, Nov 24, 2009 at 11:12 PM, Peter Zijlstra <[email protected]> wrote:
> On Tue, 2009-11-24 at 13:03 -0800, David Rientjes wrote:
>> On Tue, 24 Nov 2009, Peter Zijlstra wrote:
>>
>> > Merge SLQB and rm mm/sl[ua]b.c include/linux/sl[ua]b.h for .33-rc1
>> >
>>
>> slqb still has a 5-10% performance regression compared to slab for
>> benchmarks such as netperf TCP_RR on machines with high cpu counts,
>> forcing that type of regression isn't acceptable.
>
> Having _4_ slab allocators is equally unacceptable.
The whole idea behind merging SLQB is to see if it can replace SLAB.
If it can't do that in a few kernel releases, we're pulling it out. It's
as simple as that.
And if SLQB can replace SLAB, then we start to talk about replacing SLUB too...
Pekka
On Tue, 24 Nov 2009, Peter Zijlstra wrote:
> > slqb still has a 5-10% performance regression compared to slab for
> > benchmarks such as netperf TCP_RR on machines with high cpu counts,
> > forcing that type of regression isn't acceptable.
>
> Having _4_ slab allocators is equally unacceptable.
>
So you just advocated merging slqb so that it gets more testing and
development, and then use its inclusion as a statistic to say we should
remove others solely because the space is too cluttered?
We use slab partially because the regression in slub was too severe for
some of our benchmarks, and while CONFIG_SLUB may be the kernel default
there are still distros that use slab as the default as well. We cannot
simply remove an allocator that is superior to others because it is old or
has increased complexity.
I'd suggest looking at how widely used slob is and whether it has a
significant advantage over slub. We'd then have two allocators for
specialized workloads (and slub is much better for diagnostics) and one in
development.
On Tue, 2009-11-24 at 22:59 +0200, Pekka Enberg wrote:
> Thanks! Please let me know when you've hammered it enough :-). Peter,
> may I have your ACK or NAK on the patch, please?
Well, I'm not going to NAK it, for I think it does clean up that
recursion crap a little, but it should have more merit than
side-stepping lockdep.
If you too feel it makes SLAB ever so slightly more palatable then ACK,
otherwise I'm perfectly fine with letting SLAB bitrot.
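For context on the alternative weighed earlier in the thread, a
spin_lock_nested() annotation on the alien cache lock would look roughly
like the fragment below (names follow 2.6.32-era mm/slab.c; this is a
sketch, not a proposed patch). As Peter noted, annotating a single site
like this is not sufficient -- the nesting state would have to be pushed
down to every nc->lock acquisition reachable from slab_destroy() -- which
is exactly what made this route inconvenient compared to the deferral patch.

/* Illustrative fragment from cache_free_alien(): tell lockdep that this
 * array_cache ->lock is a different instance from the one the caller
 * (kfree()/cache_flusharray()) may already hold, so the acquisition is
 * treated as nesting rather than recursion. */
spin_lock_nested(&alien->lock, SINGLE_DEPTH_NESTING);
__drain_alien_cache(cachep, alien, nodeid);
spin_unlock(&alien->lock);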
On Tue, 2009-11-24 at 13:22 -0800, David Rientjes wrote:
> On Tue, 24 Nov 2009, Peter Zijlstra wrote:
>
> > > slqb still has a 5-10% performance regression compared to slab for
> > > benchmarks such as netperf TCP_RR on machines with high cpu counts,
> > > forcing that type of regression isn't acceptable.
> >
> > Having _4_ slab allocators is equally unacceptable.
> >
>
> So you just advocated merging slqb so that it gets more testing and
> development, and then use its inclusion as a statistic to say we should
> remove others solely because the space is too cluttered?
We should cull something, just merging more and more of them is useless
and wastes everybody's time since you have to add features and
interfaces to all of them.
> We use slab partially because the regression in slub was too severe for
> some of our benchmarks, and while CONFIG_SLUB may be the kernel default
> there are still distros that use slab as the default as well. We cannot
> simply remove an allocator that is superior to others because it is old or
> has increased complexity.
Then maybe we should toss SLUB? But then there's people who say SLUB is
better for them. Without forcing something to happen we'll be stuck with
multiple allocators forever.
On Tue, 24 Nov 2009, Peter Zijlstra wrote:
> We should cull something, just merging more and more of them is useless
> and wastes everybody's time since you have to add features and
> interfaces to all of them.
>
I agree, but it's difficult to get widespread testing or development
interest in an allocator that is sitting outside of mainline. I don't
think any allocator could suddenly be merged as the kernel default, it
seems like a prerequisite to go through the preliminary merging and
development. The severe netperf TCP_RR regression that slub has compared
to slab was never found before it became the default allocator, otherwise
there would probably have been more effort into its development as well.
Unfortunately, slub's design is such that it will probably never be able
to nullify the partial slab thrashing enough, even with the percpu counter
speedup that is now available because of Christoph's work, to make TCP_RR
perform as well as slab.
> Then maybe we should toss SLUB? But then there's people who say SLUB is
> better for them. Without forcing something to happen we'll be stuck with
> multiple allocators forever.
>
Slub is definitely superior in diagnostics and is a much simpler design
than slab. I think it would be much easier to remove slub than slab,
though, simply because there are no great slab performance degradations
compared to slub. I think the best candidate for removal might be slob,
however, because it hasn't been compared to slub and usage may not be as
widespread as expected for such a special case allocator.
On Tue, Nov 24, 2009 at 10:59:44PM +0200, Pekka Enberg wrote:
> On Tue, Nov 24, 2009 at 6:23 PM, Paul E. McKenney
> <[email protected]> wrote:
> > On Mon, Nov 23, 2009 at 09:00:00PM +0200, Pekka Enberg wrote:
> >> Hi Peter,
> >>
> >> On Fri, 2009-11-20 at 16:09 +0100, Peter Zijlstra wrote:
> >> > > Uh, ok, so apparently I was right after all. There's a comment in
> >> > > free_block() above the slab_destroy() call that refers to the comment
> >> > > above alloc_slabmgmt() function definition which explains it all.
> >> > >
> >> > > Long story short: ->slab_cachep never points to the same kmalloc cache
> >> > > we're allocating or freeing from. Where do we need to put the
> >> > > spin_lock_nested() annotation? Would it be enough to just use it in
> >> > > cache_free_alien() for alien->lock or do we need it in
> >> > > cache_flusharray() as well?
> >> >
> >> > You'd have to somehow push the nested state down from the
> >> > kmem_cache_free() call in slab_destroy() to all nc->lock sites below.
> >>
> >> That turns out to be _very_ hard. How about something like the following
> >> untested patch which delays slab_destroy() while we're under nc->lock.
> >>
> >>                     Pekka
> >
> > Preliminary tests look good!  The test was a ten-hour rcutorture run on
> > an 8-CPU Power system with a half-second delay between randomly chosen
> > CPU-hotplug operations.  No lockdep warnings.  ;-)
> >
> > Will keep hammering on it.
>
> Thanks! Please let me know when you've hammered it enough :-). Peter,
> may I have your ACK or NAK on the patch, please?
I expect to hammer it over the USA Thanksgiving holiday Thu-Sun this week.
It is like this, Pekka: since I don't drink, it is instead your code
that is going to get hammered this weekend!
Thanx, Paul
On Tue, Nov 24, 2009 at 10:12:30PM +0100, Peter Zijlstra wrote:
> On Tue, 2009-11-24 at 13:03 -0800, David Rientjes wrote:
> > On Tue, 24 Nov 2009, Peter Zijlstra wrote:
> >
> > > Merge SLQB and rm mm/sl[ua]b.c include/linux/sl[ua]b.h for .33-rc1
> > >
> >
> > slqb still has a 5-10% performance regression compared to slab for
> > benchmarks such as netperf TCP_RR on machines with high cpu counts,
> > forcing that type of regression isn't acceptable.
>
> Having _4_ slab allocators is equally unacceptable.
I completely agree. We need at least ten. ;-)
Thanx, Paul
On Tue, Nov 24, 2009 at 01:46:34PM -0800, David Rientjes wrote:
> On Tue, 24 Nov 2009, Peter Zijlstra wrote:
>
> > We should cull something, just merging more and more of them is useless
> > and wastes everybody's time since you have to add features and
> > interfaces to all of them.
>
> I agree, but it's difficult to get widespread testing or development
> interest in an allocator that is sitting outside of mainline. I don't
> think any allocator could suddenly be merged as the kernel default, it
> seems like a prerequisite to go through the preliminary merging and
> development. The severe netperf TCP_RR regression that slub has compared
> to slab was never found before it became the default allocator, otherwise
> there would probably have been more effort into its development as well.
> Unfortunately, slub's design is such that it will probably never be able
> to nullify the partial slab thrashing enough, even with the percpu counter
> speedup that is now available because of Christoph's work, to make TCP_RR
> perform as well as slab.
OK. I threatened this over IRC, and I never make threats that I am not
prepared to carry out.
I therefore propose creating a staging area for memory allocators,
similar to the one for device drivers. Have it in place for allocators
both coming and going.
> > Then maybe we should toss SLUB? But then there's people who say SLUB is
> > better for them. Without forcing something to happen we'll be stuck with
> > multiple allocators forever.
>
> Slub is definitely superior in diagnostics and is a much simpler design
> than slab. I think it would be much easier to remove slub than slab,
> though, simply because there are no great slab performance degradations
> compared to slub. I think the best candidate for removal might be slob,
> however, because it hasn't been compared to slub and usage may not be as
> widespread as expected for such a special case allocator.
And yes, the real problem is that each allocator has its advocates.
I would actually not be all that worried about a proliferation of
allocators if they were automatically selected based on machine
configuration, expected workload, or some such. But the fact is
that while 5% is a life-or-death matter to benchmarkers, it is of no
consequence to the typical Linux user/workload.
The concern with simpler allocators is that making them competitive
across the board with SLAB will make them just as complex as SLAB is.
As long as CONFIG_EMBEDDED remains a euphemism for "don't use me", SLOB
will not see much use or testing outside of those people who care
passionately about memory footprint. SLQB probably doesn't make it into
mainline until either Nick gets done with his VFS scalability work or
someone else starts pushing it. Allocator proliferation continues as
long as allocators are perceived to be easy to write. And so on...
As for me, as long as SLAB is in the kernel and is default for some
of the machines I use for testing, I will continue reporting any bugs
I find in it. ;-)
Thanx, Paul
On Tue, 2009-11-24 at 23:07 +0200, Pekka Enberg wrote:
> On Tue, Nov 24, 2009 at 9:23 PM, Matt Mackall <[email protected]> wrote:
> > On Tue, 2009-11-24 at 19:14 +0100, Peter Zijlstra wrote:
> >> On Tue, 2009-11-24 at 11:12 -0600, Matt Mackall wrote:
> >> > On Tue, 2009-11-24 at 09:00 -0800, Paul E. McKenney wrote:
> >> > > On Tue, Nov 24, 2009 at 05:33:26PM +0100, Peter Zijlstra wrote:
> >> > > > On Mon, 2009-11-23 at 21:13 +0200, Pekka Enberg wrote:
> >> > > > > Matt Mackall wrote:
> >> > > > > > This seems like a lot of work to paper over a lockdep false positive in
> >> > > > > > code that should be firmly in the maintenance end of its lifecycle? I'd
> >> > > > > > rather the fix or papering over happen in lockdep.
> >> > > > >
> >> > > > > True that. Is __raw_spin_lock() out of question, Peter?-) Passing the
> >> > > > > state is pretty invasive because of the kmem_cache_free() call in
> >> > > > > slab_destroy(). We re-enter the slab allocator from the outer edges
> >> > > > > which makes spin_lock_nested() very inconvenient.
> >> > > >
> >> > > > I'm perfectly fine with letting the thing be as it is, its apparently
> >> > > > not something that triggers very often, and since slab will be killed
> >> > > > off soon, who cares.
> >> > >
> >> > > Which of the alternatives to slab should I be testing with, then?
> >> >
> >> > I'm guessing your system is in the minority that has more than $10 worth
> >> > of RAM, which means you should probably be evaluating SLUB.
> >>
> >> Well, I was rather hoping that'd die too ;-)
> >>
> >> Weren't we going to go with SLQB?
> >
> > News to me. Perhaps it was discussed at KS.
>
> Yes, we discussed this at KS. The plan was to merge SLQB to mainline
> so people can test it more easily but unfortunately it hasn't gotten
> any loving from Nick recently which makes me think it's going to miss
> the merge window for .33 as well.
>
> > My understanding of the current state of play is:
> >
> > SLUB: default allocator
> > SLAB: deep maintenance, will be removed if SLUB ever covers remaining
> > performance regressions
> > SLOB: useful for low-end (but high-volume!) embedded
> > SLQB: sitting in slab.git#for-next for months, has some ground to cover
> >
> > SLQB and SLUB have pretty similar target audiences, so I agree we should
> > eventually have only one of them. But I strongly expect performance
> > results to be mixed, just as they have been comparing SLUB/SLAB.
> > Similarly, SLQB still has room for tuning left compared to SLUB, as
> > SLUB did compared to SLAB when it first emerged. It might be a while
> > before a clear winner emerges.
>
> Yeah, something like that. I don't think we were really able to decide
> anything at the KS. IIRC Christoph was in favor of having multiple
> slab allocators in the tree whereas I, for example, would rather have
> only one. The SLOB allocator is a bit special here because it's for
> embedded. However, I also talked to some embedded folks at the summit
> and none of them were using SLOB because the gains weren't big enough.
> So I don't know if it's being used that widely.
I'm afraid I have only anecdotal reports from SLOB users, and embedded
folks are notorious for lack of feedback, but I only need a few people
to tell me they're shipping 100k units/mo to be confident that SLOB is
in use in millions of devices.
--
http://selenic.com : development and support for Mercurial and Linux
Paul E. McKenney kirjoitti:
> As for me, as long as SLAB is in the kernel and is default for some
> of the machines I use for testing, I will continue reporting any bugs
> I find in it. ;-)
Yes, thanks for doing that. As long as SLAB is in the tree, I'll do my
best to get them fixed.
Pekka
Peter Zijlstra kirjoitti:
> Then maybe we should toss SLUB? But then there's people who say SLUB is
> better for them. Without forcing something to happen we'll be stuck with
> multiple allocators forever.
SLUB is good for NUMA; SLAB is pretty much a disaster with its alien
tentacles^Hcaches. AFAIK, SLQB hasn't received much NUMA attention, so
it's not obvious whether or not it will be able to perform as well as
SLUB.
The biggest problem with SLUB is that most of the people (excluding
Christoph and myself) seem to think the design is unfixable for their
favorite workload so they prefer to either stay with SLAB or work on SLQB.
I really couldn't care less which allocator we end up with as long as
it's not SLAB. I do think putting more performance tuning effort into
SLUB would give the best results because the allocator is pretty rock solid
at this point. People seem to underestimate the total effort needed to make
a slab allocator good enough for the general public (which is why I
think SLQB still has a long way to go).
Pekka
Peter Zijlstra kirjoitti:
> On Tue, 2009-11-24 at 22:59 +0200, Pekka Enberg wrote:
>
>> Thanks! Please let me know when you've hammered it enough :-). Peter,
>> may I have your ACK or NAK on the patch, please?
>
> Well, I'm not going to NAK it, for I think it does clean up that
> recursion crap a little, but it should have more merit than
> side-stepping lockdep.
>
> If you too feel it makes SLAB ever so slightly more palatable then ACK,
> otherwise I'm perfectly fine with letting SLAB bitrot.
I'll take that as an ACK ;-) Thanks!
On Tue, 24 Nov 2009, Matt Mackall wrote:
> I'm afraid I have only anecdotal reports from SLOB users, and embedded
> folks are notorious for lack of feedback, but I only need a few people
> to tell me they're shipping 100k units/mo to be confident that SLOB is
> in use in millions of devices.
>
It's much more popular than I had expected; do you think it would be
possible to merge slob's core into another allocator or will it require
separation forever?
On Wed, 2009-11-25 at 13:59 -0800, David Rientjes wrote:
> On Tue, 24 Nov 2009, Matt Mackall wrote:
>
> > I'm afraid I have only anecdotal reports from SLOB users, and embedded
> > folks are notorious for lack of feedback, but I only need a few people
> > to tell me they're shipping 100k units/mo to be confident that SLOB is
> > in use in millions of devices.
> >
>
> It's much more popular than I had expected; do you think it would be
> possible to merge slob's core into another allocator or will it require
> separation forever?
Probably not. It's actually a completely different kind of allocator
than the rest as it doesn't actually use "slabs" at all. It's instead a
slab-like interface on a traditional heap allocator. SLAB/SLUB/SLQB have
much more in common - their biggest differences are about their approach
to scalability/locking issues.
On the upside, SLOB is easily the simplest of the bunch.
--
http://selenic.com : development and support for Mercurial and Linux
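To make the point concrete: in SLOB a kmem_cache is little more than a
size/alignment/constructor record, and allocation falls through to a shared
general-purpose heap (or, for large objects, the page allocator) rather than
to per-cache slabs. A simplified sketch in the spirit of 2.6.32-era
mm/slob.c -- condensed for illustration, not a faithful copy:

/* A SLOB "cache" carries metadata only; there are no per-cache slabs. */
struct kmem_cache {
        unsigned int size, align;
        unsigned long flags;
        const char *name;
        void (*ctor)(void *);
};

void *kmem_cache_alloc_node(struct kmem_cache *c, gfp_t flags, int node)
{
        void *b;

        if (c->size < PAGE_SIZE)
                /* Small objects come straight from the shared SLOB heap. */
                b = slob_alloc(c->size, flags, c->align, node);
        else
                /* Large objects go directly to the page allocator. */
                b = slob_new_pages(flags, get_order(c->size), node);

        if (b && c->ctor)
                c->ctor(b);
        return b;
}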
On Wed, 25 Nov 2009, Pekka Enberg wrote:
> SLUB is good for NUMA; SLAB is pretty much a disaster with its alien
> tentacles^Hcaches. AFAIK, SLQB hasn't received much NUMA attention, so it's not
> obvious whether or not it will be able to perform as well as SLUB.
>
> The biggest problem with SLUB is that most of the people (excluding Christoph
> and myself) seem to think the design is unfixable for their favorite workload
> so they prefer to either stay with SLAB or work on SLQB.
The current design of each has its own strengths and weaknesses. A
queued design is not good for HPC and financial apps since it requires
periodic queue cleaning (therefore disturbing a latency critical
application path). Queue processing can get out of hand if there are
many different types of memory (SLAB in NUMA configurations). So a
queueless allocator design is good for some configurations. It is also
beneficial if the allocator must be frugal with memory allocations.
There is not much difference for most workloads in terms of memory
consumption between SLOB and SLUB.
> I really couldn't care less which allocator we end up with as long as it's not
> SLAB. I do think putting more performance tuning effort into SLUB would give
> the best results because the allocator is pretty rock solid at this point. People
> seem to underestimate the total effort needed to make a slab allocator good
> enough for the general public (which is why I think SLQB still has a long way
> to go).
There are still patches queued here for SLUB that depend on other per cpu
work to be merged in .33. These do not address the caching issues that
people focus on for networking and enterprise apps but they decrease the
minimum latency important for HPC and financial apps. The SLUB fastpath is
the lowest latency allocation path that exists.
On Tue, 24 Nov 2009, Pekka Enberg wrote:
> Yeah, something like that. I don't think we were really able to decide
> anything at the KS. IIRC Christoph was in favor of having multiple
> slab allocators in the tree whereas I, for example, would rather have
> only one. The SLOB allocator is a bit special here because it's for
> embedded. However, I also talked to some embedded folks at the summit
> and none of them were using SLOB because the gains weren't big enough.
> So I don't know if it's being used that widely.
Are there any current numbers on SLOB's memory advantage vs the other
allocators?
> I personally was hoping for SLUB or SLQB to emerge as a clear winner
> so we could delete the rest but that hasn't really happened.
I think having multiple allocators makes for a healthy competition between
them and stabilizes the allocator API. Frankly, I would like to see more
exchangeable subsystems in the core. The scheduler seems not to be
competitive for my current workloads running on 2.6.22 (we have not tried
2.6.32 yet), and I have a lot of concerns about the continual performance
deterioration in the page allocator and the reclaim logic due to feature
bloat.
On Wed, 25 Nov 2009, David Rientjes wrote:
> On Tue, 24 Nov 2009, Matt Mackall wrote:
>
> > I'm afraid I have only anecdotal reports from SLOB users, and embedded
> > folks are notorious for lack of feedback, but I only need a few people
> > to tell me they're shipping 100k units/mo to be confident that SLOB is
> > in use in millions of devices.
> >
>
> It's much more popular than I had expected; do you think it would be
> possible to merge slob's core into another allocator or will it require
> separation forever?
It would be possible to create a slab-common.c and isolate common handling
of all allocators. SLUB and SLQB share quite a lot of code and SLAB could
be cleaned up and made to fit into such a framework.
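A rough sketch of what such a slab-common.c front end might look like:
common argument checking and registration in one place, with the
allocator-specific work behind a per-allocator hook. __kmem_cache_create()
is an invented hook name used only to illustrate the split; nothing like
this exists in the tree at this point.

/* mm/slab-common.c (hypothetical): shared by SLAB, SLUB, SLQB and SLOB. */
struct kmem_cache *kmem_cache_create(const char *name, size_t size,
                                     size_t align, unsigned long flags,
                                     void (*ctor)(void *))
{
        struct kmem_cache *s;

        /* Sanity checks currently duplicated in every allocator. */
        if (!name || size < sizeof(void *) || in_interrupt())
                return NULL;

        /* Allocator-specific creation, one implementation per allocator. */
        s = __kmem_cache_create(name, size, align, flags, ctor);

        /* Common bookkeeping -- the global cache list, /proc/slabinfo
         * registration, and so on -- could also live here. */
        return s;
}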
On Tue, Nov 24, 2009 at 01:47:40PM -0800, Paul E. McKenney wrote:
> On Tue, Nov 24, 2009 at 10:59:44PM +0200, Pekka Enberg wrote:
> > On Tue, Nov 24, 2009 at 6:23 PM, Paul E. McKenney
> > <[email protected]> wrote:
> > > On Mon, Nov 23, 2009 at 09:00:00PM +0200, Pekka Enberg wrote:
> > >> Hi Peter,
> > >>
> > >> On Fri, 2009-11-20 at 16:09 +0100, Peter Zijlstra wrote:
> > >> > > Uh, ok, so apparently I was right after all. There's a comment in
> > >> > > free_block() above the slab_destroy() call that refers to the comment
> > >> > > above alloc_slabmgmt() function definition which explains it all.
> > >> > >
> > >> > > Long story short: ->slab_cachep never points to the same kmalloc cache
> > >> > > we're allocating or freeing from. Where do we need to put the
> > >> > > spin_lock_nested() annotation? Would it be enough to just use it in
> > >> > > cache_free_alien() for alien->lock or do we need it in
> > >> > > cache_flusharray() as well?
> > >> >
> > >> > You'd have to somehow push the nested state down from the
> > >> > kmem_cache_free() call in slab_destroy() to all nc->lock sites below.
> > >>
> > >> That turns out to be _very_ hard. How about something like the following
> > >> untested patch which delays slab_destroy() while we're under nc->lock.
> > >>
> > >>                     Pekka
> > >
> > > Preliminary tests look good!  The test was a ten-hour rcutorture run on
> > > an 8-CPU Power system with a half-second delay between randomly chosen
> > > CPU-hotplug operations.  No lockdep warnings.  ;-)
> > >
> > > Will keep hammering on it.
> >
> > Thanks! Please let me know when you've hammered it enough :-). Peter,
> > may I have your ACK or NAK on the patch, please?
>
> I expect to hammer it over the USA Thanksgiving holiday Thu-Sun this week.
> It is like this, Pekka: since I don't drink, it is instead your code
> that is going to get hammered this weekend!
And the runs with your patch were free from lockdep complaints, while
those runs lacking your patch did raise lockdep's ire. So feel free to
add "Tested-by: Paul E. McKenney <[email protected]>" to the
patch.
And thank you for the fix!!!
Thanx, Paul
On Fri, 27 Nov 2009, Christoph Lameter wrote:
> > > I'm afraid I have only anecdotal reports from SLOB users, and embedded
> > > folks are notorious for lack of feedback, but I only need a few people
> > > to tell me they're shipping 100k units/mo to be confident that SLOB is
> > > in use in millions of devices.
> > >
> >
> > It's much more popular than I had expected; do you think it would be
> > possible to merge slob's core into another allocator or will it require
> > separation forever?
>
> It would be possible to create a slab-common.c and isolate common handling
> of all allocators. SLUB and SLQB share quite a lot of code and SLAB could
> be cleaned up and made to fit into such a framework.
>
Right, but the user is still left with a decision of which slab allocator
to compile into their kernel, each with distinct advantages and
disadvantages that get exploited for the wide range of workloads that it
runs. If slob could be merged into another allocator, it would be simple
to remove the distinction of it being separate altogether; the differences
would depend on CONFIG_EMBEDDED instead.
On Mon, 2009-11-30 at 15:14 -0800, David Rientjes wrote:
> On Fri, 27 Nov 2009, Christoph Lameter wrote:
>
> > > > I'm afraid I have only anecdotal reports from SLOB users, and embedded
> > > > folks are notorious for lack of feedback, but I only need a few people
> > > > to tell me they're shipping 100k units/mo to be confident that SLOB is
> > > > in use in millions of devices.
> > > >
> > >
> > > It's much more popular than I had expected; do you think it would be
> > > possible to merge slob's core into another allocator or will it require
> > > separation forever?
> >
> > It would be possible to create a slab-common.c and isolate common handling
> > of all allocators. SLUB and SLQB share quite a lot of code and SLAB could
> > be cleaned up and made to fit into such a framework.
> >
>
> Right, but the user is still left with a decision of which slab allocator
> to compile into their kernel, each with distinct advantages and
> disadvantages that get exploited for the wide range of workloads that it
> runs. If slob could be merged into another allocator, it would be simple
> to remove the distinction of it being separate altogether; the differences
> would depend on CONFIG_EMBEDDED instead.
No no no wrong wrong wrong. Again, SLOB is the least mergeable of the
set. It has vastly different priorities, design, and code from the rest.
Literally the only thing it has in common with the other three is the
interface.
And it's not even something that -most- embedded devices will want to
use, so it can't be keyed off CONFIG_EMBEDDED anyway. If you've got even
16MB of memory, you probably want to use a SLAB-like allocator (ie not
SLOB). But there are -millions- of devices being shipped that don't have
that much memory, a situation that's likely to continue until you can
fit a larger Linux system entirely in a <$1 microcontroller-sized device
(probably 5 years off still).
This thread is annoying. The problem that triggered this thread is not
in SLOB/SLUB/SLQB, nor even in our bog-standard 10yo deep-maintenance
known-to-work SLAB code. The problem was a FALSE POSITIVE from lockdep
on code that PREDATES lockdep itself. There is nothing in this thread to
indicate that there is a serious problem maintaining multiple
allocators. In fact, considerably more time has been spent (as usual)
debating non-existent problems than fixing real ones.
I agree that having only one of SLAB/SLUB/SLQB would be nice, but it's
going to take a lot of heavy lifting in the form of hacking and
benchmarking to have confidence that there's a clear performance winner.
Given the multiple dimensions of performance
(scalability/throughput/latency for starters), I don't even think
there's a good a priori reason to believe that a clear winner CAN exist.
SLUB may always have better latency, and SLQB may always have better
throughput. If you're NYSE, you might have different performance
priorities than if you're Google or CERN or Sony, differences that amount
to millions of dollars. Repeatedly saying "but we should have only one allocator"
isn't going to change that.
--
http://selenic.com : development and support for Mercurial and Linux
On Mon, 30 Nov 2009, David Rientjes wrote:
> Right, but the user is still left with a decision of which slab allocator
> to compile into their kernel, each with distinct advantages and
> disadvantages that get exploited for the wide range of workloads that it
> runs. If slob could be merged into another allocator, it would be simple
> to remove the distinction of it being seperate altogether, the differences
> would depend on CONFIG_EMBEDDED instead.
No embedded folks that I know are using SLOB. CONFIG_EMBEDDED still would
require a selection of allocators. I have no direct knowledge of anyone
using SLOB (despite traveling widely this year) aside from what Matt tells
me.
On Mon, 30 Nov 2009, Matt Mackall wrote:
> And it's not even something that -most- of embedded devices will want to
> use, so it can't be keyed off CONFIG_EMBEDDED anyway. If you've got even
> 16MB of memory, you probably want to use a SLAB-like allocator (ie not
> SLOB). But there are -millions- of devices being shipped that don't have
> that much memory, a situation that's likely to continue until you can
> fit a larger Linux system entirely in a <$1 microcontroller-sized device
> (probably 5 years off still).
>
What qualifying criteria can we use to automatically select slob for a
kernel or the disqualifying criteria to automatically select slub as a
default, then? It currently depends on CONFIG_EMBEDDED, but it still
requires the user to specifically choose the allocator over another. Could
we base this decision on another config option enabled for systems with
less than 16MB?
> This thread is annoying. The problem that triggered this thread is not
> in SLOB/SLUB/SLQB, nor even in our bog-standard 10yo deep-maintenance
> known-to-work SLAB code. The problem was a FALSE POSITIVE from lockdep
> on code that PREDATES lockdep itself. There is nothing in this thread to
> indicate that there is a serious problem maintaining multiple
> allocators. In fact, considerably more time has been spent (as usual)
> debating non-existent problems than fixing real ones.
>
We could move the discussion on the long-term maintainability of
multiple slab allocators to a new thread if you'd like.
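One way to express the suggestion above in Kconfig terms would be to key the
choice's default off a small-memory symbol rather than CONFIG_EMBEDDED.
CONFIG_TINY_SYSTEM below is invented purely for illustration -- no such
option exists -- and the user could still override the default.

config TINY_SYSTEM
        bool "Optimize for systems with very little memory (under 16MB)"

choice
        prompt "Choose SLAB allocator"
        default SLOB if TINY_SYSTEM
        default SLUB

config SLUB
        bool "SLUB (Unqueued Allocator)"

config SLOB
        bool "SLOB (Simple Allocator)"

endchoice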