2009-06-03 04:06:55

by Linus Torvalds

Subject: Linux 2.6.30-rc8


This is almost certainly the last -rc, and I debated even doing it. But
rather than just do 2.6.30, I decided that I'd be better off doing a last
-rc8 and then a real release probably this weekend.

This has mostly driver and arch updates, with perhaps the intel drm/kms
and network driver changes standing out, but there's powerpc, blackfin and
arm updates too. Admittedly, the two biggest parts of the powerpc update
are a revert and a defconfig update.

A lot of small stuff, fixing a few regressions (and at least one bugzilla
entry going back to 2.6.24). The small stuff does matter. Please test.

Linus

---
Alan Cox (3):
8250: Fix oops from setserial
pata_netcell: LBA48 force identify bits correct
parport: quickfix the proc registration bug

Alan Stern (1):
usb-serial: fix crash when sub-driver updates firmware

Alex Chiang (2):
atlx: move modinfo data from atlx.h to atl1.c
PCI Hotplug: acpiphp: don't store a pci_dev in acpiphp_func

Alex Riesen (1):
Use a format for linux_banner

Alexander Beregalov (2):
parport_gsc: fix printk format error
serial: 8250_gsc: fix printk format error

Alexey Dobriyan (1):
cred: #include init.h in cred.h

Andreas Herrmann (1):
[CPUFREQ] powernow-k8: determine exact CPU frequency for HW Pstates

Andrew Morton (1):
sysfs: file.c: use create_singlethread_workqueue()

Avi Kivity (2):
KVM: Make paravirt tlb flush also reload the PAE PDPTRs
KVM: Fix PDPTR reloading on CR4 writes

Bartlomiej Zolnierkiewicz (1):
ide_pci_generic: add quirk for Netcell ATA RAID

Benjamin Herrenschmidt (5):
Revert "powerpc: Rework dma-noncoherent to use generic vmalloc layer"
powerpc: Move dma-noncoherent.c from arch/powerpc/lib to arch/powerpc/mm
powerpc: Minor cleanups of kernel virt address space definitions
powerpc: Fix up dma_alloc_coherent() on platforms without cache coherency.
powerpc/pmac: Update PowerMac 32-bit defconfig

Christian Engelmayer (1):
hwmon: Update documentation on fan_max

Clemens Ladisch (1):
sound: usb-audio: make the MotU Fastlane work again

Coly Li (1):
[ARM] pxa: add parameter to clksrc_read() for pxa168/910

Daisuke Nishimura (1):
memcg: fix deadlock between lock_page_cgroup and mapping tree_lock

Dan Carpenter (1):
RxRPC: Error handling for rxrpc_alloc_connection()

Daniel Ribeiro (3):
[ARM] pxa: save/restore PGSR on suspend/resume.
[ARM] pxa: allow gpio_reset drive high during normal work
[ARM] pxa/ezx: fix pin configuration for low power mode

Dave Jones (1):
[CPUFREQ] powernow-k7 build fix when ACPI=n

Dave Young (1):
Bluetooth: Remove useless flush_work() causing lockdep warnings

David Dillow (1):
r8169: avoid losing MSI interrupts

David Howells (2):
FS-Cache: Fixup renamed filenames in comments in internal.h
CacheFiles: Fixup renamed filenames in comments in internal.h

David Rientjes (1):
oom: fix possible oom_dump_tasks NULL pointer

David S. Miller (2):
sparc64: Fix SET_PERSONALITY to not clip bits outside of PER_MASK.
sparc64: Fix section attribute warnings.

Divy Le Ray (2):
cxgb3: fix dma mapping regression
cxgb3: link fault fixes

Dmitry Eremin-Solenikov (1):
[ARM] pxa/spitz: provide spitz_ohci_exit() that unregisters USB_HOST GPIO

Dmitry Torokhov (1):
Input: libps2 - better handle bad scheduler decisions

Doug Leith (1):
tcp: tcp_vegas ssthresh bugfix

Ed Swierk (1):
forcedeth: add phy_power_down parameter, leave phy powered up by default (v2)

Eric Anholt (2):
drm/i915: Fix tiling pitch handling on 8xx.
drm/i915: Apply a big hammer to 865 GEM object CPU cache flushing.

Eric Dumazet (2):
net: fix length computation in rt_check_expire()
net: fix rtable leak in net/ipv4/route.c

Eric Sandeen (1):
xfs: fix overflow in xfs_growfs_data_private

Fabio Rossi (1):
ath5k: fix interpolation with equal power levels

Felix Blyakher (2):
xfs: fix double unlock in xfs_swap_extents()
xfs: prevent deadlock in xfs_qm_shake()

Finn Thain (2):
mac8390: fix regression caused during net_device_ops conversion
mac8390: fix build with NET_POLL_CONTROLLER

Florian Fainelli (1):
MAINTAINERS: take maintainership of the cpmac Ethernet driver

Florian Westphal (1):
pktgen: do not access flows[] beyond its length

Forrest Zhang (1):
ath5k: fix exp off-by-one when computing OFDM delta slope

Frans Pop (1):
ACPI processor: remove spurious newline from warning message

Greg Kroah-Hartman (1):
ath1e: add new device id for asus hardware

H. Peter Anvin (1):
x86, setup: revert ACPI 3 E820 extended attributes support

Haavard Skinnemoen (1):
USB: atmel_usb_udc: Use kzalloc() to allocate ep structures

Harry Ciao (2):
edac: AMD8111 & AMD8131 use dev_name()
edac: AMD8111 & AMD8131 Kconfig fixup

Henrik Rydberg (2):
Input: multitouch - add tracking ID to the protocol
Input: multitouch - augment event semantics documentation

Herbert Xu (1):
crypto: hash - Fix handling of sg entry that crosses page boundary

Herton Ronaldo Krzesinski (1):
tomoyo: add missing call to cap_bprm_set_creds

Hideo Saito (1):
powerpc/mm: Fix broken MMU PID stealing on !SMP

Inaky Perez-Gonzalez (1):
wimax/i2400m: usb: fix device reset on autosuspend while not yet idle

Ira Snyder (4):
fsldma: fix "DMA halt timeout!" errors
fsldma: fix infinite loop on multi-descriptor DMA chain completion
fsldma: snooping is not enabled for last entry in descriptor chain
fsldma: fix memory leak on error path in fsl_dma_prep_memcpy()

J. Bruce Fields (1):
nfsd: Revert "svcrpc: take advantage of tcp autotuning"

James Bottomley (1):
async: make sure independent async domains can't accidentally entangle

Jarod Wilson (1):
[CPUFREQ] add atom family to p4-clockmod

Jaswinder Singh Rajput (3):
drm/i915: acpi/video.c fix section mismatch warning
headers_check fix: linux/auto_fs.h
headers_check fix: linux/net_dropmon.h

Jay Sternberg (1):
iwlwifi: update 5000 ucode support to version 2 of API

Jean-Mickael Guerin (1):
IPv6: set RTPROT_KERNEL to initial route

Jesper Dangaard Brouer (1):
netfilter: xt_hashlimit does a wrong SEQ_SKIP

Jesse Barnes (2):
drm/i915: allocate large pointer arrays with vmalloc
i915: support 8xx desktop cursors

Joakim Tjernlund (1):
jffs2: Fix corruption when flash erase/write failure

Joe Perches (2):
MAINTAINERS: pair EDAC-E752X P: and M: entries
acpi-cpufreq: fix printk typo and indentation

Johannes Berg (1):
wext: verify buffer size for SIOCSIWENCODEEXT

John W. Linville (3):
airo: fix airo_get_encode{,ext} buffer overflow like I mean it...
at76c50x-usb: avoid mutex deadlock in at76_dwork_hw_scan
rtl8187: add USB ID for Linksys WUSB54GC-EU v2 USB wifi dongle

Jonas Bonn (1):
drm/i915: Determine type before initialising connector

Jozsef Kadlecsik (1):
netfilter: nf_ct_tcp: fix accepting invalid RST segments

KOSAKI Motohiro (1):
procfs: make errno values consistent when open pident vs exit(2) race occurs

Kay Sievers (1):
Driver Core: do not oops when driver_unregister() is called for unregistered drivers

Kenji Kaneshige (1):
PCI/ACPI: fix wrong ref count handling in acpi_pci_bind()

Kristian Høgsberg (1):
i915: Set object to gtt domain when faulting it back in

Kumar Gala (1):
fsldma: Fix compile warnings

Len Brown (3):
i7300_idle: allow testing on i5000-series hardware w/o re-compile
ACPI: sanity check _PSS frequency to prevent cpufreq crash
ACPI, i915: build fix (v2)

Lennert Buytenhek (1):
gianfar: fix BUG under load after introduction of skb recycling

Li Yang (1):
fsldma: update mailling list address in MAINTAINERS

Linus Torvalds (1):
Linux 2.6.30-rc8

Luis R. Rodriguez (1):
cfg80211: fix race between core hint and driver's custom apply

Ma Ling (4):
drm/i915: Fetch SDVO LVDS mode lines from VBT, then reserve them
drm/i915: Return SDVO LVDS VBT mode if no EDID modes are detected.
drm/i915: Use an I2C algo to do the flip to SDVO DDC bus.
drm/i915: Add support for VGA load detection (pre-945).

Maciej W. Rozycki (1):
3c509: Add missing EISA IDs

Manuel Traut (1):
Input: usb1400_ts - fix access to "device data" in resume function

Marek Szyprowski (1):
S3C-fb: PM fix

Marek Vasut (1):
[ARM] pxa/palm: fix PalmLD/T5/TX AC97 MFP

Martin Fuzzey (1):
USB: atmel-usba-udc : fix control out requests.

Martin Michlmayr (1):
[ARM] Orion: Remove explicit name for platform device resources

Mathieu Desnoyers (4):
[CPUFREQ] remove rwsem lock from CPUFREQ_GOV_STOP call
[CPUFREQ] fix timer teardown in conservative governor
[CPUFREQ] fix timer teardown in ondemand governor
[ARM] Add cmpxchg support for ARMv6+ systems (v5)

Matt Kraai (1):
net/firmare: Ignore .cis files

Mel Gorman (2):
x86: ignore VM_LOCKED when determining if hugetlb-backed page tables can be shared or not
mm: account for MAP_SHARED mappings using VM_MAYSHARE and not VM_SHARED in hugetlbfs

Mike Frysinger (9):
bfin_mac: fix build error due to net_device_ops convert
Blackfin: hook up preadv/pwritev syscalls
MAINTAINERS: update Blackfin items
MAINTAINERS: drop (subscribers-only) markings on Blackfin lists
Blackfin: ignore generated vmlinux.lds
Blackfin: drop unneeded asm/.gitignore
Blackfin: fix strncmp.o build error
Revert "USB: Correct Makefile to make isp1760 buildable"
hwmon: (lm78) Add missing __devexit_p()

Mingwei Wang (1):
[ARM] pxa: fix the incorrectly defined drive strength macros for pxa{168,910}

Minoru Usui (1):
net_cls: fix unconfigured struct tcf_proto keeps chaining and avoid kernel panic when we use cls_cgroup

Neil Horman (1):
e1000: add missing length check to e1000 receive routine

NeilBrown (8):
md: always update level / chunk_size / layout when writing v1.x metadata.
md: improve errno return when setting array_size
md: bitmap: improve bitmap maintenance code.
md: export 'frozen' resync state through sysfs
md: raid5: avoid sector values going negative when testing reshape progress.
md: don't update curr_resync_completed without also updating reshape_position.
md: don't use locked_ioctl.
md: raid5: change incorrect usage of 'min' macro to 'min_t'

Nicolas Ferre (1):
atmel_lcdfb: correct fifo size for some products

Nicolas Pitre (1):
[ARM] add coherent DMA mask for mv643xx_eth

Nikanth Karthikesan (1):
memcg: fix build warning and avoid checking for mem != null again and again

Oskar Schirmer (1):
flat: fix data sections alignment

Ozan Çağlayan (1):
ALSA: hda - Add forced codec-slots for ASUS W5Fm

Pablo Neira Ayuso (2):
netfilter: nf_ct_dccp: add missing DCCP protocol changes in event cache
netfilter: nfnetlink_log: fix wrong skbuff size calculation

Pallipadi, Venkatesh (1):
x86: avoid back to back on_each_cpu in cpa_flush_array

Paul Menage (1):
cls_cgroup: read classid atomically in classifier

Paulius Zaleckas (2):
MAINTAINER: Add F: entries for Gemini and FA526
Gemini: Fix SRAM/ROM location after memory swap

Pavel Roskin (1):
ath5k: fix scanning in AR2424

Rafael J. Wysocki (1):
PM: Do not hold dpm_list_mtx while disabling/enabling nonboot CPUs

Reinette Chatre (1):
iwlwifi: do not cancel delayed work inside spin_lock_irqsave

Robert Olsson (1):
ipv4: Fix oops with FIB_TRIE

Robert Richter (1):
oprofile: fix cpu buffer size

Roel Kluin (4):
wireless: beyond ARRAY_SIZE of intf->crypto_stats
gigaset: beyond ARRAY_SIZE of iwb->data
fsldma: fix check on potential fdev->chan[] overflow
drivers/serial/mpc52xx_uart.c: fix array overindexing check

Russell King (3):
[ARM] disable NX support for OABI-supporting kernels
[ARM] barriers: improve xchg, bitops and atomic SMP barriers
[ARM] update mach-types

Rusty Russell (1):
lguest: fix on Intel when KVM loaded (unhandled trap 13)

Ryusuke Konishi (1):
nilfs2: fix bh leak in nilfs_cpfile_delete_checkpoints function

Sam Ravnborg (1):
nfs: fix build error in nfsroot with initconst

Shaohua Li (2):
cpuidle: makes AMD C1E work in acpi_idle
cpuidle: fix AMD C1E suspend hang

Steve Wise (1):
svcrdma: dma unmap the correct length for the RPCRDMA header page.

Suresh Siddha (1):
x86: introduce noxsave boot parameter

Takashi Iwai (4):
ALSA: hda - Add 5stack-no-fp model for STAC927x
ALSA: hda - Add missing check of pin vref 50 and others in Realtek codecs
ALSA: Fix invalid jiffies check after pause
ALSA: Enable PCM hw_ptr_jiffies check only in xrun_debug mode

Tejun Heo (2):
x86: Remove remap percpu allocator for the time being
x86, relocs: ignore R_386_NONE in kernel relocation entries

Tetsuo Handa (1):
kmod: Release sub_info on cred allocation failure.

Thomas Dahlmann (1):
MAINTAINERS: change email address for Thomas Dahlmann

Thomas Reitmayr (1):
[ARM] Kirkwood: Correct MPP for SATA activity/presence LEDs of QNAP TS-119/TS-219.

Thomas Renninger (1):
[CPUFREQ] powernow-k8 cleanup msg if BIOS does not export ACPI _PSS cpufreq data

Timothy Clacy (1):
[ARM] pxa: enable GPIO receivers after configuring pins

Tony Vroon (1):
ALSA: hda - Compaq Presario CQ60 patching for Conexant

Trond Myklebust (1):
NFSv4: Fix the case where NFSv4 renewal fails

Vladimir Barinov (1):
mtd: MXC NAND driver fixes (v5)

Vu Pham (1):
XPRTRDMA: fix client rpcrdma FRMR registration on mlx4 devices

Warren Free (1):
USB: isp1760: urb_dequeue doesn't always find the urbs

Wei Yongjun (1):
nfsd: fix hung up of nfs client while sync write data to nfs server

Xiao Kaijian (1):
USB: Yet another Conexant Clone to add to cdc-acm.c

Xiaotian Feng (1):
gianfar: fix babbling rx error event bug

Yevgeny Petrilin (1):
mlx4_en: Fix a kernel panic when waking tx queue

Zhang Rui (3):
x86: DMI match for the Sony VGN-Z540N as it needs BIOS reboot
ACPI: video: DMI workaround broken eMachines E510 BIOS enabling display brightness
ACPI: video: DMI workaround broken Acer 5315 BIOS enabling display brightness

[email protected] (2):
x86: bugfix wbinvd() model check instead of family check
x86: cpa_flush_array wbinvd should be done on all CPUs


2009-06-04 13:56:19

by Michael S. Zick

Subject: Re: Linux 2.6.30-rc8 [also: VIA Support]

On Tue June 2 2009, Linus Torvalds wrote:
>
> This is almost certainly the last -rc, and I debated even doing it. But
> rather than just do 2.6.30, I decided that I'd be better off doing a last
> -rc8 and then a real release probably this weekend.
>
> This has mostly driver and arch updates, with perhaps the intel drm/kms
> and network driver changes standing out, but there's powerpc, blackfin and
> arm updates too. Admittedly, the two biggest parts of the powerpc update
> are a revert and a defconfig update.
>
> A lot of small stuff, fixing a few regressions (and at least one bugzilla
> entry going back to 2.6.24). The small stuff does matter. Please test.
>
> Linus
>

Today's first 2.6.30-rc8 build for VIA NetBooks is posted.
Details at:
http://forum.netbookuser.com/viewtopic.php?pid=7034#p7034

- - summary - -

This is the kernel.org repository tag: 2.6.30-rc8 codebase ***without*** local patches;
Using the same configuration as the record holding (-09149), variable speed kernel.

The -09149 announcement is at:
http://forum.netbookuser.com/viewtopic.php?pid=7002#p7002
(Note: that build had local patches.)

I have also re-posted the record holding (-09143lk), fixed speed build, see above links.

2009-06-04 14:59:09

by Michael S. Zick

Subject: Re: Linux 2.6.30-rc8 [also: VIA Support]

On Thu June 4 2009, Michael S. Zick wrote:
> On Tue June 2 2009, Linus Torvalds wrote:
> >
> > This is almost certainly the last -rc, and I debated even doing it. But
> > rather than just do 2.6.30, I decided that I'd be better off doing a last
> > -rc8 and then a real release probably this weekend.
> >
> > This has mostly driver and arch updates, with perhaps the intel drm/kms
> > and network driver changes standing out, but there's powerpc, blackfin and
> > arm updates too. Admittedly, the two biggest parts of the powerpc update
> > are a revert and a defconfig update.
> >
> > A lot of small stuff, fixing a few regressions (and at least one bugzilla
> > entry going back to 2.6.24). The small stuff does matter. Please test.
> >
> > Linus
> >
>
> Today's first 2.6.30-rc8 build for VIA NetBooks is posted.
> Details at:
> http://forum.netbookuser.com/viewtopic.php?pid=7034#p7034
>
> - - summary - -
>
> This is the kernel.org repository tag: 2.6.30-rc8 codebase ***without*** local patches;
> Using the same configuration as the record holding (-09149), variable speed kernel.
>

Well, that one isn't anything to brag about:

top - 09:45:44 up 1:44, 2 users, load average: 1.46, 1.48, 1.47
Tasks: 116 total, 1 running, 115 sleeping, 0 stopped, 0 zombie
Cpu(s): 30.3%us, 8.3%sy, 0.0%ni, 60.6%id, 0.0%wa, 0.9%hi, 0.0%si, 0.0%st
Mem: 446720k total, 419160k used, 27560k free, 26352k buffers
Swap: 2199324k total, 0k used, 2199324k free, 214908k cached
- - - -
3208 mszick 20 0 156m 8752 7424 S 19.4 2.0 16:21.69 pulseaudio
3392 mszick 20 0 311m 44m 23m S 10.2 10.1 11:38.47 vlc
3937 root 20 0 2448 1188 920 R 3.7 0.3 3:18.88 top
49 root 15 -5 0 0 0 S 2.8 0.0 2:55.46 hd-audio0
3009 root 20 0 222m 24m 7372 S 0.9 5.5 1:14.42 Xorg
3798 root 20 0 8500 3048 2456 S 0.9 0.7 1:08.91 sshd
1 root 20 0 3084 1888 564 S 0.0 0.4 0:01.58 init
2 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 kthreadd

Testing the -09155 now (-09154 + local patches).

Mike
> The -09149 announcement is at:
> http://forum.netbookuser.com/viewtopic.php?pid=7002#p7002
> (Note: that build had local patches.)
>
> I have also re-posted the record holding (-09143lk), fixed speed build, see above links.

> mike

2009-06-04 15:19:20

by Linus Torvalds

Subject: Re: Linux 2.6.30-rc8 [also: VIA Support]



On Thu, 4 Jun 2009, Michael S. Zick wrote:
>
> Well, that one isn't anything to brag about:

So what's the problem on that machine? Is the VIA C7-M (or some random
device in it) just buggy and it eventually hangs, or what? Has _any_
kernel ever worked on that machine for longer than an hour or two?

Linus

2009-06-04 15:41:28

by Michael S. Zick

Subject: Re: Linux 2.6.30-rc8 [also: VIA Support]

On Thu June 4 2009, you wrote:
>
> On Thu, 4 Jun 2009, Michael S. Zick wrote:
> >
> > Well, that one isn't anything to brag about:
>
> So what's the problem on that machine? Is the VIA C7-M (or some random
> device in it) just buggy and it eventually hangs, or what? Has _any_
> kernel ever worked on that machine for longer than an hour or two?
>

It is not (directly) the C7-M - it might be the chipset CX700.

I have a small pool of testers (5 or so actively reporting) -

On the Everex Cloudbook (C7-M/CX700) I have gotten a record
of 4h45 minutes of up-time on some, locally patched, builds.
The same is reported on other Everex Cloudbooks.

On the Sylvania gBook (also C7-M/CX700) the user reports
10s of minutes as their record up-time.

H.W. got hold of one of the company's demo boards but I
haven't heard back from him yet on his experiences.

On the HP-2133 (C7-M/CN896) the uptime is unknown - -
only greater than 12 hours (it has never/ever deadlocked).

I also have a couple of technical questions pending
with H.W. for the silicon grower's department.
No response yet.

- - - -

But you asked for "testing" - hence this post.

Mike

> Linus
>
>

2009-06-04 16:00:22

by Duane Griffin

Subject: Re: Linux 2.6.30-rc8 [also: VIA Support]

2009/6/4 Linus Torvalds <[email protected]>:
> On Thu, 4 Jun 2009, Michael S. Zick wrote:
>> Well, that one isn't anything to brag about:
>
> So what's the problem on that machine? Is the VIA C7-M (or some random
> device in it) just buggy and it eventually hangs, or what? Has _any_
> kernel ever worked on that machine for longer than an hour or two?

There have been reports of hangs on various VIA C7 machines going back
a year now. The version of the kernel doesn't seem to matter, but the
version of glibc does. Unfortunately there hasn't been much progress
in getting to the bottom of it.

See here (and other linked reports):
http://bugs.gentoo.org/show_bug.cgi?id=228263

Cheers,
Duane.

--
"I never could learn to drink that blood and call it wine" - Bob Dylan

2009-06-04 16:13:30

by Linus Torvalds

Subject: Re: Linux 2.6.30-rc8 [also: VIA Support]



On Thu, 4 Jun 2009, Duane Griffin wrote:
>
> There have been reports of hangs on various VIA C7 machines going back
> a year now. The version of the kernel doesn't seem to matter, but the
> version of glibc does. Unfortunately there hasn't been much progress
> in getting to the bottom of it.
>
> See here (and other linked reports):
> http://bugs.gentoo.org/show_bug.cgi?id=228263

Hmm. That looks like a CPU problem, but hey, it might be that the glibc
version thing is just coincidence, and just changes timings or whatever,
and the problem is in the chipsets.

So at least from that particular report it smells very much
non-kernel-related.

That said, even if it isn't kernel-related, it might be fixable with some
kernel patch that changes the setup of the CPU/chipset. But we'd need VIA
to help with anything like that.

Linus

2009-06-04 16:17:11

by Michael S. Zick

Subject: Re: Linux 2.6.30-rc8 [also: VIA Support]

On Thu June 4 2009, Duane Griffin wrote:
> 2009/6/4 Linus Torvalds <[email protected]>:
> > On Thu, 4 Jun 2009, Michael S. Zick wrote:
> >> Well, that one isn't anything to brag about:
> >
> > So what's the problem on that machine? Is the VIA C7-M (or some random
> > device in it) just buggy and it eventually hangs, or what? Has _any_
> > kernel ever worked on that machine for longer than an hour or two?
>
> There have been reports of hangs on various VIA C7 machines going back
> a year now. The version of the kernel doesn't seem to matter, but the
> version of glibc does. Unfortunately there hasn't been much progress
> in getting to the bottom of it.
>

There have been recent tweaks to the futex service in the
kernel, this might interact with how a particular glibc
expects them to work.

I also add a few tweaks in my local patches to futex.etc,
so it may need more in mainline (un-proven at this point).

It seems to make a difference with pulse-audio - -
which is my "test program".

Mike
> See here (and other linked reports):
> http://bugs.gentoo.org/show_bug.cgi?id=228263
>
> Cheers,
> Duane.
>

2009-06-04 16:32:35

by Michael S. Zick

Subject: Re: Linux 2.6.30-rc8 [also: VIA Support]

On Thu June 4 2009, Linus Torvalds wrote:
>
> On Thu, 4 Jun 2009, Duane Griffin wrote:
> >
> > There have been reports of hangs on various VIA C7 machines going back
> > a year now. The version of the kernel doesn't seem to matter, but the
> > version of glibc does. Unfortunately there hasn't been much progress
> > in getting to the bottom of it.
> >
> > See here (and other linked reports):
> > http://bugs.gentoo.org/show_bug.cgi?id=228263
>
> Hmm. That looks like a CPU problem, but hey, it might be that the glibc
> version thing is just coincidence, and just changes timings or whatever,
> and the problem is in the chipsets.
>
> So at least from that particular report it smells very much
> non-kernel-related.
>
> That said, even if it isn't kernel-related, it might be fixable with some
> kernel patch that changes the setup of the CPU/chipset. But we'd need VIA
> to help with anything like that.
>

That is one of my pending questions - -
(It is included as a comment at the appropriate point in my patchset.)

The VIA processors have MCR's not MTRR's - -
The C7-M processor uses "in-order retirement" not "out-of-order" - -
I think the MCR's **should not** be set for "weak ordered writes" -

_But_ until I hear the recommendation of the Silicon Growers department...

Mike

> Linus
>
>

2009-06-04 17:03:33

by Linus Torvalds

Subject: Re: Linux 2.6.30-rc8 [also: VIA Support]



On Thu, 4 Jun 2009, Michael S. Zick wrote:
>
> The C7-M processor uses "in-order retirement" not "out-of-order" - -
> I think the MCR's **should not** be set for "weak ordered writes" -

In-order retirement does not really imply anything at all about how write
ordering works out. In most CPU parlance, you'd say that you've "retired"
a write instruction when it has completed in the write queue - but it
would not mean anything in particular for memory ordering.

Of course, I don't think the C7 is just in-order retirement, I think it's
pretty much in-order everything. Usually you only specify that
"retirement" part when there is some out-of-order execution in other parts
of the pipeline, but I think the C7 is entirely in-order pipeline,
although I suspect the FPU side is likely somewhat separated.

But still, with a write buffer (and _no_ sane x86 lacks a write
buffer), that doesn't actually mean that the cache and memory accesses are
necessarily entirely in-order.

That said, how all the internal CPU registers are set is all black magic.

Linus

2009-06-04 17:07:49

by Linus Torvalds

Subject: Re: Linux 2.6.30-rc8 [also: VIA Support]



Side note: is it more stable if you disable the VIA speedstep thing
(whatever it's called; ok, google tells me it's called "TwinTurbo" and
"Advanced PowerSaver")?

Features like that easily put a huge stress on power regulators etc, if
they result in sudden changes in current draw. Underspecced capacitors
etc can cause CPU "brown-outs", which in turn can easily cause total
failure.

Linus

2009-06-04 17:10:20

by Harald Welte

Subject: Re: Linux 2.6.30-rc8 [also: VIA Support]

Dear Linus and others,

On Thu, Jun 04, 2009 at 09:13:15AM -0700, Linus Torvalds wrote:

> > There have been reports of hangs on various VIA C7 machines going back
> > a year now. The version of the kernel doesn't seem to matter, but the
> > version of glibc does. Unfortunately there hasn't been much progress
> > in getting to the bottom of it.
> >
> > See here (and other linked reports):
> > http://bugs.gentoo.org/show_bug.cgi?id=228263
>
> Hmm. That looks like a CPU problem, but hey, it might be that the glibc
> version thing is just coincidence, and just changes timings or whatever,
> and the problem is in the chipsets.
>
> So at least from that particular report it smells very much
> non-kernel-related.
>
> That said, even if it isn't kernel-related, it might be fixable with some
> kernel patch that changes the setup of the CPU/chipset. But we'd need VIA
> to help with anything like that.

So far, inside VIA there is no well-known issue/bug about such hangs / locks at
all.

I have seen a number (probably between 5 and 10) of sporadic reports from a
number of people on a variety of systems. Some from actual commercial vendors
of VIA+Linux based appliances, and some from the wider community of end users.
So far, to the best of my knowledge, none of those issues has been narrowed
down to a sufficiently easy to reproduce test case. Also, none of the bug
reporters has so far been able to reproduce the problem on a genuine VIA
mainboard, i.e. it could be issues introduced by the actual board hardware or
how the specific BIOS initializes the low-level hardware.

Especially when SMI/SMM based debugging no longer works (i.e. something that
appears to be a bus lockup), the actual bug needs to be reproduced on a
reference board that can be hooked up to a logic/protocol analyzer.

On the other hand, VIA's CPU division (CentaurLabs) is performing extensive
testing on their CPUs with a large codebase of x86 code, AFAIK based on more
than 40 operating systems. Also, there are large quantities of VIA CPU+chipset
systems that run without any problem, especially in 24/7 embedded x86 workloads
on Linux...

I'm more than determined to help resolve those sporadic Linux lock-up
problems. It feels like there is some problem out there, given the fact that
there is a number of independent reporters who talk about some kind of hard
system hang without oops that even prevents the NMI watchdog from kicking in.

However, we cannot make progress unless we can somehow narrow down at least
one of those reports into something that is easier to reproduce, and which can
actually be reproduced on a VIA board. Triggering in 1-4 hours is already very
good; I have reports where 1 of 30 systems exposes a lock once within 5 days
of continuous full application workload.

Sure, third party BIOS/board vendors selling products that randomly produce
locks are obviously also not a particularly great advertisement for VIA...
but debugging on such a board is much more difficult due to the lack of access
to BIOS sources, schematics and hardware debugging interfaces.

In any case, if somebody can ship me a system that exposes one of those
lock-ups, together with a pre-installed test case that exposes the problem
within let's say less than one day, plus the full kernel sources used in
that particular system: I'm happy to spend time to investigate the issue,
try to run the same test case on a VIA board, etc.

Any additional help is much appreciated.

Regards,
--
- Harald Welte <[email protected]> http://linux.via.com.tw/
============================================================================
VIA Free and Open Source Software Liaison

2009-06-04 17:19:26

by Michael S. Zick

Subject: Re: Linux 2.6.30-rc8 [also: VIA Support]

On Thu June 4 2009, Linus Torvalds wrote:
>
> That said, how all the internal CPU registers are set is all black magic.
>

That is the situation as I find it -

I have been practicing Computer Black Magic (CBM) not Computer Science (CS)
these past few months. <<deleted>> comment on un-documented processors.

Mike
> Linus
>
>

2009-06-04 17:22:17

by Michael S. Zick

Subject: Re: Linux 2.6.30-rc8 [also: VIA Support]

On Thu June 4 2009, Harald Welte wrote:
> Dear Linus and others,
>
> On Thu, Jun 04, 2009 at 09:13:15AM -0700, Linus Torvalds wrote:
>
> > > There have been reports of hangs on various VIA C7 machines going back
> > > a year now. The version of the kernel doesn't seem to matter, but the
> > > version of glibc does. Unfortunately there hasn't been much progress
> > > in getting to the bottom of it.
> > >
> > > See here (and other linked reports):
> > > http://bugs.gentoo.org/show_bug.cgi?id=228263
> >
> > Hmm. That looks like a CPU problem, but hey, it might be that the glibc
> > version thing is just coincidence, and just changes timings or whatever,
> > and the problem is in the chipsets.
> >
> > So at least from that particular report it smells very much
> > non-kernel-related.
> >
> > That said, even if it isn't kernel-related, it might be fixable with some
> > kernel patch that changes the setup of the CPU/chipset. But we'd need VIA
> > to help with anything like that.
>
> So far, inside VIA there is no well-known issue/bug about such hangs / locks at
> all.
>
> I have seen a number (probably between 5 and 10) of sporadic reports from a
> number of people on a variety of systems. Some from actual commercial vendors
> of VIA+Linux based appliances, and some from the wider community of end users.
> So far, to the best of my knowledge, none of those issues has been narrowed
> down to a sufficiently easy to reproduce test case. Also, none of the bug
> reporters has so far been able to reproduce the problem on a genuine VIA
> mainboard, i.e. it could be issues introduced by the actual board hardware or
> how the specific BIOS initializes the low-level hardware.
>
> Especially when SMI/SMM based debugging no longer works (i.e. something that
> appears to be a bus lockup), the actual bug needs to be reproduced on a
> reference board that can be hooked up to a logic/protocol analyzer.
>
> On the other hand, VIA's CPU division (CentaurLabs) is performing extensive
> testing on their CPUs with a large codebase of x86 code, AFAIK based on more
> than 40 operating systems. Also, there are large quantities of VIA CPU+chipset
> systems that run without any problem, especially in 24/7 embedded x86 workloads
> on Linux...
>
> I'm more than determined to help resolve those sporadic Linux lock-up
> problems. It feels like there is some problem out there, given the fact that
> there is a number of independent reporters who talk about some kind of hard
> system hang without oops that even prevents the NMI watchdog from kicking in.
>
> However, we cannot make progress unless we can somehow narrow down at least
> one of those reports into something that is easier to reproduce, and which can
> actually be reproduced on a VIA board. Triggering in 1-4 hours is already very
> good; I have reports where 1 of 30 systems exposes a lock once within 5 days
> of continuous full application workload.
>
> Sure, third party BIOS/board vendors selling products that randomly produce
> locks are obviously also not a particularly great advertisement for VIA...
> but debugging on such a board is much more difficult due to the lack of access
> to BIOS sources, schematics and hardware debugging interfaces.
>
> In any case, if somebody can ship me a system that exposes one of those
> lock-ups, together with a pre-installed test case that exposes the problem
> within let's say less than one day, plus the full kernel sources used in
> that particular system: I'm happy to spend time to investigate the issue,
> try to run the same test case on a VIA board, etc.
>

I am about at my wits' end with this Everex product -

Give me a couple more weeks at the problem and if I haven't solved it;
I'll give you this machine if you promise to update LKML with any fix.

Mike
> Any additional help is much appreciated.
>
> Regards,

2009-06-04 17:28:42

by Michael S. Zick

[permalink] [raw]
Subject: Re: Linux 2.6.30-rc8 [also: VIA Support]

On Thu June 4 2009, Linus Torvalds wrote:
>
> Side note: is it more stable if you disable the VIA speedstep thing
> (whatever it's called (ok, google tells me it's called "TwinTurbo" and
> "Advanced PowerSaver")?
>

The e_powersave code.
Yes, I build test cases with and without - -
It was a fixed-speed kernel build that first hit the 4 hour up-time mark.
I just reposted that build today (the -09143lk).


> Features like that easily put a huge stress on power regulators etc, if
> they result in sudden changes in current draw. Underspecced capacitors
> etc can cause CPU "brown-outs", which in turn can easily cause total
> failure.
>

There is also a possible thermal issue with these machines - -
I doubt that VIA runs their qualification testing in bake ovens;
which is what NetBook cases amount to. ;)

Before the week's end - I am going to run a few of these test builds
with the machine in the freezer - a 20C decrease in ambient should
show if there is any thermal relationship with the problem.

I just need to run my lan into the freezer compartment - one of the
places in this house without a lan connection. ;)

For want of a documented JTAG port...

Mike
>
> Linus
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
>

2009-06-04 17:46:41

by Linus Torvalds

[permalink] [raw]
Subject: Re: Linux 2.6.30-rc8 [also: VIA Support]



On Thu, 4 Jun 2009, Michael S. Zick wrote:
>
> Yes, I build test cases with and without - -
> It was a fixed-speed kernel build that first hit the 4 hour up-time mark.
> I just reposted that build today (the -09143lk).
>
> > Features like that easily put a huge stress on power regulators etc, if
> > they result in sudden changes in current draw. Underspecced capacitors
> > etc can cause CPU "brown-outs", which in turn can easily cause total
> > failure.
>
> There is also a possible thermal issue with these machines - -
> I doubt that VIA runs their qualification testing in bake ovens;
> which is what NetBook cases amount to. ;)

If the fixed-speed case runs for longer, it's not likely to be a thermal
issue. The fixed speed case should be the higher-power one.

So it can easily be a weak power setup (insufficient grounding, bad
capacitors etc). But it could also be external bus issues, in case VIA
power management also impacts the external bus (eg "stopclock" like
behavior on the CPU<->chipset bus).

One thing you could try is to avoid using the "halt" instruction. It will
obviously increase power use (and thus higher temperatures), but again,
current fluctuations are much more likely to cause problems than higher,
but fairly constant, power draw.

Think about all the light-bulbs you've seen that burn out just when you
turn them on.

Use "idle=poll" on the kernel command line to avoid the idle loop using
the "halt" or "mwait" instructions to save power.

(That polling idle loop can also end up hiding cache coherency issues with
DMA, so if that works better, it doesn't necessarily prove it's
power-related. Shutting down the CPU core can have interesting
implications for external events, and you can have various races - maybe
you shut down the core just as a chipset event happened, and the chipset
_thinks_ the core is now awake, but the core went to sleep. End result:
hung machine).

Linus
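
When repeating this test, it helps to confirm that the option actually reached the running kernel; the boot command line is exposed in /proc/cmdline. A minimal sketch in C of a whole-token check on such a string follows. The helper name cmdline_has_option is made up for illustration and is not a kernel or libc function.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: return true if `opt` (e.g. "idle=poll") appears as a
 * complete, space-separated token in `cmdline`. A substring such as
 * "noidle=poll" or "idle=polling" does not count as a match.
 */
bool cmdline_has_option(const char *cmdline, const char *opt)
{
    assert(cmdline != NULL && opt != NULL);

    size_t len = strlen(opt);
    if (len == 0)
        return false;

    const char *p = cmdline;
    while ((p = strstr(p, opt)) != NULL) {
        /* Token must start at the beginning or right after a space. */
        bool starts = (p == cmdline) || (p[-1] == ' ');
        /* Token must end at a space, a newline, or the end of the string. */
        bool ends = (p[len] == '\0') || (p[len] == ' ') || (p[len] == '\n');
        if (starts && ends)
            return true;
        p += 1; /* keep scanning past this partial match */
    }
    return false;
}
```

A caller would read /proc/cmdline into a buffer and pass it in, checking for "idle=poll" and "irqpoll" separately.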

2009-06-04 17:50:17

by Harald Welte

[permalink] [raw]
Subject: Re: Linux 2.6.30-rc8 [also: VIA Support]

On Thu, Jun 04, 2009 at 11:21:28AM -0500, Michael S. Zick wrote:

> That is one of my pending questions - -
> (It is included as a comment at the appropriate point in my patchset.)
>
> The VIA processors have MCR's not MTRR's - -

AFAIK, that was true for processors like the WinChip / C6, i.e. earlier than
the C3. The C3, C7 and later support 8 intel-style MTRR's.

> The C7-M processor uses "in-order retirement" not "out-of-order" - -
> I think the MCR's **should not** be set for "weak ordered writes" -

Why would it matter on UP? As indicated, I'm not the expert here, but I thought
memory ordering issues only arise in SMP systems [or possibly with regard to
DMA, but as we already explored much earlier in this thread, drivers that access
DMA buffers while the hardware owns them are buggy and need to be fixed]

Regards,
--
- Harald Welte <[email protected]> http://linux.via.com.tw/
============================================================================
VIA Free and Open Source Software Liaison

2009-06-04 17:50:37

by Harald Welte

[permalink] [raw]
Subject: Re: Linux 2.6.30-rc8 [also: VIA Support]

Hi Michael,

On Thu, Jun 04, 2009 at 10:37:46AM -0500, Michael S. Zick wrote:

> On the Everex Cloudbook (C7-M/CX700) I have gotten a record
> of 4h45 minutes of up-time on some, locally patched, builds.
> The same is reported on other Everex Cloudbooks.
>
> On the Sylvania gBook (also C7-M/CX700) the user reports
> 10s of minutes as their record up-time.

It seems like I need to obtain one of those. I'll see if VIA can help, but
it might help if you have a known-bad unit that I could get for debugging.

Just to re-confirm: Are you seeing those kinds of problems with any kernel, or
only specifically with later (2.6.30-rcX) kernels? If it's not a regression,
then I think there is no impact of the bug on the 2.6.30 release.

> H.W. got hold of one of the company's demo-boards but I
> haven't heard back from him yet on his experiences.

I don't really have your test case. I've personally been running a CX700 based
mainboard (some EPIA board) as my digital video recording device for > one year
here, without any lock ups at all. I have not been running any recent kernels
on that system, though.

Regards,
--
- Harald Welte <[email protected]> http://linux.via.com.tw/
============================================================================
VIA Free and Open Source Software Liaison

2009-06-04 17:56:21

by Dave Jones

[permalink] [raw]
Subject: Re: Linux 2.6.30-rc8 [also: VIA Support]

On Thu, Jun 04, 2009 at 12:27:07PM -0500, Michael S. Zick wrote:

> > Features like that easily put a huge stress on power regulators etc, if
> > they result in sudden changes in current draw. Underspecced capacitors
> > etc can cause CPU "brown-outs", which in turn can easily cause total
> > failure.
>
> There is also a possible thermal issue with these machines - -
> I doubt that VIA runs their qualification testing in bake ovens;
> which is what NetBook cases amount to. ;)

FWIW, they do, I've toured their facility and seen those ovens ;)
I forget just how high they bake them at, but it was something ridiculous
that even netbooks won't come close to.

Dave

2009-06-04 18:13:20

by Michael S. Zick

[permalink] [raw]
Subject: Re: Linux 2.6.30-rc8 [also: VIA Support]

On Thu June 4 2009, Harald Welte wrote:
> On Thu, Jun 04, 2009 at 11:21:28AM -0500, Michael S. Zick wrote:
>
> > That is one of my pending questions - -
> > (It is included as a comment at the appropriate point in my patchset.)
> >
> > The VIA processors have MCR's not MTRR's - -
>
> AFAIK, that was true for processors like the WinChip / C6, i.e. earlier than
> the C3. The C3, C7 and later support 8 intel-style MTRR's.
>

Super! A specific breakage!

The c7 setup code is re-using the c6 setup code (MCR's) - -
Will "if 0" out the appropriate parts and arrange for the MTRR setup.

@Linus - -
The Debian/Ubuntu distribution kernels require irqpoll (2.6.28+) -
I took that out very early in my testing, when that problem got fixed -
I will also try putting that back in, it might be needed for
some of its side-effects on the processor/chipset.

> > The C7-M processor uses "in-order retirement" not "out-of-order" - -
> > I think the MCR's **should not** be set for "weak ordered writes" -
>
> why would it matter on UP? as indicated, I'm not the expert here, but I thought
> memory ordering issues only arise in SMP systems [or possibly with regard to
> DMA, but as we already explored much earlier in this thread, drivers that access
> DMA buffers while the hardware owns them are buggy and need to be fixed]
>

I just recall the problems with the pa-risc port (not all machines have coherent I/O);
some have consistent I/O only. I have one of those also, lucky me. ;)

Mike
> Regards,

2009-06-04 19:28:24

by Michael S. Zick

[permalink] [raw]
Subject: Re: Linux 2.6.30-rc8 [also: VIA Support]

On Thu June 4 2009, Michael S. Zick wrote:
> On Thu June 4 2009, Harald Welte wrote:
> > On Thu, Jun 04, 2009 at 11:21:28AM -0500, Michael S. Zick wrote:
> >
>
> @Linus - -
> The Debian/Ubuntu distribution kernels require irqpoll (2.6.28+) -
> I took that out very early in my testing, when that problem got fixed -
> I will also try putting that back in, it might be needed for
> some of its side-effects on the processor/chipset.
>

It looks like I stopped using irqpoll on 4/10 (-rc1)
http://forum.netbookuser.com/viewtopic.php?pid=6708#p6708

Am re-testing the "virgin" kernel build with both command options now:
http://forum.netbookuser.com/viewtopic.php?pid=7037#p7037

Will post if it has less than 6 months of up-time. ;)

Mike

2009-06-04 20:16:44

by Andi Kleen

[permalink] [raw]
Subject: Re: Linux 2.6.30-rc8 [also: VIA Support]

Harald Welte <[email protected]> writes:
>
> why would it matter on UP? as indicated, I'm not the expert here, but I thought
> memory ordering issues only arise in SMP systems [or possibly with regard to
> DMA, but as we already explored much earlier in this thread, drivers that access
> DMA buffers while the hardware owns them are buggy and need to be fixed]

Sorry we didn't establish that. Accessing data structures that are
also accessed by DMA hardware is pretty common in fact and memory
ordering issues also come up regularly (e.g. all the infamous PCI
posting bugs)

What we established is that the drivers don't use LOCK for it
(or at least we think that's very unlikely)

-Andi

--
[email protected] -- Speaking for myself only.

2009-06-04 20:33:31

by Dave Jones

[permalink] [raw]
Subject: Re: Linux 2.6.30-rc8 [also: VIA Support]

On Thu, Jun 04, 2009 at 01:12:43PM -0500, Michael S. Zick wrote:
> On Thu June 4 2009, Harald Welte wrote:
> > On Thu, Jun 04, 2009 at 11:21:28AM -0500, Michael S. Zick wrote:
> >
> > > That is one of my pending questions - -
> > > (It is included as a comment at the appropriate point in my patchset.)
> > >
> > > The VIA processors have MCR's not MTRR's - -
> >
> > AFAIK, that was true for processors like the WinChip / C6, i.e. earlier than
> > the C3. The C3, C7 and later support 8 intel-style MTRR's.
> >
>
> Super! A specific breakage!
>
> The c7 setup code is re-using the c6 setup code (MCR's) - -
> Will "if 0" out the appropriate parts and arrange for the MTRR setup.

It's not touching the MCRs. C7's are family 6. The MCR code is only
called for family 5. (Winchips) Look at the switch statement
in init_centaur()

Dave

2009-06-04 20:39:44

by Gerd Hoffmann

[permalink] [raw]
Subject: Re: Linux 2.6.30-rc8 [also: VIA Support]

Hi,

> I'm more than determined to help resolving those sporadic Linux lock-up
> problems. It feels like there is some problem out there, given the fact that
> there is a number of independent reporters who talk about some kind of hard
> system hang without oops that even prevents the NMI watchdog from kicking in.

Data point: I have trouble with an (oldish) via box too.

processor : 0
vendor_id : CentaurHauls
cpu family : 6
model : 9
model name : VIA Nehemiah
stepping : 3

It has a quite interesting error pattern: It runs rock solid in summer,
it has stability issues in winter. It is located in a server rack in a
basement, so the weather out there shouldn't make a big difference to
the machine. I'm really curious what this might be ...

"stability issues" means it does either lock up or reboot, usually
without anything on the serial console. Seems to be more likely under
load. The periods of time the machine runs can range from hours to weeks.

cheers,
Gerd

2009-06-04 20:57:41

by Michael S. Zick

[permalink] [raw]
Subject: Re: Linux 2.6.30-rc8 [also: VIA Support]

On Thu June 4 2009, Andi Kleen wrote:
> Harald Welte <[email protected]> writes:
> >
> > why would it matter on UP? as indicated, I'm not the expert here, but I thought
> > memory ordering issues only arise in SMP systems [or possibly with regard to
> > DMA, but as we already explored much earlier in this thread, drivers that access
> > DMA buffers while the hardware owns them are buggy and need to be fixed]
>
> Sorry we didn't establish that. Accessing data structures that are
> also accessed by DMA hardware is pretty common in fact and memory
> ordering issues also come up regularly (e.g. all the infamous PCI
> posting bugs)
>
> What we established is that the drivers don't use LOCK for it
> (or at least we think that's very unlikely)
>

It was a real headache in the pa-risc port - -
Even went so far as to build some experimental kernels where all
the spin-lock structures were in a separate loader section.

That was to avoid in-direct interference - I.E: Both DMA and
the processor handling the locking **both** invalidating the
same cache line at the same time (only one can win).

Things might get that deep with this processor/chip-set combination;
but pa-risc has some very unusual hardware in some older models.

My favorite still is a human coding error somewhere, not an
architectural or structural problem.

Mike
> -Andi
>

2009-06-04 21:02:39

by Michael S. Zick

[permalink] [raw]
Subject: Re: Linux 2.6.30-rc8 [also: VIA Support]

On Thu June 4 2009, Michael S. Zick wrote:
> On Thu June 4 2009, Michael S. Zick wrote:
> > On Thu June 4 2009, Harald Welte wrote:
> > > On Thu, Jun 04, 2009 at 11:21:28AM -0500, Michael S. Zick wrote:
> > >
> >
> > @Linus - -
> > The Debian/Ubuntu distribution kernels require irqpoll (2.6.28+) -
> > I took that out very early in my testing, when that problem got fixed -
> > I will also try putting that back in, it might be needed for
> > some of its side-effects on the processor/chipset.
> >
>
> It looks like I stopped using irqpoll on 4/10 (-rc1)
> http://forum.netbookuser.com/viewtopic.php?pid=6708#p6708
>
> Am re-testing the "virgin" kernel build with both command options now:
> http://forum.netbookuser.com/viewtopic.php?pid=7037#p7037
>
> Will post if it has less than 6 months of up-time. ;)
>

About 6 months less than 6 months up-time ;)

2.6.30-rc8 - no special cmd line options: 1h45m
2.6.30-rc8 - idle=poll irqpoll options: 1h08m

Summary:
Phooey!

Mike

> Mike
>

2009-06-04 21:10:47

by Michael S. Zick

[permalink] [raw]
Subject: Re: Linux 2.6.30-rc8 [also: VIA Support]

On Thu June 4 2009, Dave Jones wrote:
> On Thu, Jun 04, 2009 at 01:12:43PM -0500, Michael S. Zick wrote:
> > On Thu June 4 2009, Harald Welte wrote:
> > > On Thu, Jun 04, 2009 at 11:21:28AM -0500, Michael S. Zick wrote:
> > >
> > > > That is one of my pending questions - -
> > > > (It is included as a comment at the appropriate point in my patchset.)
> > > >
> > > > The VIA processors have MCR's not MTRR's - -
> > >
> > > AFAIK, that was true for processors like the WinChip / C6, i.e. earlier than
> > > the C3. The C3, C7 and later support 8 intel-style MTRR's.
> > >
> >
> > Super! A specific breakage!
> >
> > The c7 setup code is re-using the c6 setup code (MCR's) - -
> > Will "if 0" out the appropriate parts and arrange for the MTRR setup.
>
> It's not touching the MCRs. C7's are family 6. The MCR code is only
> called for family 5. (Winchips) Look at the switch statement
> in init_centaur()
>

Yup, mis-read that one - -

case 6:
init_c3(c);
break;

= = = =

But while you're here, what is your opinion on this one,
in: int __init pcibios_init(void)

- - - - -
pci_cache_line_size = 32 >> 2;
if (c->x86 >= 6
&& (c->x86_vendor == X86_VENDOR_AMD) || (c->x86_vendor == X86_VENDOR_CENTAUR))
pci_cache_line_size = 64 >> 2; /* K7 & K8 and VIA C7-M */
else if (c->x86 > 6 && c->x86_vendor == X86_VENDOR_INTEL)
pci_cache_line_size = 128 >> 2; /* P4 */

Mike
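
One thing worth checking in the condition quoted above is how C groups it: && binds more tightly than ||, so the family test applies only to the AMD branch and every Centaur CPU takes the 64-byte path regardless of family. The sketch below makes the two groupings explicit. The enum values are stand-ins and not the kernel's X86_VENDOR_* constants. The "family gated" variant is only one guess at what was intended.

```c
#include <stdbool.h>

/* Stand-in vendor IDs for illustration only. */
enum vendor { VENDOR_INTEL, VENDOR_AMD, VENDOR_CENTAUR };

/*
 * The condition as the compiler parses it:
 *   (family >= 6 && vendor == AMD) || (vendor == CENTAUR)
 * Parentheses are added here only to make the implicit grouping visible.
 */
bool cond_as_parsed(int family, enum vendor v)
{
    return ((family >= 6) && (v == VENDOR_AMD)) || (v == VENDOR_CENTAUR);
}

/*
 * One possible intended grouping (an assumption): the family check
 * applies to both vendors.
 */
bool cond_family_gated(int family, enum vendor v)
{
    return (family >= 6) && (v == VENDOR_AMD || v == VENDOR_CENTAUR);
}
```

For a family 5 Centaur part the first form is true and the second is false; for a family 6 part such as the C7-M both agree.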

2009-06-04 21:25:39

by Dave Jones

[permalink] [raw]
Subject: Re: Linux 2.6.30-rc8 [also: VIA Support]

On Thu, Jun 04, 2009 at 04:01:30PM -0500, Michael S. Zick wrote:

> But while you're here, what is your opinion on this one,
> in: int __init pcibios_init(void)
>
> - - - - -
> pci_cache_line_size = 32 >> 2;
> if (c->x86 >= 6
> && (c->x86_vendor == X86_VENDOR_AMD) || (c->x86_vendor == X86_VENDOR_CENTAUR))
> pci_cache_line_size = 64 >> 2; /* K7 & K8 and VIA C7-M */
> else if (c->x86 > 6 && c->x86_vendor == X86_VENDOR_INTEL)
> pci_cache_line_size = 128 >> 2; /* P4 */
>
> Mike

C7's L1 cachelines are 64 bytes, so it's right in that case,
but the earlier Centaur CPUs are 32 bytes, so it should be checking steppings.

Or better yet, why not just set it to boot_cpu_data->x86_clflush_size ?

Dave

2009-06-04 21:38:23

by Michael S. Zick

[permalink] [raw]
Subject: Re: Linux 2.6.30-rc8 [also: VIA Support]

On Thu June 4 2009, Dave Jones wrote:
> On Thu, Jun 04, 2009 at 04:01:30PM -0500, Michael S. Zick wrote:
>
> > But while you're here, what is your opinion on this one,
> > in: int __init pcibios_init(void)
> >
> > - - - - -
> > pci_cache_line_size = 32 >> 2;
> > if (c->x86 >= 6
> > && (c->x86_vendor == X86_VENDOR_AMD) || (c->x86_vendor == X86_VENDOR_CENTAUR))
> > pci_cache_line_size = 64 >> 2; /* K7 & K8 and VIA C7-M */
> > else if (c->x86 > 6 && c->x86_vendor == X86_VENDOR_INTEL)
> > pci_cache_line_size = 128 >> 2; /* P4 */
> >
> > Mike
>
> C7's L1 cachelines are 64 bytes, so it's right in that case,
> but the earlier Centaur CPUs are 32 bytes, so it should be checking steppings.
>
> Or better yet, why not just set it to boot_cpu_data->x86_clflush_size ?
>

I have already preceded that chunk of code with a printk and confirmed
that x86_clflush_size is properly set to 64 bytes (somewhere else).

So your suggestion is the obvious one for the C7-M,
I don't know about any other makes/models.

This machine's C7-M is being reported as a "stepping 0" ??
What is earlier than a stepping 0 ??

Mike

> Dave
>
>
>

2009-06-04 21:43:35

by Dave Jones

[permalink] [raw]
Subject: Re: Linux 2.6.30-rc8 [also: VIA Support]

On Thu, Jun 04, 2009 at 04:38:13PM -0500, Michael S. Zick wrote:
> I have already preceded that chunk of code with a printk and confirmed
> that x86_clflush_size is properly set to 64 bytes (somewhere else).
>
> So your suggestion is the obvious one for the C7-M,
> I don't know about any other makes/models.
>
> This machine's C7-M is being reported as a "stepping 0" ??
> What is earlier than a stepping 0 ??

Earlier model numbers.
(Stepping gets reset to 0 every time they bump the model)

Dave
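
For readers unfamiliar with how family, model and stepping relate, here is a small sketch that splits a CPUID leaf 1 EAX signature into those three fields. It follows the layout Intel documents (stepping in bits 3:0, model in 7:4, family in 11:8, extended model in 19:16, extended family in 27:20). The struct and function names are invented for this example, and whether VIA parts populate the extended fields is not something this thread states.

```c
#include <stdint.h>

/* Hypothetical container for the decoded fields. */
struct cpu_sig {
    unsigned family;
    unsigned model;
    unsigned stepping;
};

/* Decode a CPUID(1).EAX value using the Intel-documented bit layout. */
struct cpu_sig decode_cpu_signature(uint32_t eax)
{
    struct cpu_sig s;
    unsigned base_family = (eax >> 8) & 0xf;
    unsigned base_model  = (eax >> 4) & 0xf;
    unsigned ext_model   = (eax >> 16) & 0xf;
    unsigned ext_family  = (eax >> 20) & 0xff;

    s.stepping = eax & 0xf;

    /* The extended family is only added when the base family is 0xf. */
    s.family = base_family;
    if (base_family == 0xf)
        s.family += ext_family;

    /* The extended model supplies the high nibble for families 6 and 0xf. */
    s.model = base_model;
    if (base_family == 0x6 || base_family == 0xf)
        s.model |= ext_model << 4;

    return s;
}
```

With this layout, a signature of 0x6D0 decodes to family 6, model 13, stepping 0, and 0x693 decodes to family 6, model 9, stepping 3. The stepping is a separate field that counts revisions within one model, so a stepping of 0 says nothing about how new the model is.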

2009-06-04 22:00:28

by Michael S. Zick

[permalink] [raw]
Subject: Re: Linux 2.6.30-rc8 [also: VIA Support]

On Thu June 4 2009, Dave Jones wrote:
> On Thu, Jun 04, 2009 at 04:38:13PM -0500, Michael S. Zick wrote:
> > I have already preceded that chunk of code with a printk and confirmed
> > that x86_clflush_size is properly set to 64 bytes (somewhere else).
> >
> > So your suggestion is the obvious one for the C7-M,
> > I don't know about any other makes/models.
> >
> > This machine's C7-M is being reported as a "stepping 0" ??
> > What is earlier than a stepping 0 ??
>
> Earlier model numbers.
> (Stepping gets reset to 0 every time they bump the model)
>

So 6.model=13.stepping=0 should do it for the (or this) C7-M, correct?
Your mention of "earlier models" will not apply until there is a
stepping >0 for a model=13, correct?

Would a good practice be to put a "WARN_ON" in there in case a stepping
greater than =0 happens to execute the code?
In addition to checking specifically for 6.13.0 (rather than just 6.13).

Hmmm...
But that is making a big mess of a small mess -
Why not just do the assignment you suggest for 6.model>=13?

Models earlier than 13 are evidently working (due to lack of
bug reports or other noise on this list).

Mike
> Dave
>

2009-06-04 23:27:50

by Michael S. Zick

[permalink] [raw]
Subject: Re: Linux 2.6.30-rc8 [also: VIA Support]

On Thu June 4 2009, Michael S. Zick wrote:
> On Thu June 4 2009, Dave Jones wrote:
> > On Thu, Jun 04, 2009 at 04:38:13PM -0500, Michael S. Zick wrote:
>

I have it now, you were speaking of the "product vendor Centaur"
not a specific model name "Centaur".

Which I translate into: that statement block needs to be converted into
(possibly nested) switch statement(s).
Since there is not a "model check" in it only a "Series" and "Vendor" check.

Yuck.
Mike

> > > I have already preceded that chunk of code with a printk and confirmed
> > > that x86_clflush_size is properly set to 64 bytes (somewhere else).
> > >
> > > So your suggestion is the obvious one for the C7-M,
> > > I don't know about any other makes/models.
> > >
> > > This machine's C7-M is being reported as a "stepping 0" ??
> > > What is earlier than a stepping 0 ??
> >
> > Earlier model numbers.
> > (Stepping gets reset to 0 every time they bump the model)
> >
>
> So 6.model=13.stepping=0 should do it for the (or this) C7-M, correct?
> Your mention of "earlier models" will not apply until there is a
> stepping >0 for a model=13, correct?
>
> Would a good practice be to put a "WARN_ON" in there in case a stepping
> greater than =0 happens to execute the code?
> In addition to checking specifically for 6.13.0 (rather than just 6.13).
>
> Hmmm...
> But that is making a big mess of a small mess -
> Why not just do the assignment you suggest for 6.model>=13?
>
> Models earlier than 13 are evidently working (due to lack of
> bug reports or other noise on this list).
>
> Mike
> > Dave
> >

2009-06-05 00:15:37

by Dave Jones

[permalink] [raw]
Subject: Re: Linux 2.6.30-rc8 [also: VIA Support]

On Thu, Jun 04, 2009 at 06:26:40PM -0500, Michael S. Zick wrote:
> On Thu June 4 2009, Michael S. Zick wrote:
> > On Thu June 4 2009, Dave Jones wrote:
> > > On Thu, Jun 04, 2009 at 04:38:13PM -0500, Michael S. Zick wrote:
> >
>
> I have it now, you were speaking of the "product vendor Centaur"
> not a specific model name "Centaur".
>
> Which I translate into: that statement block needs to be converted into
> (possibly nested) switch statement(s).
> Since there is not a "model check" in it only a "Series" and "Vendor" check.
>
> Yuck.

I meant just doing something like this..

(untested)

I'm not sure if the clflush_size==0 case can happen, which is why
I left the fallback. Maybe on ancient cpus?

Dave

diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
index 2202b62..a293c71 100644
--- a/arch/x86/pci/common.c
+++ b/arch/x86/pci/common.c
@@ -426,11 +426,14 @@ int __init pcibios_init(void)
* and P4. It's also good for 386/486s (which actually have 16)
* as quite a few PCI devices do not support smaller values.
*/
- pci_cache_line_size = 32 >> 2;
- if (c->x86 >= 6 && c->x86_vendor == X86_VENDOR_AMD)
- pci_cache_line_size = 64 >> 2; /* K7 & K8 */
- else if (c->x86 > 6 && c->x86_vendor == X86_VENDOR_INTEL)
- pci_cache_line_size = 128 >> 2; /* P4 */
+
+ if (c->x86_clflush_size > 0)
+ pci_cache_line_size = c->x86_clflush_size >> 2;
+ else
+ pci_cache_line_size = 32 >> 2;
+
+ printk(KERN_DEBUG "PCI: pci_cache_line_size set to %d bytes\n",
+ pci_cache_line_size >> 2);

pcibios_resource_survey();
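
A note on units for anyone trying the patch above: the ">> 2" is there because the PCI Cache Line Size register counts 32-bit words rather than bytes. The sketch below restates the patch's logic as two standalone functions with made-up names. It is not the kernel code, only an illustration of the conversion in both directions.

```c
/* Fallback used when the CPU reports no CLFLUSH line size, in bytes. */
#define FALLBACK_LINE_BYTES 32u

/*
 * Hypothetical helper: convert the CPU's cache line size in bytes into the
 * value for the PCI Cache Line Size register, which is in 32-bit words.
 */
unsigned pci_cls_dwords_from_clflush(unsigned clflush_bytes)
{
    if (clflush_bytes > 0)
        return clflush_bytes >> 2;   /* bytes -> dwords */
    return FALLBACK_LINE_BYTES >> 2; /* 32 bytes -> 8 dwords */
}

/* Convert a register value in 32-bit words back to bytes, e.g. for logging. */
unsigned pci_cls_bytes(unsigned dwords)
{
    return dwords << 2;              /* dwords -> bytes */
}
```

Read this way, a 64-byte line becomes a register value of 16. As far as I can tell, the KERN_DEBUG line in the patch would need a left shift to report bytes; shifting right a second time would print 4 for a 64-byte line.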

2009-06-05 00:27:29

by Michael S. Zick

[permalink] [raw]
Subject: Re: Linux 2.6.30-rc8 [also: VIA Support]

On Thu June 4 2009, Dave Jones wrote:
> On Thu, Jun 04, 2009 at 06:26:40PM -0500, Michael S. Zick wrote:
> > On Thu June 4 2009, Michael S. Zick wrote:
> > > On Thu June 4 2009, Dave Jones wrote:
> > > > On Thu, Jun 04, 2009 at 04:38:13PM -0500, Michael S. Zick wrote:
> > >
> >
> > I have it now, you were speaking of the "product vendor Centaur"
> > not a specific model name "Centaur".
> >
> > Which I translate into: that statement block needs to be converted into
> > (possibly nested) switch statement(s).
> > Since there is not a "model check" in it only a "Series" and "Vendor" check.
> >
> > Yuck.
>
> I meant just doing something like this..
>
> (untested)
>
> I'm not sure if the clflush_size==0 case can happen, which is why
> I left the fallback. Maybe on ancient cpus?
>
> Dave
>
> diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
> index 2202b62..a293c71 100644
> --- a/arch/x86/pci/common.c
> +++ b/arch/x86/pci/common.c
> @@ -426,11 +426,14 @@ int __init pcibios_init(void)
> * and P4. It's also good for 386/486s (which actually have 16)
> * as quite a few PCI devices do not support smaller values.
> */
> - pci_cache_line_size = 32 >> 2;
> - if (c->x86 >= 6 && c->x86_vendor == X86_VENDOR_AMD)
> - pci_cache_line_size = 64 >> 2; /* K7 & K8 */
> - else if (c->x86 > 6 && c->x86_vendor == X86_VENDOR_INTEL)
> - pci_cache_line_size = 128 >> 2; /* P4 */
> +
> + if (c->x86_clflush_size > 0)
> + pci_cache_line_size = c->x86_clflush_size >> 2;
> + else
> + pci_cache_line_size = 32 >> 2;
> +
> + printk(KERN_DEBUG "PCI: pci_cache_line_size set to %d bytes\n",
> + pci_cache_line_size >> 2);
>
> pcibios_resource_survey();
>

That would clean up the code and deal with the situation.
Sorry I was so slow about following what you meant. Age.

Will put that in tomorrow's build instead of my idea.

Mike
>
>

2009-06-05 00:33:18

by Robert Hancock

[permalink] [raw]
Subject: Re: Linux 2.6.30-rc8 [also: VIA Support]

Michael S. Zick wrote:
> On Thu June 4 2009, Andi Kleen wrote:
>> Harald Welte <[email protected]> writes:
>>> why would it matter on UP? as indicated, I'm not the expert here, but I thought
>>> memory ordering issues only arise in SMP systems [or possibly with regard to
>>> DMA, but as we already explored much earlier in this thread, drivers that access
>>> DMA buffers while the hardware owns them are buggy and need to be fixed]
>> Sorry we didn't establish that. Accessing data structures that are
>> also accessed by DMA hardware is pretty common in fact and memory
>> ordering issues also come up regularly (e.g. all the infamous PCI
>> posting bugs)
>>
>> What we established is that the drivers don't use LOCK for it
>> (or at least we think that's very unlikely)
>>
>
> It was a real headache in the pa-risc port - -
> Even went so far as to build some experimental kernels where all
> the spin-lock structures were in a separate loader section.
>
> That was to avoid in-direct interference - I.E: Both DMA and
> the processor handling the locking **both** invalidating the
> same cache line at the same time (only one can win).
>
> Things might get that deep with this processor/chip-set combination;
> but pa-risc has some very unusual hardware in some older models.

That sort of thing should be architecturally impossible on x86. In order
for something to invalidate the cache line, it first has to own it
(except maybe for some unusual cases like Memory Write and Invalidate
where the writer promises to overwrite the entire cache line).

2009-06-05 00:52:47

by Michael S. Zick

[permalink] [raw]
Subject: Re: Linux 2.6.30-rc8 [also: VIA Support]

On Thu June 4 2009, Robert Hancock wrote:
> Michael S. Zick wrote:
> > On Thu June 4 2009, Andi Kleen wrote:
> >> Harald Welte <[email protected]> writes:
> >>> why would it matter on UP? as indicated, I'm not the expert here, but I thought
> >>> memory ordering issues only arise in SMP systems [or possibly with regard to
> >>> DMA, but as we already explored much earlier in this thread, drivers that access
> >>> DMA buffers while the hardware owns them are buggy and need to be fixed]
> >> Sorry we didn't establish that. Accessing data structures that are
> >> also accessed by DMA hardware is pretty common in fact and memory
> >> ordering issues also come up regularly (e.g. all the infamous PCI
> >> posting bugs)
> >>
> >> What we established is that the drivers don't use LOCK for it
> >> (or at least we think that's very unlikely)
> >>
> >
> > It was a real headache in the pa-risc port - -
> > Even went so far as to build some experimental kernels where all
> > the spin-lock structures were in a separate loader section.
> >
> > That was to avoid in-direct interference - I.E: Both DMA and
> > the processor handling the locking **both** invalidating the
> > same cache line at the same time (only one can win).
> >
> > Things might get that deep with this processor/chip-set combination;
> > but pa-risc has some very unusual hardware in some older models.
>
> That sort of thing should be architecturally impossible on x86. In order
> for something to invalidate the cache line, it first has to own it
> (except maybe for some unusual cases like Memory Write and Invalidate
> where the writer promises to overwrite the entire cache line).
>
>

VIA has not publicly published sufficient technical information to presume
that the cache coherency control protocols are the same as Intel's.

These are cpu/chipset pairs - Think System On 2 Chips. SoS2C.

Mike

2009-06-05 00:54:59

by Robert Hancock

[permalink] [raw]
Subject: Re: Linux 2.6.30-rc8 [also: VIA Support]

Michael S. Zick wrote:
> VIA has not publicly published sufficient technical information to presume
> that the cache coherency control protocols are the same as Intel's.
>
> These are cpu/chipset pairs - Think System On 2 Chips. SoS2C.

At the low level, maybe not, but functionally it has to be equivalent.
Otherwise the chip can't really be considered x86-compatible.

2009-06-05 04:37:43

by Michael S. Zick

[permalink] [raw]
Subject: Re: Linux 2.6.30-rc8 [also: VIA Support]

On Thu June 4 2009, Michael S. Zick wrote:
> On Thu June 4 2009, Michael S. Zick wrote:
> > On Thu June 4 2009, Michael S. Zick wrote:
> > > On Thu June 4 2009, Harald Welte wrote:
> > > > On Thu, Jun 04, 2009 at 11:21:28AM -0500, Michael S. Zick wrote:
> > > >
> > >
> > > @Linus - -
> > > The Debian/Ubuntu distribution kernels require irqpoll (2.6.28+) -
> > > I took that out very early in my testing, when that problem got fixed -
> > > I will also try putting that back in, it might be needed for
> > > some of its side-effects on the processor/chipset.
> > >
> >
> > It looks like I stopped using irqpoll on 4/10 (-rc1)
> > http://forum.netbookuser.com/viewtopic.php?pid=6708#p6708
> >
> > Am re-testing the "virgin" kernel build with both command options now:
> > http://forum.netbookuser.com/viewtopic.php?pid=7037#p7037
> >
> > Will post if it has less than 6 months of up-time. ;)
> >
>
> About 6 months less than 6 months up-time ;)
>
> 2.6.30-rc8 - no special cmd line options: 1h45m
> 2.6.30-rc8 - idle=poll irqpoll options: 1h08m
>

2.6.30-rc8-ce1200v-09155 - uptime: 5 hours+, still running.
Ship it.

Details:
http://forum.netbookuser.com/viewtopic.php?pid=7039#p7039

Mike

> Summary:
> Phooey!
>
> Mike
>
> > Mike

2009-06-05 07:20:17

by Harald Welte

[permalink] [raw]
Subject: Re: Linux 2.6.30-rc8 [also: VIA Support]

On Thu, Jun 04, 2009 at 10:07:37AM -0700, Linus Torvalds wrote:
>
>
> Side note: is it more stable if you disable the VIA speedstep thing
> (whatever it's called (ok, google tells me it's called "TwinTurbo" and
> "Advanced PowerSaver")?
>
> Features like that easily put a huge stress on power regulators etc, if
> they result in sudden changes in current draw. Underspecced capacitors
> etc can cause CPU "brown-outs", which in turn can easily cause total
> failure.

I agree, that might be an interesting test.

As a side note: The C7-M actually uses the same software interface as Intel CPUs
for doing the power transitions (ACPI Performance States). Only very early C7
models used the cpufreq/e_powersaver.c driver.

So if you want to disable it, make sure you disable X86_ACPI_CPUFREQ.

--
- Harald Welte <[email protected]> http://linux.via.com.tw/
============================================================================
VIA Free and Open Source Software Liaison

2009-06-05 07:27:50

by Michael S. Zick

[permalink] [raw]
Subject: Re: Linux 2.6.30-rc8 [also: VIA Support]

On Fri June 5 2009, Harald Welte wrote:
> On Thu, Jun 04, 2009 at 10:07:37AM -0700, Linus Torvalds wrote:
> >
> >
> > Side note: is it more stable if you disable the VIA speedstep thing
> > (whatever it's called (ok, google tells me it's called "TwinTurbo" and
> > "Advanced PowerSaver")?
> >
> > Features like that easily put a huge stress on power regulators etc, if
> > they result in sudden changes in current draw. Underspecced capacitors
> > etc can cause CPU "brown-outs", which in turn can easily cause total
> > failure.
>
> I agree, that might be an interesting test.
>
> As a side note: The C7-M actually uses the same software interface as Intel CPUs
> for doing the power transitions (ACPI Performance States). Only very early C7
> models used the cpufreq/e_powersaver.c driver.
>

What is "very early C7 models" in terms of cpuid results?

If you can get permission to reveal those field details -
we can test for that (this build reports "stepping 0" but
I don't trust the report).
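
For anyone following along, here is a minimal sketch of how family, model
and stepping are unpacked from the EAX value of CPUID leaf 1. It follows the
generic x86 encoding as documented by Intel; the struct and function names
are made up for illustration, and nothing in it is VIA-specific.

```c
#include <assert.h>   /* only needed for the checks in the usage note */
#include <stdint.h>

/* Illustrative container; not a kernel structure. */
struct cpu_sig {
    unsigned family;
    unsigned model;
    unsigned stepping;
};

/*
 * Decode the EAX value returned by CPUID leaf 1 using the generic
 * x86 rules: the extended family is added only when the base family
 * is 0xF, and the extended model is prepended when the base family
 * is 6 or 0xF.
 */
static struct cpu_sig decode_cpuid_signature(uint32_t eax)
{
    struct cpu_sig sig;
    unsigned base_family = (eax >> 8) & 0xf;

    sig.stepping = eax & 0xf;
    sig.model = (eax >> 4) & 0xf;
    sig.family = base_family;

    if (base_family == 0xf)
        sig.family += (eax >> 20) & 0xff;
    if (base_family == 0x6 || base_family == 0xf)
        sig.model |= ((eax >> 16) & 0xf) << 4;

    return sig;
}
```

With this, a raw signature of 0x6A0 decodes to family 6, model 0xA,
stepping 0, which can be checked with something like
`assert(decode_cpuid_signature(0x6A0).model == 0xA);`.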

Mike
> So if you want to disable it, make sure you disable X86_ACPI_CPUFREQ.
>

2009-06-05 07:30:24

by Harald Welte

[permalink] [raw]
Subject: Re: Linux 2.6.30-rc8 [also: VIA Support]

On Thu, Jun 04, 2009 at 12:27:07PM -0500, Michael S. Zick wrote:
> On Thu June 4 2009, Linus Torvalds wrote:
> >
> > Side note: is it more stable if you disable the VIA speedstep thing
> > (whatever it's called (ok, google tells me it's called "TwinTurbo" and
> > "Advanced PowerSaver")?
> >
>
> The e_powersave code.

that is only for the early C7, as I've been told by Centaur. C7-M should
definitely support the ACPI states. So better to disable both.

> It was a fixed-speed kernel build that first hit the 4 hour up-time mark.
> I just reposted that build today (the -09143lk).

did you actually check if the power transitions are no longer happening?

> For want of a documented JTAG port...

there is JTAG on the CPU, but only for EXTEST/INTEST, i.e. access to the actual
signal lanes for making sure all the BGA pads are soldered and actually connect
to the right signals of the next component on the board. There is no embedded
ICE or anything of that sort. Also, your board most likely doesn't have JTAG
on an accessible connector, since it's out of mass production.

--
- Harald Welte <[email protected]> http://linux.via.com.tw/
============================================================================
VIA Free and Open Source Software Liaison

2009-06-05 07:30:36

by Harald Welte

[permalink] [raw]
Subject: Re: Linux 2.6.30-rc8 [also: VIA Support]

On Thu, Jun 04, 2009 at 12:18:42PM -0500, Michael S. Zick wrote:
> > In any case, if somebody can ship me a system that exposes one of those
> > lock-ups, together with a pre-installed test case that exposes the problem
> > within let's say less than one day, plus the full kernel sources used in
> > that particular system: I'm happy to spend time to investigate the issue,
> > try to run the same test case on a VIA board, etc.
>
> I am about at my wits end with this Everex product -
>
> Give me a couple more weeks at the problem and if I haven't solved it;
> I'll give you this machine if you promise to update LKML with any fix.

Well, I cannot promise that I will actually find the problem and find a fix.
But I can promise to look into the issue and potentially ask other people
at VIA/Centaur to help with investigating it.

And as soon as the problem can be reproduced, or we have a fix, or I give up,
you would get the system back, obviously.

I'll try to figure out how we can do the shipping while billing it to VIA's
account.

In any case, let's wait until I hear from VIA if they can provide one of those
machines to me.

--
- Harald Welte <[email protected]> http://linux.via.com.tw/
============================================================================
VIA Free and Open Source Software Liaison

2009-06-05 07:39:33

by Harald Welte

[permalink] [raw]
Subject: Re: Linux 2.6.30-rc8 [also: VIA Support]

On Thu, Jun 04, 2009 at 10:16:35PM +0200, Andi Kleen wrote:
> Harald Welte <[email protected]> writes:
> >
> > why would it matter on UP? as indicated, I'm not the expert here, but I thought
> > memory ordering issues only arise in SMP systems [or possibly with regard to
> > DMA, but as we already explored much earlier in this thread, drivers that access
> > DMA buffers while the hardware owns them are buggy and need to be fixed]
>
> Sorry we didn't establish that. Accessing data structures that are
> also accessed by DMA hardware is pretty common in fact and memory
> ordering issues also come up regularly (e.g. all the infamous PCI
> posting bugs)
>
> What we established is that the drivers don't use LOCK for it
> (or at least we think that's very unlikely)

Yes, sorry for not mentioning that. I was implicitly thinking of "not using an
appropriate locking method"

--
- Harald Welte <[email protected]> http://linux.via.com.tw/
============================================================================
VIA Free and Open Source Software Liaison

2009-06-05 07:42:47

by Michael S. Zick

[permalink] [raw]
Subject: Re: Linux 2.6.30-rc8 [also: VIA Support]

On Fri June 5 2009, Harald Welte wrote:
> On Thu, Jun 04, 2009 at 12:27:07PM -0500, Michael S. Zick wrote:
> > On Thu June 4 2009, Linus Torvalds wrote:
> > >
> > > Side note: is it more stable if you disable the VIA speedstep thing
> > > (whatever it's called (ok, google tells me it's called "TwinTurbo" and
> > > "Advanced PowerSaver")?
> > >
> >
> > The e_powersave code.
>
> that is only for the early C7, as I've been told by Centaur. C7-M should
> definitely support the ACPI states. So better to disable both.
>
> > It was a fixed-speed kernel build that first hit the 4 hour up-time mark.
> > I just reposted that build today (the -09143lk).
>
> did you actually check if the power transitions are no longer happening?
>

No - without a /sys...cpufreq directory I don't know how to do that.
But the machine is performing exactly as if it was at a fixed speed of 600 MHz.

Much different than when e_powersaver is included - -

Also, the e_powersaver stats look reasonable when compared with test loads
running, so although this may be an acpi controlled system, e_powersaver
"thinks" it has control (and does make a difference in performance).
It is not that hard to tell subjectively the difference between 0.4 GHz
and 1.2 GHz.

Mike
> > For want of a documented JTAG port...
>
> there is JTAG on the CPU, but only for EXTEST/INTEST, i.e. access to the actual
> signal lanes for making sure all the BGA pads are soldered and actually connect
> to the right signals of the next component on the board. There is no embedded
> ICE or anything of that sort. Also, your board most likely doesn't have JTAG
> on an accessible connector, since it's out of mass production.
>

2009-06-05 07:44:59

by Michael S. Zick

[permalink] [raw]
Subject: Re: Linux 2.6.30-rc8 [also: VIA Support]

On Fri June 5 2009, Harald Welte wrote:
> On Thu, Jun 04, 2009 at 12:18:42PM -0500, Michael S. Zick wrote:
> > > In any case, if somebody can ship me a system that exposes one of those
> > > lock-ups, together with a pre-installed test case that exposes the problem
> > > within let's say less than one day, plus the full kernel sources used in
> > > that particular system: I'm happy to spend time to investigate the issue,
> > > try to run the same test case on a VIA board, etc.
> >
> > I am about at my wits end with this Everex product -
> >
> > Give me a couple more weeks at the problem and if I haven't solved it;
> > I'll give you this machine if you promise to update LKML with any fix.
>
> Well, I cannot promise that I will actually find the problem and find a fix.
> But I can promise to look into the issue and potentially ask other people
> at VIA/Centaur to help with investigating it.
>
> And as soon as the problem can be reproduced, or we have a fix, or I give up,
> you would get the system back, obviously.
>

I would not want it back, obviously.

> I'll try to figure out how we can do the shipping while billing it to VIA's
> account.
>

Collect?

> In any case, let's wait until I hear from VIA if they can provide one of those
> machines to me.
>

Just ask for an Everex CrapBook - they will know what you mean.
The company went out of the NetBook business for a good reason.

Mike

2009-06-05 07:53:16

by Michael S. Zick

[permalink] [raw]
Subject: Re: Linux 2.6.30-rc8 [also: VIA Support]

On Fri June 5 2009, Michael S. Zick wrote:
> On Fri June 5 2009, Harald Welte wrote:
> > On Thu, Jun 04, 2009 at 12:18:42PM -0500, Michael S. Zick wrote:
> > > > In any case, if somebody can ship me a system that exposes one of those
> > > > lock-ups, together with a pre-installed test case that exposes the problem
> > > > within let's say less than one day, plus the full kernel sources used in
> > > > that particular system: I'm happy to spend time to investigate the issue,
> > > > try to run the same test case on a VIA board, etc.
> > >
> > > I am about at my wits end with this Everex product -
> > >
> > > Give me a couple more weeks at the problem and if I haven't solved it;
> > > I'll give you this machine if you promise to update LKML with any fix.
> >
> > Well, I cannot promise that I will actually find the problem and find a fix.
> > But I can promise to look into the issue and potentially ask other people
> > at VIA/Centaur to help with investigating it.
> >
> > And as soon as the problem can be reproduced, or we have a fix, or I give up,
> > you would get the system back, obviously.
> >
>
> I would not want it back, obviously.
>
> > I'll try to figure out how we can do the shipping while billing it to VIA's
> > account.
> >
>
> Collect?
>
> > In any case, let's wait until I hear from VIA if they can provide one of those
> > machines to me.
> >
>
> Just ask for an Everex CrapBook - they will know what you mean.
> The company went out of the NetBook business for a good reason.
>

Oh, I am within a hundred miles of offices of the company that
FIC sold the 75% interest in Everex to - I even recognize a
few of the officers' names - not that I have seen them for years.

Mike
> Mike

2009-06-05 09:00:51

by Michael S. Zick

[permalink] [raw]
Subject: Re: Linux 2.6.30-rc8 [also: VIA Support]

On Thu June 4 2009, Michael S. Zick wrote:
> On Thu June 4 2009, Michael S. Zick wrote:
> > On Thu June 4 2009, Michael S. Zick wrote:
> > > On Thu June 4 2009, Michael S. Zick wrote:
> > > > On Thu June 4 2009, Harald Welte wrote:
> > > > > On Thu, Jun 04, 2009 at 11:21:28AM -0500, Michael S. Zick wrote:
> > > > >
> > > >
> > > > @Linus - -
> > > > The Debian/Ubuntu distribution kernels require irqpoll (2.6.28+) -
> > > > I took that out very early in my testing, when that problem got fixed -
> > > > I will also try putting that back in, it might be needed for
> > > > some of its side-effects on the processor/chipset.
> > > >
> > >
> > > It looks like I stopped using irqpoll on 4/10 (-rc1)
> > > http://forum.netbookuser.com/viewtopic.php?pid=6708#p6708
> > >
> > > Am re-testing the "virgin" kernel build with both command options now:
> > > http://forum.netbookuser.com/viewtopic.php?pid=7037#p7037
> > >
> > > Will post if it has less than 6 months of up-time. ;)
> > >
> >
> > About 6 months less than 6 months up-time ;)
> >
> > 2.6.30-rc8 - no special cmd line options: 1h45m
> > 2.6.30-rc8 - idle=poll irqpoll options: 1h08m
> >
>
> 2.6.30-rc8-ce1200v-09155 - uptime: 5 hours+, still running.
> Ship it.
>

New record: 9h15m

But that is bandaid'd enough I can stop with the Black Magic
and Computer Voodoo - -

Now just a matter of cutting each problem out of the herd
and solving each of them. . . ;)

A likely first candidate is this USB driver with its 28Mbyte
cry for help in the syslog file.

Mike
> Details:
> http://forum.netbookuser.com/viewtopic.php?pid=7039#p7039
>
> Mike
>
> > Summary:
> > Phooey!
> >
> > Mike
> >
> > > Mike

2009-06-05 10:40:16

by Harald Welte

[permalink] [raw]
Subject: Re: Linux 2.6.30-rc8 [also: VIA Support]

Hi Michael,

On Fri, Jun 05, 2009 at 02:27:40AM -0500, Michael S. Zick wrote:

> > As a side note: The C7-M actually uses the same software interface as Intel CPUs
> > for doing the power transitions (ACPI Performance States). Only very early C7
> > models used the cpufreq/e_powersaver.c driver.
>
> What is "very early C7 models" in terms of cpuid results?

I've inquired about that.

Looking at the code, the driver only seems to be checking for a minimum CPUID
value, but doesn't contain a maximum.

Regards,
--
- Harald Welte <[email protected]> http://linux.via.com.tw/
============================================================================
VIA Free and Open Source Software Liaison

2009-06-05 11:10:18

by Harald Welte

[permalink] [raw]
Subject: VIA PowerSaver (Re: Linux 2.6.30-rc8 [also: VIA Support])

Hi Linus and Michael,

On Thu, Jun 04, 2009 at 10:46:00AM -0700, Linus Torvalds wrote:

> On Thu, 4 Jun 2009, Michael S. Zick wrote:
> >
> > Yes, I build test cases with and without - -
> > It was a fixed-speed kernel build that first hit the 4 hour up-time mark.
> > I just reposted that build today (the -09143lk).
> >
> > > Features like that easily put a huge stress on power regulators etc, if
> > > they result in sudden changes in current draw. Underspecced capacitors
> > > etc can cause CPU "brown-outs", which in turn can easily cause total
> > > failure.
> >
> > There is also a possible thermal issue with these machines - -
> > I doubt that VIA runs their qualification testing in bake ovens;
> > which is what NetBook cases amount to. ;)
>
> If the fixed-speed case runs for longer, it's not likely to be a thermal
> issue. The fixed speed case should be the higher-power one.
>
> So it can easily be a weak power setup (insufficient grounding, bad
> capacitors etc). But it could also be external bus issues, in case VIA
> power management also impact the external bus (eg "stopclock" like
> behavior on the CPU<->chipset bus).

I'm not intending to disagree with you, I just wanted to quote from
a not [yet] public document on the C7-M. This quote describes model A
(family 6, model 10(hex A), stepping 0-15):
===============
Enhanced PowerSaver technology allows the dynamic adjustment of the operating
frequency and operating voltage. The VIA C7-M can only change from the
highest supported performance state to the lowest supported performance state:
intermediate performance states are not guaranteed to work and are not
officially supported. System software can use Enhanced PowerSaver to request the
sufficient amount of performance. Each individual performance state (P-State)
is described in the system bios according to 8.4.4 of the ACPI 3.0
specification.

The VIA C7-M processor incorporates two on-chip core clock PLLs. This allows
the processor to ping-pong between two frequencies instantaneously. In the
simplest scenario, where there are only two clock frequencies of interest and
no voltage changes, the transition can be instantaneous with no latency. In
more complex scenarios, where there are multiple clock frequencies of interest,
the "old" frequency can continue to be used while the new frequency is ramped
up. The transition is still instantaneous from a software point of view (code
still executes), but there is a latency associated with switching to the
ramping "new" frequency.

VIA C7-M allows for a clean hardware approach to processor operating point
transitions. The transitions are performed instantaneously from a software and
functional point of view. Snoops and interrupts, for example, are unaffected by
transitions.
===============

A C7-M model D (family 6, model 13(hex D), stepping 0-15) has advanced performance
states, they use an inflection ratio, as well as adaptive-p-state control and
adaptive overclocking, as well as iterative P-state transitions and adaptive
thermal control. I'm not yet aware of all the details, but have requested them.

In any case, the problems that have been reported by Michael were "Model A",
so those particular details shouldn't matter at this point.

Regards,
--
- Harald Welte <[email protected]> http://linux.via.com.tw/
============================================================================
VIA Free and Open Source Software Liaison

2009-06-05 11:10:34

by Harald Welte

[permalink] [raw]
Subject: VIA CPU PCI cache line size (Re: Linux 2.6.30-rc8 [also: VIA Support])

On Thu, Jun 04, 2009 at 08:15:10PM -0400, Dave Jones wrote:

> I meant just doing something like this..
>
> (untested)
>
> I'm not sure if the clflush_size==0 case can happen, which is why
> I left the fallback. Maybe on ancient cpus?

this patch looks fine with me. I didn't do any testing yet either, though.

--
- Harald Welte <[email protected]> http://linux.via.com.tw/
============================================================================
VIA Free and Open Source Software Liaison

2009-06-05 13:19:01

by Michael S. Zick

[permalink] [raw]
Subject: Re: Linux 2.6.30-rc8 [also: VIA Support]

On Fri June 5 2009, Harald Welte wrote:
> Hi Michael,
>
> On Fri, Jun 05, 2009 at 02:27:40AM -0500, Michael S. Zick wrote:
>
> > > As a side note: The C7-M actually uses the same software interface as Intel CPUs
> > > for doing the power transitions (ACPI Performance States). Only very early C7
> > > models used the cpufreq/e_powersaver.c driver.
> >
> > What is "very early C7 models" in terms of cpuid results?
>
> I've inquired about that.
>
> Looking at the code, the driver only seems to be checking for a minimum CPUID
> value, but doesn't contain a maximum.
>

That is the way I read it also.

Mike
> Regards,

2009-06-05 13:41:53

by Michael S. Zick

[permalink] [raw]
Subject: Re: VIA PowerSaver (Re: Linux 2.6.30-rc8 [also: VIA Support])

On Fri June 5 2009, Harald Welte wrote:
> Hi Linus and Michael,
>
> On Thu, Jun 04, 2009 at 10:46:00AM -0700, Linus Torvalds wrote:
>
> > On Thu, 4 Jun 2009, Michael S. Zick wrote:
> > >
> > > Yes, I build test cases with and without - -
> > > It was a fixed-speed kernel build that first hit the 4 hour up-time mark.
> > > I just reposted that build today (the -09143lk).
> > >
> > > > Features like that easily put a huge stress on power regulators etc, if
> > > > they result in sudden changes in current draw. Underspecced capacitors
> > > > etc can cause CPU "brown-outs", which in turn can easily cause total
> > > > failure.
> > >
> > > There is also a possible thermal issue with these machines - -
> > > I doubt that VIA runs their qualification testing in bake ovens;
> > > which is what NetBook cases amount to. ;)
> >
> > If the fixed-speed case runs for longer, it's not likely to be a thermal
> > issue. The fixed speed case should be the higher-power one.
> >
> > So it can easily be a weak power setup (insufficient grounding, bad
> > capacitors etc). But it could also be external bus issues, in case VIA
> > power management also impact the external bus (eg "stopclock" like
> > behavior on the CPU<->chipset bus).
>
> I'm not intending to disagree with you, I just wanted to quote from
> a not [yet] public document on the C7-M. This quote describes model A
> (family 6, model 10(hex A), stepping 0-15):
> ===============
> Enhanced PowerSaver technology allows the dynamic adjustment of the operating
> frequency and operating voltage. The VIA C7-M can only change from the
> highest supported performance state to the lowest supported performance state:
> intermediate performance states are not guaranteed to work and are not
> officially supported. System software can use Enhanced PowerSaver to request the
> sufficient amount of performance. Each individual performance state (P-State)
> is described in the system bios according to 8.4.4 of the ACPI 3.0
> specification.
>
> The VIA C7-M processor incorporates two on-chip core clock PLLs. This allows
> the processor to ping-pong between two frequencies instantaneously. In the
> simplest scenario, where there are only two clock frequencies of interest and
> no voltage changes, the transition can be instantaneous with no latency. In
> more complex scenarios, where there are multiple clock frequencies of interest,
> the "old" frequency can continue to be used while the new frequency is ramped
> up. The transition is still instantaneous from a software point of view (code
> still executes), but there is a latency associated with switching to the
> ramping "new" frequency.
>

This appears to be what the e_powersaver is trying to do - -
It just needs to do it better. ;)

The current behavior ends up as 9 speeds (i.e., 8 steps) of twice the FSB
frequency. I do show "stats/time_in_state" for all of them.
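
In case anyone wants to compare numbers, a small sketch for reading those
stats follows. It assumes the documented cpufreq-stats layout of one
"<frequency in kHz> <time>" pair per line, with the time counted in 10 ms
units; the struct and function names are my own, and it parses from a string
so it can be tried without the sysfs file being present.

```c
#include <assert.h>
#include <stdio.h>

/* One line of time_in_state; names are illustrative only. */
struct freq_state {
    unsigned long khz;         /* frequency in kHz */
    unsigned long long ticks;  /* time spent there, in 10 ms units */
};

/*
 * Parse whitespace-separated "<kHz> <ticks>" pairs from text into out[].
 * Stops at the first malformed pair, at end of input, or after max
 * entries, and returns the number of entries stored.
 */
static size_t parse_time_in_state(const char *text,
                                  struct freq_state *out, size_t max)
{
    size_t n = 0;
    int used = 0;

    assert(text != NULL && out != NULL);

    while (n < max &&
           sscanf(text, "%lu %llu%n", &out[n].khz, &out[n].ticks, &used) == 2) {
        text += used;   /* advance past the pair just consumed */
        n++;
    }
    return n;
}
```

The returned count is the number of distinct speeds the driver exposes, so a
table like the one described above would come back as 9 entries.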

It is one heck of a system. It will be nice if I can make it work - -
If it has to be turned into a two-speed system, so be it.

= = = =

Note: This is one of VIA's claims to fame - and a good one - -
and different than what the cpufreq stuff was probably tested with.

Most other brands of CPU will "stall" (or some such) internally while
re-syncing the core clock chain (thereby "stalling" the code progress) -
The VIA processors do not - they keep on computing - something the
general code may not account for. ;)

Since I have a machine that is sensitive to this, I can test other
ways of doing something other than the two speed solution.

>
> VIA C7-M allows for a clean hardware approach to processor operating point
> transitions. The transitions are performed instantaneously from a software and
> functional point of view. Snoops and interrupts, for example, are unaffected by
> transitions.
> ===============
>
> A C7-M model D (family 6, model 13(hex D), stepping 0-15) has advanced performance
> states, they use an inflection ratio, as well as adaptive-p-state control and
> adaptive overclocking, as well as iterative P-state transitions and adaptive
> thermal control. I'm not yet aware of all the details, but have requested them.
>
> In any case, the problems that have been reported by Michael were "Model A",
> so those particular details shouldn't matter at this point.
>

dmesg is reporting it as a "Model D" - I will check if that reporting is correct.
Tell the Silicon Grower's department "Thanks" for the recommendation of what to look for.

Mike
> Regards,

2009-06-05 13:44:00

by Michael S. Zick

[permalink] [raw]
Subject: Re: VIA CPU PCI cache line size (Re: Linux 2.6.30-rc8 [also: VIA Support])

On Fri June 5 2009, Harald Welte wrote:
> On Thu, Jun 04, 2009 at 08:15:10PM -0400, Dave Jones wrote:
>
> > I meant just doing something like this..
> >
> > (untested)
> >
> > I'm not sure if the clflush_size==0 case can happen, which is why
> > I left the fallback. Maybe on ancient cpus?
>
> this patch looks fine with me. I didn't do any testing yet either, though.
>

Although not coded as Dave J. suggests - I have been re-setting
that value to the CPU cache line size for a week or two now.

It doesn't break anything in an obvious way - does seem to help.
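
A rough sketch of the idea, for reference only: the PCI cache line size
register is programmed in units of 32-bit dwords, while the CPU reports its
CLFLUSH line size in bytes, so the conversion is a divide by four with a
fallback for CPUs that report nothing. The helper name and the fallback
parameter are hypothetical; this is neither Dave's patch nor what I am
running.

```c
#include <assert.h>

/*
 * Convert a CLFLUSH line size in bytes into the value to program into
 * the PCI cache line size register, which counts 32-bit dwords.
 * Hypothetical helper; names and fallback handling are illustrative.
 */
static unsigned pci_cls_from_clflush(unsigned clflush_bytes,
                                     unsigned fallback_dwords)
{
    /* A line size that is not a multiple of 4 bytes would not fit a dword count. */
    assert((clflush_bytes % 4) == 0);

    /* A CPU that does not report a CLFLUSH size yields 0 here. */
    if (clflush_bytes == 0)
        return fallback_dwords;

    return clflush_bytes >> 2;   /* bytes -> dwords */
}
```

For a 64-byte line this gives 16; a reported size of 0 falls through to
whatever default the caller passes in.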

Mike

2009-06-05 18:47:38

by Michael S. Zick

[permalink] [raw]
Subject: Re: Linux 2.6.30-rc8

On Tue June 2 2009, Linus Torvalds wrote:
>
> This is almost certainly the last -rc, and I debated even doing it. But
> rather than just do 2.6.30, I decided that I'd be better off doing a last
> -rc8 and then a real release probably this weekend.
>
> This has mostly driver and arch updates, with perhaps the intel drm/kms
> and network driver changes standing out, but there's powerpc, blackfin and
> arm updates too. Admittedly, the two biggest parts of the powerpc update
> are a revert and a defconfig update.
>
> A lot of small stuff, fixing a few regressions (and at least one bugzilla
> entry going back to 2.6.24). The small stuff does matter. Please test.
>
> Linus
>

Today's test build for VIA powered NetBooks is posted (as -09156):
Details at:
http://forum.netbookuser.com/viewtopic.php?pid=7043#p7043

Summary:
Yesterday's (last night's, this morning's) record holder (9+ hours)
plus the changes proposed by Dave Jones and Bob Copeland.

Mike

2009-06-06 12:17:55

by Michael S. Zick

[permalink] [raw]
Subject: Re: Linux 2.6.30-rc8 [also: VIA Support]

On Thu June 4 2009, Linus Torvalds wrote:
>
> On Thu, 4 Jun 2009, Michael S. Zick wrote:
> >
> > Yes, I build test cases with and without - -
> > It was a fixed-speed kernel build that first hit the 4 hour up-time mark.
> > I just reposted that build today (the -09143lk).
> >
> > > Features like that easily put a huge stress on power regulators etc, if
> > > they result in sudden changes in current draw. Underspecced capacitors
> > > etc can cause CPU "brown-outs", which in turn can easily cause total
> > > failure.
> >
> > There is also a possible thermal issue with these machines - -
> > I doubt that VIA runs their qualification testing in bake ovens;
> > which is what NetBook cases amount to. ;)
>
> If the fixed-speed case runs for longer, it's not likely to be a thermal
> issue. The fixed speed case should be the higher-power one.
>

I can respond to that point now; VIA Tech has answered some of my questions -

The mainstream kernel, e_powersaver, is *under-clocking* my machine -

The cpuid instruction provides the minimum and maximum GSF values
(Guaranteed Stable Frequency) for that processor mask run -
Passing that on as the lower and upper limits to e_powersaver should
stop that problem. Will be testing this RSN.
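
What I have in mind, as a minimal sketch: once the lower and upper limits
are known, throw away every table entry outside them before the driver gets
to pick one. How the limits are read from the processor is deliberately left
out, since that is not spelled out here; the function name and the kHz
table layout are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Drop every entry of a frequency table that lies outside
 * [min_khz, max_khz]. Keeps the original order, compacts the table
 * in place, and returns the number of entries that remain.
 * All names are hypothetical.
 */
static size_t clamp_freq_table(unsigned *khz, size_t count,
                               unsigned min_khz, unsigned max_khz)
{
    size_t kept = 0;

    assert(khz != NULL && min_khz <= max_khz);

    for (size_t i = 0; i < count; i++) {
        if (khz[i] >= min_khz && khz[i] <= max_khz)
            khz[kept++] = khz[i];
    }
    return kept;
}
```

With made-up numbers: a table of 400, 600, 800 and 1200 MHz clamped to
600..1200 MHz keeps the last three entries and drops the 400 MHz one.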

Once you start operating the processor outside of the reported-by-silicon mask
limits - quote: "there is no guarantee of stable operation" - -

To that I add my opinion:
For a severely under-clocked machine (already possibly unstable) - -
Thermal effects are almost certain to be present.

To paraphrase VIA Tech once again: "Don't do that." ;)

Mike

2009-06-06 13:29:22

by Harald Welte

[permalink] [raw]
Subject: e_powersaver / underclocking (was Re: Linux 2.6.30-rc8 [also: VIA Support])

On Sat, Jun 06, 2009 at 07:17:44AM -0500, Michael S. Zick wrote:

> I can respond to that point now; VIA Tech has answered some of my questions -
>
> The mainstream kernel, e_powersaver, is *under-clocking* my machine -
>
> The cpuid instruction provides the minimum and maximum GSF values
> (Guaranteed Stable Frequency) for that processor mask run -
> Passing that on as the lower and upper limits to e_powersaver should
> stop that problem. Will be testing this RSN.

It's really surprising to me that none of this seems to be handled correctly so
far, I'll talk to Centaur and try to find out how we could have ended up in
this situation.

My assumption is that e_powersaver is no longer supposed to do any of those
low-level bits - rather the ACPI code is expected to get it right, hiding the
details from the OS. But in this case, there needs to be some run-time detection
whether the ACPI cpufreq should be used, or e_powersaver. And I don't see any
of that right now.

Also note that now with OLPC XO1.5 going for the C7-M (on a VX855 chipset,
though), many of those issues should soon receive much more attention -
especially on the power management front. And as you know, they don't use any
legacy BIOS...

Regards,
--
- Harald Welte <[email protected]> http://linux.via.com.tw/
============================================================================
VIA Free and Open Source Software Liaison

2009-06-06 13:46:44

by Michael S. Zick

[permalink] [raw]
Subject: Re: e_powersaver / underclocking (was Re: Linux 2.6.30-rc8 [also: VIA Support])

On Sat June 6 2009, Harald Welte wrote:
> On Sat, Jun 06, 2009 at 07:17:44AM -0500, Michael S. Zick wrote:
>
> > I can respond to that point now; VIA Tech has answered some of my questions -
> >
> > The mainstream kernel, e_powersaver, is *under-clocking* my machine -
> >
> > The cpuid instruction provides the minimum and maximum GSF values
> > (Guaranteed Stable Frequency) for that processor mask run -
> > Passing that on as the lower and upper limits to e_powersaver should
> > stop that problem. Will be testing this RSN.
>
> It's really surprising to me that none of this seems to be handled correctly so
> far, I'll talk to Centaur and try to find out how we could have ended up in
> this situation.
>

Ah, but we are talking here of the *second* NetBook ever produced.
If one is to believe the dmidecode output - it is using the VIA demo board
BIOS.

I bet the demo board BIOS is intended to demo the features of the product -
not the correctness or completeness of the ACPI support. ;)

If I were shipping demo boards - they would be demonstrating **my** product's
features. Maybe I am just projecting what I would do.


> My assumption is that e_powersaver is no longer supposed to do any of those
> low-level bits - rather the ACPI code is expected to get it right, hiding the
> details from the OS. But in this case, there needs to be some run-time detection
> whether the ACPI cpufreq should be used, or e_powersaver. And I don't see any
> of that right now.
>

I can keep my eyes open for a way to do that -
First, I want to get the machine running **within** the specs it can provide.
The one I have is running at 2/3rds of the reported *minimum* clockspeed.
I must have gotten a high quality "mask/process run" for it to be running at all.

> Also note that now with OLPC XO1.5 going for the C7-M (on a VX855 chipset,
> though), many of those issues should soon receive much more attention -
> especially on the power management front. And as you know, they don't use any
> legacy BIOS...
>

I'll keep my eyes open on that subject also when looking at the e_powersaver code -
The OLPC project will probably be requesting chip runs that **do** run at
the minimums the design is capable of and it will **have to** be stable for OLPC.

Mike
> Regards,

2009-06-06 13:57:48

by Michael S. Zick

[permalink] [raw]
Subject: Re: e_powersaver / underclocking (was Re: Linux 2.6.30-rc8 [also: VIA Support])

On Sat June 6 2009, Michael S. Zick wrote:
> On Sat June 6 2009, Harald Welte wrote:
> > On Sat, Jun 06, 2009 at 07:17:44AM -0500, Michael S. Zick wrote:
> >
> > > I can respond to that point now; VIA Tech has answered some of my questions -
> > >
> > > The mainstream kernel, e_powersaver, is *under-clocking* my machine -
> > >
> > > The cpuid instruction provides the minimum and maximum GSF values
> > > (Guaranteed Stable Frequency) for that processor mask run -
> > > Passing that on as the lower and upper limits to e_powersaver should
> > > stop that problem. Will be testing this RSN.
> >
> > It's really surprising to me that none of this seems to be handled correctly so
> > far, I'll talk to Centaur and try to find out how we could have ended up in
> > this situation.
> >
>
> Ah, but we are talking here of the *second* NetBook ever produced.
> If one is to believe the dmidecode output - it is using the VIA demo board
> BIOS.
>
> I bet the demo board BIOS is intended to demo the features of the product -
> not the correctness or completeness of the ACPI support. ;)
>
> If I were shipping demo boards - they would be demonstrating **my** product's
> features. Maybe I am just projecting what I would do.
>
>
> > My assumption is that e_powersaver is no longer supposed to do any of those
> > low-level bits - rather the ACPI code is expected to get it right, hiding the
> > details from the OS. But in this case, there needs to be some run-time detection
> > whether the ACPI cpufreq should be used, or e_powersaver. And I don't see any
> > of that right now.
> >
>
> I can keep my eyes open for a way to do that -
> First, I want to get the machine running **with-in** the specs it can provide.
> The one I have is running at 2/3rds of the reported *minimum* clockspeed.
> I must have gotten a high quality "mask/process run" for it to be running at all.
>

If any readers have noticed, the links I post go to what was originally named
"Cloudbookuser" - the first forum for the second machine - -

Ever since day 3 or 4 - people have been writing scripts to change the on-demand
governor's minimum from 400Mhz to 600Mhz - without knowing "why" other than
it was required to keep the machine running (even on distribution kernels).
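
For readers who have not seen those scripts: a minimal sketch of what they do, written in C rather than shell. It writes the new floor, in kHz, to the policy's scaling_min_freq file in the sysfs cpufreq directory. The helper name set_min_freq_khz is made up for illustration, and 600000 is only the figure discussed in this thread, not a recommendation for other hardware.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not from any existing script): write a minimum
 * frequency, in kHz, to a cpufreq sysfs attribute such as
 *   /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
 * Returns 0 on success, -1 on failure (not root, file missing, ...).
 */
static int set_min_freq_khz(const char *path, unsigned int khz)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;

	ok = fprintf(f, "%u\n", khz) > 0;

	/* fclose() flushes the buffer; a failure here means the write was lost. */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

Run as root, a call such as set_min_freq_khz("/sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq", 600000) raises the floor for cpu0 until the next reboot or governor change.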

_Now_ I know why. Tell the VIA Tech folks "thanks" for the information.

Mike
> > Also note that now with OLPC XO1.5 going for the C7-M (on a VX855 chipset,
> > though), many of those issues should soon receive much more attention -
> > especially on the power management front. And as you know, they don't use any
> > legacy BIOS...
> >
>
> I'll keep my eyes open on that subject also when looking at the e_powersaver code -
> The OLPC project will probably be requesting chip runs that **do** run at
> the minimums the design is capable of and it will **have to** be stable for OLPC.
>
> Mike
> > Regards,
>
>

2009-06-08 08:00:25

by Harald Welte

[permalink] [raw]
Subject: e_powersaver driver considered DANGEROUS (was Re: Linux 2.6.30-rc8 [also: VIA Support])

Dear Michael and others,

On Sat, Jun 06, 2009 at 08:56:12AM -0500, Michael S. Zick wrote:

> > > > The mainstream kernel, e_powersaver, is *under-clocking* my machine -
> > > >
> > > > The cpuid instruction provides the minimum and maximum GSF values
> > > > (Guaranteed Stable Frequency) for that processor mask run -
> > > > Passing that on as the lower and upper limits to e_powersaver should
> > > > stop that problem. Will be testing this RSN.
> > >
> > > It's really surprising to me that none of this seems to be handled correctly so
> > > far, I'll talk to Centaur and try to find out how we could have ended up in
> > > this situation.
> > >
> >
> > Ah, but we are talking here of the *second* NetBook ever produced.
> > If one is to believe the dmidecode output - it is using the VIA demo board
> > BIOS.
> >
> > I bet the demo board BIOS is intended to demo the features of the product -
> > not the correctness or completeness of the ACPI support. ;)

So I've just been told the VIA reference BIOS has full support for
processor p-states. I'd therefore suppose every production BIOS
contains that code, too. The kernel should never use a native driver such as
e_powersaver on any C7 or Nano system, but use the ACPI-provided
tables/methods, which are Intel-compatible. A native driver would only be
needed on really old C3 systems.

The e_powersaver.c driver neither respects the maximum/minimum frequency
constraints specified in the MSRs, nor does it take care of the inflection
ratio, parallax and other advanced stuff that C7 and Nano are doing in this
area.

I would consider the e_powersaver driver DANGEROUS on any C7 or Nano system,
i.e. on every system it supports. It is bound to cause system instability,
as it will most likely operate the CPU out of spec on virtually every system.

I'll do some actual testing on getting the intel-compatible ACPI method
for p-states working on my c7 and nano demo boards, and submit any patches
(if required).

I'll also meanwhile submit a patch to mark e_powersaver as DANGEROUS and
EXPERIMENTAL.

--
- Harald Welte <[email protected]> http://linux.via.com.tw/
============================================================================
VIA Free and Open Source Software Liaison

2009-06-08 10:30:22

by Harald Welte

[permalink] [raw]
Subject: [PATCH 1/2] CPUFREQ: Enable acpi-cpufreq driver for VIA/Centaur CPUs

The VIA/Centaur C7, C7-M and Nano CPUs all support ACPI-based CPU p-states
using an MSR interface. The Linux driver just never made use of it, since in
addition to the check for the EST flag it also checked if the vendor is Intel.

Signed-off-by: Harald Welte <[email protected]>
---
arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c b/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
index 208ecf6..ee03585 100644
--- a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
+++ b/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
@@ -90,7 +90,8 @@ static int check_est_cpu(unsigned int cpuid)
 {
 	struct cpuinfo_x86 *cpu = &cpu_data(cpuid);

-	if (cpu->x86_vendor != X86_VENDOR_INTEL ||
+	if ((cpu->x86_vendor != X86_VENDOR_INTEL &&
+	     cpu->x86_vendor != X86_VENDOR_CENTAUR) ||
 	    !cpu_has(cpu, X86_FEATURE_EST))
 		return 0;

--
1.6.2.4

2009-06-08 13:16:32

by Matthew Garrett

[permalink] [raw]
Subject: Re: [PATCH 1/2] CPUFREQ: Enable acpi-cpufreq driver for VIA/Centaur CPUs

On Mon, Jun 08, 2009 at 06:27:54PM +0800, Harald Welte wrote:
> The VIA/Centaur C7, C7-M and Nano CPU's all support ACPI based cpu p-states
> using a MSR interface. The Linux driver just never made use of it, since in
> addition to the check for the EST flag it also checked if the vendor is Intel.

Hm. Is the MSR interface compatible with the Intel one? I'm guessing so
based on the est flag being present. Maybe we should just drop the intel
check and only look for CPUs which claim est support.

--
Matthew Garrett | [email protected]

2009-06-08 14:25:34

by Michael S. Zick

[permalink] [raw]
Subject: Re: [PATCH 1/2] CPUFREQ: Enable acpi-cpufreq driver for VIA/Centaur CPUs

On Mon June 8 2009, Harald Welte wrote:
> The VIA/Centaur C7, C7-M and Nano CPU's all support ACPI based cpu p-states
> using a MSR interface. The Linux driver just never made use of it, since in
> addition to the check for the EST flag it also checked if the vendor is Intel.
>

It looks like we should modify (conditional on ...MVIAC7 at build, model='d' runtime)
the acpi-cpufreq controls to deal properly with the Model-D adaptive controller.

What probably needs to be done is test if it has been set-up and turn it off
before handing control over to the ACPI - -
The information I have from VIA Tech. says not to use external controls while
the adaptive controller is enabled.

Perhaps the same sort of thing in the 'resume' path - some BIOS may be
enabling the adaptive controller.

Mike
> Signed-off-by: Harald Welte <[email protected]>
> ---
> arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c | 3 ++-
> 1 files changed, 2 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c b/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
> index 208ecf6..ee03585 100644
> --- a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
> +++ b/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
> @@ -90,7 +90,8 @@ static int check_est_cpu(unsigned int cpuid)
> {
> struct cpuinfo_x86 *cpu = &cpu_data(cpuid);
>
> - if (cpu->x86_vendor != X86_VENDOR_INTEL ||
> + if ((cpu->x86_vendor != X86_VENDOR_INTEL &&
> + cpu->x86_vendor != X86_VENDOR_CENTAUR) ||
> !cpu_has(cpu, X86_FEATURE_EST))
> return 0;
>

2009-06-08 14:34:50

by Pavel Machek

[permalink] [raw]
Subject: Re: Linux 2.6.30-rc8 [also: VIA Support]

On Thu 2009-06-04 10:37:46, Michael S. Zick wrote:
> On Thu June 4 2009, you wrote:
> >
> > On Thu, 4 Jun 2009, Michael S. Zick wrote:
> > >
> > > Well, that one isn't anything to brag about"
> >
> > So what's the problem on that machine? Is the VIA C7-M (or some random
> > device in it) just buggy and it eventually hangs, or what? Has _any_
> > kernel ever worked on that machine for longer than an hour or two?
> >
>
> It is not (directly) the C7-M - it might be the chipset CX700.
>
> I have a small pool (5 or so actively reporting) testers -
>
> On the Everex Cloudbook (C7-M/CX700) I have gotten a record
> of 4h45 minutes of up-time on some, locally patched, builds.
> The same is reported on other Everex Cloudbooks.
>
> On the Sylvania gBook (also C7-M/CX700) the user reports
> 10s of minutes as their record up-time.
>
> H.W. got hold of one of the companies demo-boards but I
> haven't heard back from him yet on his experiences.
>
> On the HP-2133 (C7-M/CN896) the uptime is unknown - -
> only greater than 12 hours (it has never/ever deadlocked).

IIRC I did way longer tests than 12 hours on hp2133... it worked ok.

The other 'via from hell' machine would last overnight with cca 50%
probability.

Pavel

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2009-06-08 14:42:08

by Pavel Machek

[permalink] [raw]
Subject: Re: Linux 2.6.30-rc8 [also: VIA Support]

On Thu 2009-06-04 10:07:37, Linus Torvalds wrote:
>
>
> Side note: is it more stable if you disable the VIA speedstep thing
> (whatever it's called (ok, google tells me it's called "TwinTurbo" and
> "Advanced PowerSaver")?
>
> Features like that easily put a huge stress on power regulators etc, if
> they result in sudden changes in current draw. Underspecced capacitors
> etc can cause CPU "brown-outs", which in turn can easily cause total
> failure.

If you try that, do idle=poll too. It hides a class of bugs related to deep
idle states.

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2009-06-08 14:58:16

by Pavel Machek

[permalink] [raw]
Subject: Re: e_powersaver driver considered DANGEROUS (was Re: Linux 2.6.30-rc8 [also: VIA Support])

Hi!

> > > Ah, but we are talking here of the *second* NetBook ever produced.
> > > If one is to believe the dmidecode output - it is using the VIA demo board
> > > BIOS.
> > >
> > > I bet the demo board BIOS is intended to demo the features of the product -
> > > not the correctness or completeness of the ACPI support. ;)
>
> So I've just been told the VIA reference BIOS has full support for the
> processor p-state support. I'd therefore suppose every production BIOS
> contains that code, too. The kernel should never use a native driver such as
> e_powersaver on any C7 or Nano system, but use the ACPI provided
> tables/methods, which are intel compatible. A native driver would only be
> needed on really old C3 systems.
>
> The e_powersaver.c driver neither respects the maximum/minimum frequency
> constraints specified in the MSR's, nor does it take care of the inflection
> ratio, parallax and other advanced stuff that C7 and Nano are doing in this
> area.

Uhuh, what is inflection ratio/parallax?

I did play with cpufreq on intel/amd systems, and never heard about
those...

Pavel

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2009-06-08 14:58:29

by Matthew Garrett

[permalink] [raw]
Subject: Re: [PATCH 1/2] CPUFREQ: Enable acpi-cpufreq driver for VIA/Centaur CPUs

On Mon, Jun 08, 2009 at 09:25:09AM -0500, Michael S. Zick wrote:
> On Mon June 8 2009, Harald Welte wrote:
> > The VIA/Centaur C7, C7-M and Nano CPU's all support ACPI based cpu p-states
> > using a MSR interface. The Linux driver just never made use of it, since in
> > addition to the check for the EST flag it also checked if the vendor is Intel.
> >
>
> It looks like we should modify (conditional on ...MVIAC7 at build, model='d' runtime)
> the acpi-cpufreq controls to deal properly with the Model-D adaptive controller.

Can't make it build-time dependent - distribution kernels may not
explicitly support the C7. It's valid to have a vendor=centaur
conditional that turns off any adaptive control if appropriate ACPI
methods are present.
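
A rough sketch of that run-time conditional, as a user-space model rather than kernel code. Everything here is an assumption made for illustration: the struct, the enum and the function name are invented, and the plain boolean stands in for whatever hardware write actually disables the adaptive controller, which is not described in this thread.

```c
#include <stdbool.h>

/* Invented vendor tags; the kernel has its own X86_VENDOR_* constants. */
enum cpu_vendor { CPU_VENDOR_INTEL, CPU_VENDOR_CENTAUR, CPU_VENDOR_OTHER };

/* Hypothetical model of the state relevant to the decision. */
struct cpu_model {
	enum cpu_vendor vendor;
	bool acpi_pstates_present; /* BIOS provides usable P-state methods */
	bool adaptive_enabled;     /* on-chip adaptive controller is active */
};

/*
 * Decide at run time, not at build time: only a Centaur part that also
 * has ACPI P-state methods gets its adaptive controller switched off.
 * Returns true if the controller state was changed.
 */
static bool prepare_for_acpi_cpufreq(struct cpu_model *cpu)
{
	if (cpu->vendor != CPU_VENDOR_CENTAUR)
		return false;
	if (!cpu->acpi_pstates_present)
		return false;
	if (!cpu->adaptive_enabled)
		return false;

	/* Stand-in for the real (unspecified) hardware write. */
	cpu->adaptive_enabled = false;
	return true;
}
```

Because the check depends only on values read at run time, a distribution kernel built without any C7-specific option would still take this path on a C7.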

--
Matthew Garrett | [email protected]

2009-06-08 15:08:18

by Michael S. Zick

[permalink] [raw]
Subject: Re: [PATCH 1/2] CPUFREQ: Enable acpi-cpufreq driver for VIA/Centaur CPUs

On Mon June 8 2009, Matthew Garrett wrote:
> On Mon, Jun 08, 2009 at 09:25:09AM -0500, Michael S. Zick wrote:
> > On Mon June 8 2009, Harald Welte wrote:
> > > The VIA/Centaur C7, C7-M and Nano CPU's all support ACPI based cpu p-states
> > > using a MSR interface. The Linux driver just never made use of it, since in
> > > addition to the check for the EST flag it also checked if the vendor is Intel.
> > >
> >
> > It looks like we should modify (conditional on ...MVIAC7 at build, model='d' runtime)
> > the acpi-cpufreq controls to deal properly with the Model-D adaptive controller.
>
> Can't make it build-time dependent - distribution kernels may not
> explicitly support the C7. It's valid to have a vendor=centaur
> conditional that turns off any adaptive control if appropriate ACPI
> methods are present.
>

A valid point.

I haven't looked yet, but I think we have advanced to the point where the
'VIA hack' for cache-line size can also go away, now that the
pci-cache-line-size setting is being done differently (currently a
proposed change).

The C7(xxxx) is 99 44/100% a Pentium-M with minor differences.
(Like the power/thermal/freq adaptive controller on the "D" models.)

Mike

2009-06-08 18:10:24

by Harald Welte

[permalink] [raw]
Subject: Re: [PATCH 1/2] CPUFREQ: Enable acpi-cpufreq driver for VIA/Centaur CPUs

On Mon, Jun 08, 2009 at 02:16:20PM +0100, Matthew Garrett wrote:
> On Mon, Jun 08, 2009 at 06:27:54PM +0800, Harald Welte wrote:
> > The VIA/Centaur C7, C7-M and Nano CPU's all support ACPI based cpu p-states
> > using a MSR interface. The Linux driver just never made use of it, since in
> > addition to the check for the EST flag it also checked if the vendor is Intel.
>
> Hm. Is the MSR interface compatible with the Intel one?

Yes, it is intended to be. If you know of any public intel spec on the actual
MSR values, I'd be happy to have a look and compare it with Centaur's specs.

VIA/Centaur specs are being prepared for public release, but it will probably
take longer than we can responsibly delay fixing this bug.

> I'm guessing so based on the est flag being present. Maybe we should just
> drop the intel check and only look for CPUs which claim est support.

I think that might be the right decision.

--
- Harald Welte <[email protected]> http://linux.via.com.tw/
============================================================================
VIA Free and Open Source Software Liaison

2009-06-08 18:36:17

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH 1/2] CPUFREQ: Enable acpi-cpufreq driver for VIA/Centaur CPUs



On Mon, 8 Jun 2009, Harald Welte wrote:
>
> The VIA/Centaur C7, C7-M and Nano CPU's all support ACPI based cpu p-states
> using a MSR interface. The Linux driver just never made use of it, since in
> addition to the check for the EST flag it also checked if the vendor is Intel.
>
> Signed-off-by: Harald Welte <[email protected]>
> ---
> arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c | 3 ++-
> 1 files changed, 2 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c b/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
> index 208ecf6..ee03585 100644
> --- a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
> +++ b/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
> @@ -90,7 +90,8 @@ static int check_est_cpu(unsigned int cpuid)
> {
> struct cpuinfo_x86 *cpu = &cpu_data(cpuid);
>
> - if (cpu->x86_vendor != X86_VENDOR_INTEL ||
> + if ((cpu->x86_vendor != X86_VENDOR_INTEL &&
> + cpu->x86_vendor != X86_VENDOR_CENTAUR) ||
> !cpu_has(cpu, X86_FEATURE_EST))

Hmm. This all really should be just

	static int check_est_cpu(unsigned int cpuid)
	{
		struct cpuinfo_x86 *cpu = &cpu_data(cpuid);
		return cpu_has(cpu, X86_FEATURE_EST);
	}

I suspect, with no vendor tests. That's the whole _point_ of CPU features,
after all.

If some vendor claims EST but doesn't actually support the EST interfaces,
we should just have fixups to clear the bit in the per-vendor cpuinfo
code, not in some random driver.
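
To make the split concrete, here is a small user-space model of the idea, not actual kernel code. The type, the bit position and the function names are all invented for illustration; the point is only that the quirk is applied once, at identification time, and the driver then trusts the feature bit.

```c
#include <stdbool.h>
#include <stdint.h>

#define FEATURE_EST_BIT (1u << 0) /* illustrative bit position only */

/* Hypothetical per-CPU record filled in during identification. */
struct cpu_caps {
	uint32_t features;    /* feature bits as reported by the CPU */
	bool est_known_broken; /* vendor code knows EST does not really work */
};

/* Per-vendor fixup: runs once, clears a bit the CPU should not claim. */
static void vendor_fixup(struct cpu_caps *c)
{
	if (c->est_known_broken)
		c->features &= ~FEATURE_EST_BIT;
}

/* Driver-side check: looks at the feature bit and nothing else. */
static bool model_check_est(const struct cpu_caps *c)
{
	return (c->features & FEATURE_EST_BIT) != 0;
}
```

With this arrangement a new vendor quirk touches only the fixup, and every driver that tests the bit picks up the correction without a vendor check of its own.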

The only thing that makes me nervous about this is how close to 2.6.30 we
are. I'd be happier if this was resolved by doing this as a patch
post-2.6.30, and then adding '[email protected]' as a Cc: tag, and
backporting it to 2.6.30.1 if no problems appear.

It's not like this is a regression, I think.

Does that sound like a reasonable plan?

Linus

2009-06-08 18:41:51

by Michael S. Zick

[permalink] [raw]
Subject: Re: [PATCH 1/2] CPUFREQ: Enable acpi-cpufreq driver for VIA/Centaur CPUs

On Mon June 8 2009, Linus Torvalds wrote:
>
> On Mon, 8 Jun 2009, Harald Welte wrote:
> >
> > The VIA/Centaur C7, C7-M and Nano CPU's all support ACPI based cpu p-states
> > using a MSR interface. The Linux driver just never made use of it, since in
> > addition to the check for the EST flag it also checked if the vendor is Intel.
> >
> > Signed-off-by: Harald Welte <[email protected]>
> > ---
> > arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c | 3 ++-
> > 1 files changed, 2 insertions(+), 1 deletions(-)
> >
> > diff --git a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c b/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
> > index 208ecf6..ee03585 100644
> > --- a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
> > +++ b/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
> > @@ -90,7 +90,8 @@ static int check_est_cpu(unsigned int cpuid)
> > {
> > struct cpuinfo_x86 *cpu = &cpu_data(cpuid);
> >
> > - if (cpu->x86_vendor != X86_VENDOR_INTEL ||
> > + if ((cpu->x86_vendor != X86_VENDOR_INTEL &&
> > + cpu->x86_vendor != X86_VENDOR_CENTAUR) ||
> > !cpu_has(cpu, X86_FEATURE_EST))
>
> Hmm. This all really should be just
>
> static int check_est_cpu(unsigned int cpuid)
> {
> struct cpuinfo_x86 *cpu = &cpu_data(cpuid);
> return cpu_has(cpu, X86_FEATURE_EST);
> }
>
> I suspect, with no vendor tests. That's the whole _point_ of CPU features,
> after all.
>
> If some vendor claims EST but doesn't actually support the EST interfaces,
> we should just have fixups to clear the bit in the per-vendor cpuinfo
> code, not in some random driver.
>
> The only thing that makes me nervous about this is how close to 2.6.30 we
> are. I'd be happier if this was resolved by doing this as a patch
> post-2.6.30, and then adding '[email protected]' as a Cc: tag, and
> backporting it to 2.6.30.1 if no problems appear.
>
> It's not like this is a regression, I think.
>
> Does that sound like a reasonable plan?
>

Sounds like a plan to me - it has been broken a long time - a little longer...
I can continue to carry any change we code up as a local patch until it
gets done at the mainline level.
There are only about 20,000 of these old Everex machines ever produced.

Mike
> Linus
>
>

2009-06-08 20:03:39

by Michael S. Zick

[permalink] [raw]
Subject: Re: [PATCH 1/2] CPUFREQ: Enable acpi-cpufreq driver for VIA/Centaur CPUs

On Mon June 8 2009, Harald Welte wrote:
> The VIA/Centaur C7, C7-M and Nano CPU's all support ACPI based cpu p-states
> using a MSR interface. The Linux driver just never made use of it, since in
> addition to the check for the EST flag it also checked if the vendor is Intel.
>
> Signed-off-by: Harald Welte <[email protected]>
> ---
> arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c | 3 ++-
> 1 files changed, 2 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c b/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
> index 208ecf6..ee03585 100644
> --- a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
> +++ b/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
> @@ -90,7 +90,8 @@ static int check_est_cpu(unsigned int cpuid)
> {
> struct cpuinfo_x86 *cpu = &cpu_data(cpuid);
>
> - if (cpu->x86_vendor != X86_VENDOR_INTEL ||
> + if ((cpu->x86_vendor != X86_VENDOR_INTEL &&
> + cpu->x86_vendor != X86_VENDOR_CENTAUR) ||
> !cpu_has(cpu, X86_FEATURE_EST))
> return 0;
>

For my own internal testing, based on subsequent posts, I simplified
that to:

	static int check_est_cpu(unsigned int cpuid)
	{
		struct cpuinfo_x86 *cpu = &cpu_data(cpuid);

		if (!cpu_has(cpu, X86_FEATURE_EST))
			return 0;

		return 1;
	}

Have not done anything (yet) to deal with the adaptive controller,
**which is safe to (not) do on my test system because I know it isn't enabled**

Note: This machine is known to run common, proprietary operating systems,
so I am also assuming its BIOS ACPI is complete enough.

Note 2: Don't try this unless you have a machine to waste. ;)

Mike

2009-06-08 21:15:34

by Michael S. Zick

[permalink] [raw]
Subject: Re: [PATCH 1/2] CPUFREQ: Enable acpi-cpufreq driver for VIA/Centaur CPUs

On Mon June 8 2009, Michael S. Zick wrote:
> On Mon June 8 2009, Harald Welte wrote:
> > The VIA/Centaur C7, C7-M and Nano CPU's all support ACPI based cpu p-states
> > using a MSR interface. The Linux driver just never made use of it, since in
> > addition to the check for the EST flag it also checked if the vendor is Intel.
> >
> > Signed-off-by: Harald Welte <[email protected]>
> > ---
> > arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c | 3 ++-
> > 1 files changed, 2 insertions(+), 1 deletions(-)
> >
> > diff --git a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c b/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
> > index 208ecf6..ee03585 100644
> > --- a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
> > +++ b/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
> > @@ -90,7 +90,8 @@ static int check_est_cpu(unsigned int cpuid)
> > {
> > struct cpuinfo_x86 *cpu = &cpu_data(cpuid);
> >
> > - if (cpu->x86_vendor != X86_VENDOR_INTEL ||
> > + if ((cpu->x86_vendor != X86_VENDOR_INTEL &&
> > + cpu->x86_vendor != X86_VENDOR_CENTAUR) ||
> > !cpu_has(cpu, X86_FEATURE_EST))
> > return 0;
> >
>
> For my own internal testing, based on subsequent posts, I simplified
> that to:
>
> static int check_est_cpu(unsigned int cpuid)
> {
> struct cpuinfo_x86 *cpu = &cpu_data(cpuid);
>
> if (!cpu_has(cpu, X86_FEATURE_EST))
> return 0;
>
> return 1;
> }
>
> Have not done anything (yet) to deal with the adaptive controller,
> **which is safe to (not) do on my test system because I know it isn't enabled**
>
> Note: This machine is known to run common, proprietary operating systems,
> so I am also assuming its BIOS ACPI is complete enough.
>
> Note 2: Don't try this unless you have a machine to waste. ;)
>

Phooey, close but no cigar - - -

root@cb01:/sys/devices/system/cpu/cpu0/cpufreq# cat scaling_available_frequencies
1200000 1000000 800000 600000 400000
root@cb01:/sys/devices/system/cpu/cpu0/cpufreq# cat stats/time_in_state
1200000 3281
1000000 120
800000 130
600000 234
400000 52980
root@cb01:/sys/devices/system/cpu/cpu0/cpufreq#

It loads and appears to work, but the ACPI tables include a 400MHz entry
even though this is a 600MHz..1200MHz chip.
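
To put a number on how much time the lowest state got, a small sketch that computes one frequency's share of the time_in_state totals. The struct and function names are invented for illustration; the counts in the usage note are the ones printed above.

```c
#include <stddef.h>

/* One row of stats/time_in_state: frequency in kHz and its time count. */
struct state_time {
	unsigned int khz;
	unsigned long long ticks;
};

/*
 * Share of the total time spent at 'khz', in parts per thousand
 * (integer division, so the result is rounded down).
 * Returns 0 if the table is empty or all counts are zero.
 */
static unsigned int share_permille(const struct state_time *t, size_t n,
				   unsigned int khz)
{
	unsigned long long total = 0, hit = 0;

	for (size_t i = 0; i < n; i++) {
		total += t[i].ticks;
		if (t[i].khz == khz)
			hit += t[i].ticks;
	}
	return total ? (unsigned int)(hit * 1000 / total) : 0;
}
```

Fed the five rows above (total count 56745), share_permille() gives 933 for 400000, i.e. roughly 93% of the sampled time was spent in the 400MHz state.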

Should be fixable -
The code looks like it is "asking" the BIOS for the table ranges -
It needs to sanity check against the limits etched in the silicon for
this particular [mask/process run].

I am thinking, maybe put the min/max GSF into boot_cpu_data.
Will probably need it there to fix e_powersaver anyway.
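
The sanity check could look roughly like this, as a stand-alone sketch rather than a patch against acpi-cpufreq. The function name is invented, and the limits are plain parameters because where the min/max values would actually come from is still an open question here; the 600MHz..1200MHz range is taken as given from the text above.

```c
#include <stddef.h>

/*
 * Keep only the entries of freqs[] (in kHz) that lie inside
 * [min_khz, max_khz]. Filters in place, preserves the original order,
 * and returns the number of entries kept.
 */
static size_t clamp_freq_table(unsigned int *freqs, size_t n,
			       unsigned int min_khz, unsigned int max_khz)
{
	size_t out = 0;

	for (size_t i = 0; i < n; i++) {
		if (freqs[i] >= min_khz && freqs[i] <= max_khz)
			freqs[out++] = freqs[i];
	}
	return out;
}
```

Applied to the table reported by scaling_available_frequencies with limits of 600000 and 1200000, this keeps four entries and drops the 400000 one.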

Mike
> Mike

2009-06-08 21:32:28

by Michael S. Zick

[permalink] [raw]
Subject: Re: [PATCH 1/2] CPUFREQ: Enable acpi-cpufreq driver for VIA/Centaur CPUs

On Mon June 8 2009, Linus Torvalds wrote:
>
> On Mon, 8 Jun 2009, Harald Welte wrote:
> >
> > The VIA/Centaur C7, C7-M and Nano CPU's all support ACPI based cpu p-states
> > using a MSR interface. The Linux driver just never made use of it, since in
> > addition to the check for the EST flag it also checked if the vendor is Intel.
> >
> > Signed-off-by: Harald Welte <[email protected]>
> > ---
> > arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c | 3 ++-
> > 1 files changed, 2 insertions(+), 1 deletions(-)
> >
> > diff --git a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c b/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
> > index 208ecf6..ee03585 100644
> > --- a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
> > +++ b/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
> > @@ -90,7 +90,8 @@ static int check_est_cpu(unsigned int cpuid)
> > {
> > struct cpuinfo_x86 *cpu = &cpu_data(cpuid);
> >
> > - if (cpu->x86_vendor != X86_VENDOR_INTEL ||
> > + if ((cpu->x86_vendor != X86_VENDOR_INTEL &&
> > + cpu->x86_vendor != X86_VENDOR_CENTAUR) ||
> > !cpu_has(cpu, X86_FEATURE_EST))
>
> Hmm. This all really should be just
>
> static int check_est_cpu(unsigned int cpuid)
> {
> struct cpuinfo_x86 *cpu = &cpu_data(cpuid);
> return cpu_has(cpu, X86_FEATURE_EST);
> }
>
> I suspect, with no vendor tests. That's the whole _point_ of CPU features,
> after all.
>
> If some vendor claims EST but doesn't actually support the EST interfaces,
> we should just have fixups to clear the bit in the per-vendor cpuinfo
> code, not in some random driver.
>

Sounds like a plan to me - -

Shall we define a kernel 'cpu-feature' for the internal, adaptive
thermal/power/freq controller?

There are at least two cpufreq drivers that need to be able to check
for the feature
_and_
perhaps the clock/timing routines would need to know if it was there/enabled -

Since it is internal, on-chip, it will not be sending any notifications as
it adapts the core clock speed. (so much for 'loops/xSec').

Knowing that it is present and enabled would let the timing routines
establish non-loop counting measurements.

Mike
> The only thing that makes me nervous about this is how close to 2.6.30 we
> are. I'd be happier if this was resolved by doing this as a patch
> post-2.6.30, and then adding '[email protected]' as a Cc: tag, and
> backporting it to 2.6.30.1 if no problems appear.
>
> It's not like this is a regression, I think.
>
> Does that sound like a reasonable plan?
>
> Linus

2009-06-08 23:48:42

by Matthew Garrett

[permalink] [raw]
Subject: Re: [PATCH 1/2] CPUFREQ: Enable acpi-cpufreq driver for VIA/Centaur CPUs

On Mon, Jun 08, 2009 at 04:15:22PM -0500, Michael S. Zick wrote:

> Phooey, close but no cigar - - -
>
> root@cb01:/sys/devices/system/cpu/cpu0/cpufreq# cat scaling_available_frequencies
> 1200000 1000000 800000 600000 400000
> root@cb01:/sys/devices/system/cpu/cpu0/cpufreq# cat stats/time_in_state
> 1200000 3281
> 1000000 120
> 800000 130
> 600000 234
> 400000 52980
> root@cb01:/sys/devices/system/cpu/cpu0/cpufreq#
>
> It loads, it appears to work, but that lowest 400Mhz has ACPI entries
> even though this is a 600Mhz..1200Mhz chip.

If the BIOS tables claim it then it's likely Windows uses it on the same
hardware. What's the stability like with the ACPI code?

--
Matthew Garrett | [email protected]

2009-06-09 02:20:26

by Harald Welte

[permalink] [raw]
Subject: Re: [PATCH 1/2] CPUFREQ: Enable acpi-cpufreq driver for VIA/Centaur CPUs

On Mon, Jun 08, 2009 at 11:35:12AM -0700, Linus Torvalds wrote:
> Hmm. This all really should be just
>
> static int check_est_cpu(unsigned int cpuid)
> {
> struct cpuinfo_x86 *cpu = &cpu_data(cpuid);
> return cpu_has(cpu, X86_FEATURE_EST);
> }
>
> I suspect, with no vendor tests. That's the whole _point_ of CPU features,
> after all.

That's what I was thinking, too. If there was no such vendor test, it would
have worked ever since the code was written (the C7 is far from a new
component; it has been around for years).

> If some vendor claims EST but doesn't actually support the EST interfaces,
> we should just have fixups to clear the bit in the per-vendor cpuinfo
> code, not in some random driver.

agreed.

> The only thing that makes me nervous about this is how close to 2.6.30 we
> are. I'd be happier if this was resolved by doing this as a patch
> post-2.6.30, and then adding '[email protected]' as a Cc: tag, and
> backporting it to 2.6.30.1 if no problems appear.
>
> It's not like this is a regression, I think.
>
> Does that sound like a reasonable plan?

Sounds fine with me. But what I would definitely suggest merging before 2.6.30
is the patch marking e_powersaver EXPERIMENTAL + DANGEROUS.

Regards,
--
- Harald Welte <[email protected]> http://linux.via.com.tw/
============================================================================
VIA Free and Open Source Software Liaison

2009-06-09 08:20:31

by Harald Welte

[permalink] [raw]
Subject: Re: Linux 2.6.30-rc8 [also: VIA Support]

On Sat, Jun 06, 2009 at 07:17:44AM -0500, Michael S. Zick wrote:
> The mainstream kernel, e_powersaver, is *under-clocking* my machine -
>
> The cpuid instruction provides the minimum and maximum GSF values
> (Guaranteed Stable Frequency) for that processor mask run -
> Passing that on as the lower and upper limits to e_powersaver should
> stop that problem. Will be testing this RSN.

Can you provide more information on where you think you are getting the GSF
values from? I am not aware of such information.

The internal documentation only ever refers to the minimum VID/FID values
from the MSR. Also, according to the specifications of all C7 models listed,
there are some that require a minimum of 400MHz, others require a minimum of
800. I have not found any that would require a minimum of 600MHz.

So I don't really know if your system really operates outside any specified
range.

The only thing I know of that you can read from the CPUID vendor string is
the maximum clock frequency in ASCII.
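
As a rough sketch of what reading that value could look like once the
string is in hand: the helper below pulls the number that directly precedes
"MHz" out of a C string. The function name and the sample string in the
note are made up for illustration, and the assumption that the frequency
appears as digits immediately followed by "MHz" is mine, not something
taken from the documentation.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Return the number directly preceding "MHz" in s, or 0 if there is no
 * "MHz" or no digits right in front of it.
 */
static unsigned int mhz_from_string(const char *s)
{
	const char *unit;
	const char *p;
	unsigned int mhz = 0;

	assert(s != NULL);
	unit = strstr(s, "MHz");
	if (!unit)
		return 0;

	/* Walk back over the digits immediately before the unit. */
	p = unit;
	while (p > s && isdigit((unsigned char)p[-1]))
		p--;

	/* Accumulate those digits as a decimal number. */
	for (; p < unit; p++)
		mhz = mhz * 10 + (unsigned int)(*p - '0');

	return mhz;
}
```

For a hypothetical input such as "Example CPU 1200MHz" this yields 1200;
a string without a frequency yields 0, so the caller can tell the two
cases apart.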

So it seems e_powersaver is not really as dangerous as it appeared!

--
- Harald Welte <[email protected]> http://linux.via.com.tw/
============================================================================
VIA Free and Open Source Software Liaison

2009-06-09 12:26:32

by Michael S. Zick

[permalink] [raw]
Subject: Re: [PATCH 1/2] CPUFREQ: Enable acpi-cpufreq driver for VIA/Centaur CPUs

On Mon June 8 2009, Harald Welte wrote:
> On Mon, Jun 08, 2009 at 11:35:12AM -0700, Linus Torvalds wrote:
> > Hmm. This all really should be just
> >
> > static int check_est_cpu(unsigned int cpuid)
> > {
> > 	struct cpuinfo_x86 *cpu = &cpu_data(cpuid);
> > 	return cpu_has(cpu, X86_FEATURE_EST);
> > }
> >
> > I suspect, with no vendor tests. That's the whole _point_ of CPU features,
> > after all.
>
> That's what I was thinking, too. If there was no such vendor test, it would
> have worked ever since the code was written (the C7 is far from a new
> component; it has been around for years).
>
> > If some vendor claims EST but doesn't actually support the EST interfaces,
> > we should just have fixups to clear the bit in the per-vendor cpuinfo
> > code, not in some random driver.
>
> agreed.
>
> > The only thing that makes me nervous about this is how close to 2.6.30 we
> > are. I'd be happier if this was resolved by doing this as a patch
> > post-2.6.30, and then adding '[email protected]' as a Cc: tag, and
> > backporting it to 2.6.30.1 if no problems appear.
> >
> > It's not like this is a regression, I think.
> >
> > Does that sound like a reasonable plan?
>
> Sounds fine with me. But what I would definitely suggest merging before 2.6.30
> is the marking e_powersaver EXPERIMENTAL + DANGEROUS patch.
>

As posted somewhere in this thread,
the acpi-cpufreq controller appears to work on my machine in initial tests.

A few tests on only one machine are not much to go on, and not enough
to call it "tested", but it is forward progress.

I was only risking my "throw-away" machine yesterday, will see what
happens on the "good" one today (the HP-2133).

@H.W. - are you running any of these changes on your HP-2133?

Mike
> Regards,

2009-06-09 12:37:03

by Michael S. Zick

[permalink] [raw]
Subject: Re: [PATCH 1/2] CPUFREQ: Enable acpi-cpufreq driver for VIA/Centaur CPUs

On Mon June 8 2009, Matthew Garrett wrote:
> On Mon, Jun 08, 2009 at 04:15:22PM -0500, Michael S. Zick wrote:
>
> > Phooey, close but no cigar - - -
> >
> > root@cb01:/sys/devices/system/cpu/cpu0/cpufreq# cat scaling_available_frequencies
> > 1200000 1000000 800000 600000 400000
> > root@cb01:/sys/devices/system/cpu/cpu0/cpufreq# cat stats/time_in_state
> > 1200000 3281
> > 1000000 120
> > 800000 130
> > 600000 234
> > 400000 52980
> > root@cb01:/sys/devices/system/cpu/cpu0/cpufreq
> >
> > It loads, it appears to work, but that lowest 400MHz has ACPI entries
> > even though this is a 600MHz..1200MHz chip.
>
> If the BIOS tables claim it then it's likely Windows uses it on the same
> hardware. What's the stability like with the ACPI code?
>

Too soon to say - those were first results (about par with "it boots"). ;)

Will be poking at this machine more today -
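
For anyone wanting to compare runs, the time_in_state numbers quoted above
can be reduced to a share per frequency. This is a small userspace sketch
under my own assumptions: the struct and function names are hypothetical,
the input is the text of the sysfs file already read into a buffer, and the
result is in tenths of a percent to stay in integer arithmetic.

```c
#include <assert.h>
#include <stdio.h>

struct state_time {
	unsigned long khz;		/* frequency in kHz, as listed */
	unsigned long long ticks;	/* time spent at that frequency */
};

/*
 * Parse "<khz> <ticks>" lines from text into out[], up to max entries.
 * Returns the number of entries parsed.
 */
static size_t parse_time_in_state(const char *text, struct state_time *out,
				  size_t max)
{
	size_t n = 0;
	int used = 0;

	assert(text && out);
	while (n < max &&
	       sscanf(text, "%lu %llu%n", &out[n].khz, &out[n].ticks,
		      &used) == 2) {
		text += used;	/* continue after the pair just read */
		n++;
	}
	return n;
}

/* Share of the total time spent at entry idx, in tenths of a percent. */
static unsigned int share_permille(const struct state_time *t, size_t n,
				   size_t idx)
{
	unsigned long long total = 0;
	size_t i;

	assert(idx < n);
	for (i = 0; i < n; i++)
		total += t[i].ticks;
	return total ? (unsigned int)(t[idx].ticks * 1000 / total) : 0;
}
```

Fed the five lines quoted above, the 400000 entry comes out at 933, that
is, roughly 93% of the recorded time was spent at the frequency the chip
is not supposed to have.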

* * * *

I do not have a copy of any proprietary operating system to compare with
on this machine.

You could use the search features of this forum:
http://forum.netbookuser.com/index.php
to look for windows problem reports/discussions.

That forum has been active since the machine was announced and included
members from Everex. If there were Windows problems, they are
in there somewhere.

Mike

2009-06-09 16:00:56

by Michael S. Zick

[permalink] [raw]
Subject: Re: [PATCH 1/2] CPUFREQ: Enable acpi-cpufreq driver for VIA/Centaur CPUs

On Tue June 9 2009, Michael S. Zick wrote:
> On Mon June 8 2009, Matthew Garrett wrote:
> > On Mon, Jun 08, 2009 at 04:15:22PM -0500, Michael S. Zick wrote:
> >
> > > Phooey, close but no cigar - - -
> > >
> > > root@cb01:/sys/devices/system/cpu/cpu0/cpufreq# cat scaling_available_frequencies
> > > 1200000 1000000 800000 600000 400000
> > > root@cb01:/sys/devices/system/cpu/cpu0/cpufreq# cat stats/time_in_state
> > > 1200000 3281
> > > 1000000 120
> > > 800000 130
> > > 600000 234
> > > 400000 52980
> > > root@cb01:/sys/devices/system/cpu/cpu0/cpufreq
> > >
> > > It loads, it appears to work, but that lowest 400MHz has ACPI entries
> > > even though this is a 600MHz..1200MHz chip.
> >
> > If the BIOS tables claim it then it's likely Windows uses it on the same
> > hardware. What's the stability like with the ACPI code?
> >
>
> Too soon to say - those were first results (about par with "it boots"). ;)
>
> Will be poking at this machine more today -
>

An interesting note - not sure if it is cosmetic or an indication
of something significant [sub-title: computer voodoo]:

You see above that acpi-cpufreq has the frequencies listed in
decreasing order - -
The e_powersaver module lists them in increasing order. [????]
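
Whether the order matters depends on how the table is consumed. If the
consumer scans every entry for its limits instead of trusting the first and
last slots, the difference is cosmetic. That is an assumption about the
consumer, not something I have confirmed in the code; the sketch below only
shows the order-independent way of doing it, with made-up names.

```c
#include <assert.h>
#include <stddef.h>

struct freq_limits {
	unsigned int min_khz;
	unsigned int max_khz;
};

/*
 * Scan every entry, so ascending and descending tables give the
 * same limits.
 */
static struct freq_limits table_limits(const unsigned int *khz, size_t n)
{
	struct freq_limits lim;
	size_t i;

	assert(khz && n > 0);
	lim.min_khz = lim.max_khz = khz[0];
	for (i = 1; i < n; i++) {
		if (khz[i] < lim.min_khz)
			lim.min_khz = khz[i];
		if (khz[i] > lim.max_khz)
			lim.max_khz = khz[i];
	}
	return lim;
}
```

Code that instead takes entry 0 as "fastest" would behave differently
between the two drivers, which is the case worth looking for.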

Mike
> * * * *
>
> I do not have a copy of any proprietary operating system to compare with
> on this machine.
>
> You could use the search features of this forum:
> http://forum.netbookuser.com/index.php
> to look for windows problem reports/discussions.
>
> That forum has been active since the machine was announced and included
> members from Everex. If there were Windows problems, they are
> in there somewhere.
>
> Mike

2009-06-09 16:23:56

by Chuck Ebbert

[permalink] [raw]
Subject: Re: [PATCH 1/2] CPUFREQ: Enable acpi-cpufreq driver for VIA/Centaur CPUs

On Tue, 9 Jun 2009 07:26:11 -0500
"Michael S. Zick" <[email protected]> wrote:

> > Sounds fine with me. But what I would definitely suggest merging before 2.6.30
> > is the marking e_powersaver EXPERIMENTAL + DANGEROUS patch.
> >
>
> As posted somewhere in this thread,
> the acpi-cpufreq controller appears to work on my machine in initial tests.

It works great on the Samsung NC20 with an x86_64 kernel.

2009-06-09 16:45:40

by Michael S. Zick

[permalink] [raw]
Subject: Re: [PATCH 1/2] CPUFREQ: Enable acpi-cpufreq driver for VIA/Centaur CPUs

On Tue June 9 2009, Chuck Ebbert wrote:
> On Tue, 9 Jun 2009 07:26:11 -0500
> "Michael S. Zick" <[email protected]> wrote:
>
> > > Sounds fine with me. But what I would definitely suggest merging before 2.6.30
> > > is the marking e_powersaver EXPERIMENTAL + DANGEROUS patch.
> > >
> >
> > As posted somewhere in this thread,
> > the acpi-cpufreq controller appears to work on my machine in initial tests.
>
> It works great on the Samsung NC20 with an x86_64 kernel.
>

For non-NetBook aware readers, that means the:
VIA Nano U2250 processor with the VX800 system chipset.

Mike


2009-06-09 17:52:17

by Michael S. Zick

[permalink] [raw]
Subject: Re: [PATCH 1/2] CPUFREQ: Enable acpi-cpufreq driver for VIA/Centaur CPUs

On Tue June 9 2009, Michael S. Zick wrote:
> On Mon June 8 2009, Matthew Garrett wrote:
> > On Mon, Jun 08, 2009 at 04:15:22PM -0500, Michael S. Zick wrote:
> >
> > > Phooey, close but no cigar - - -
> > >
> > > root@cb01:/sys/devices/system/cpu/cpu0/cpufreq# cat scaling_available_frequencies
> > > 1200000 1000000 800000 600000 400000
> > > root@cb01:/sys/devices/system/cpu/cpu0/cpufreq# cat stats/time_in_state
> > > 1200000 3281
> > > 1000000 120
> > > 800000 130
> > > 600000 234
> > > 400000 52980
> > > root@cb01:/sys/devices/system/cpu/cpu0/cpufreq
> > >
> > > It loads, it appears to work, but that lowest 400MHz has ACPI entries
> > > even though this is a 600MHz..1200MHz chip.
> >
> > If the BIOS tables claim it then it's likely Windows uses it on the same
> > hardware. What's the stability like with the ACPI code?
> >
>
> Too soon to say - those were first results (about par with "it boots"). ;)
>
> Will be poking at this machine more today -
>

Naw - e_powersaver does a better job and consistently provides
nearly an order of magnitude better up-time.

Guess I'll go back to reading/auditing e_powersaver. ;)

Mike
> * * * *
>
> I do not have a copy of any proprietary operating system to compare with
> on this machine.
>
> You could use the search features of this forum:
> http://forum.netbookuser.com/index.php
> to look for windows problem reports/discussions.
>
> That forum has been active since the machine was announced and included
> members from Everex. If there were Windows problems, they are
> in there somewhere.
>
> Mike

2009-06-15 13:10:27

by Michael S. Zick

[permalink] [raw]
Subject: TSC features, was: Re: [PATCH 1/2] CPUFREQ: Enable acpi-cpufreq driver for VIA/Centaur CPUs

On Mon June 8 2009, Linus Torvalds wrote:
>
> On Mon, 8 Jun 2009, Harald Welte wrote:
> >
> > The VIA/Centaur C7, C7-M and Nano CPU's all support ACPI based cpu p-states
> > using a MSR interface. The Linux driver just never made use of it, since in
> > addition to the check for the EST flag it also checked if the vendor is Intel.
> >
> > Signed-off-by: Harald Welte <[email protected]>
> > ---
> > arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c | 3 ++-
> > 1 files changed, 2 insertions(+), 1 deletions(-)
> >
> > diff --git a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c b/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
> > index 208ecf6..ee03585 100644
> > --- a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
> > +++ b/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
> > @@ -90,7 +90,8 @@ static int check_est_cpu(unsigned int cpuid)
> >  {
> >  	struct cpuinfo_x86 *cpu = &cpu_data(cpuid);
> >
> > -	if (cpu->x86_vendor != X86_VENDOR_INTEL ||
> > +	if ((cpu->x86_vendor != X86_VENDOR_INTEL &&
> > +	     cpu->x86_vendor != X86_VENDOR_CENTAUR) ||
> >  	    !cpu_has(cpu, X86_FEATURE_EST))
>
> Hmm. This all really should be just
>
> static int check_est_cpu(unsigned int cpuid)
> {
> 	struct cpuinfo_x86 *cpu = &cpu_data(cpuid);
> 	return cpu_has(cpu, X86_FEATURE_EST);
> }
>
> I suspect, with no vendor tests. That's the whole _point_ of CPU features,
> after all.
>

Following that same logic, shouldn't this chunk be based on CPU features also?
(Spotted while tracking down why the non-stop TSC wasn't being used on VIA):

#if defined (CONFIG_GENERIC_TIME) && defined (CONFIG_X86)
static void tsc_check_state(int state)
{
	switch (boot_cpu_data.x86_vendor) {
	case X86_VENDOR_AMD:
	case X86_VENDOR_INTEL:
		/*
		 * AMD Fam10h TSC will tick in all
		 * C/P/S0/S1 states when this bit is set.
		 */
		if (boot_cpu_has(X86_FEATURE_NONSTOP_TSC))
			return;

		/*FALL THROUGH*/
	default:
		/* TSC could halt in idle, so notify users */
		if (state > ACPI_STATE_C1)
			mark_tsc_unstable("TSC halts in idle");
	}
}
#else
static void tsc_check_state(int state) { return; }
#endif

Mike

2009-06-15 14:25:40

by Michael S. Zick

[permalink] [raw]
Subject: [PATCH, RFC] Re: TSC features, ...

On Mon June 15 2009, Michael S. Zick wrote:
> On Mon June 8 2009, Linus Torvalds wrote:
> >
> > Hmm. This all really should be just
> >
> > static int check_est_cpu(unsigned int cpuid)
> > {
> > 	struct cpuinfo_x86 *cpu = &cpu_data(cpuid);
> > 	return cpu_has(cpu, X86_FEATURE_EST);
> > }
> >
> > I suspect, with no vendor tests. That's the whole _point_ of CPU features,
> > after all.
> >
>

Like this:
Let the vendor's cpu setup code be responsible for getting the
feature flag correct:

diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index 10a2d91..daea6a9 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -244,22 +244,12 @@ int acpi_processor_resume(struct acpi_device * device)
 #if defined (CONFIG_GENERIC_TIME) && defined (CONFIG_X86)
 static void tsc_check_state(int state)
 {
-	switch (boot_cpu_data.x86_vendor) {
-	case X86_VENDOR_AMD:
-	case X86_VENDOR_INTEL:
-		/*
-		 * AMD Fam10h TSC will tick in all
-		 * C/P/S0/S1 states when this bit is set.
-		 */
-		if (boot_cpu_has(X86_FEATURE_NONSTOP_TSC))
-			return;
+	if (boot_cpu_has(X86_FEATURE_NONSTOP_TSC))
+		return;

-		/*FALL THROUGH*/
-	default:
-		/* TSC could halt in idle, so notify users */
-		if (state > ACPI_STATE_C1)
-			mark_tsc_unstable("TSC halts in idle");
-	}
+	/* TSC could halt in idle, so notify users */
+	if (state > ACPI_STATE_C1)
+		mark_tsc_unstable("TSC halts in idle");
 }
 #else
 static void tsc_check_state(int state) { return; }

Mike
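
To show what the change means in practice, here is a userspace model of the
old and new decision, not the kernel code itself. The enum, struct and
function names are hypothetical, MODEL_C1 stands in for ACPI_STATE_C1, and
the model assumes the vendor setup code has already set (or not set) the
non-stop TSC flag correctly, which is exactly what the patch relies on.

```c
#include <assert.h>
#include <stdbool.h>

enum model_vendor { MODEL_AMD, MODEL_INTEL, MODEL_CENTAUR };

#define MODEL_C1 1	/* stand-in for ACPI_STATE_C1 */

struct model_cpu {
	enum model_vendor vendor;
	bool nonstop_tsc;	/* stand-in for X86_FEATURE_NONSTOP_TSC */
};

/* Old logic: the feature bit is only honoured for AMD and Intel. */
static bool old_marks_unstable(const struct model_cpu *c, int state)
{
	assert(c);
	switch (c->vendor) {
	case MODEL_AMD:
	case MODEL_INTEL:
		if (c->nonstop_tsc)
			return false;
		/* fall through */
	default:
		return state > MODEL_C1;
	}
}

/* Proposed logic: the feature bit alone decides. */
static bool new_marks_unstable(const struct model_cpu *c, int state)
{
	assert(c);
	if (c->nonstop_tsc)
		return false;
	return state > MODEL_C1;
}
```

In this model a Centaur part with the flag set is marked unstable by the
old logic in any state deeper than C1 and left alone by the new one; AMD
and Intel parts, and any part without the flag, get the same answer from
both.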