sata/ide timeout errors on asus server-mb

August 21, 2008 - 12:11pm
Submitted by mp1 on August 21, 2008 - 12:11pm.
Linux

Hi everyone,

I had trouble with my Asus mainboard (P5BV-C/4L, Intel i3200 chipset, ICH7 + Marvell controllers) with all the recent kernel versions (although I can't tell whether the problem existed with older kernels, since it's a system I set up just recently). I got a lot of timeout errors and bus resets on both the IDE and the SATA ports (more precisely on the ICH7 SATA, the Marvell 88SE6145 SATA, the Promise Ultra100 TX2 IDE and the JMicron 20360/20363 SATA and IDE ports). The problem occurred less often when I didn't let the drives go to standby, but it nonetheless persisted.

Following the recommendations I found, I added the pci=nomsi kernel parameter. This solved the problem for me; at least I have had no further errors for two days. So far I haven't experienced any performance penalties - as far as I understand the MSI thing I shouldn't, right? Well, at least it looks like Asus didn't do their homework BIOS-wise.

I have attached my dmesg and lspci output. I hope this post helps anyone experiencing the same kind of trouble.

Best regards, Matthias P.

Attachment: sys-infos.txt (41.72 KB)

still not solved

September 23, 2008 - 6:24am

Looks like I was a little too hasty - the problem was not solved by pci=nomsi. I still get these timeout errors. I'm going to try the other options (noapic and acpi=off) in the hope they provide a workaround. But even if they do, I'm not too happy about it since this does not really solve the bug.
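When trying these options one after another, it may be worth confirming that each one actually reached the running kernel. Below is a minimal Python sketch, assuming a Linux system where /proc/cmdline is readable. The function names are made up for illustration, and the option list is simply the three workarounds named in this thread.

```python
from pathlib import Path

# Workaround options mentioned in this thread.
WORKAROUNDS = ("pci=nomsi", "noapic", "acpi=off")


def active_workarounds(cmdline):
    """Return the workaround options present in a kernel command line string."""
    tokens = cmdline.split()
    return [opt for opt in WORKAROUNDS if opt in tokens]


def read_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with."""
    return Path(path).read_text()


if __name__ == "__main__":
    try:
        print(active_workarounds(read_cmdline()))
    except OSError as exc:
        # /proc/cmdline is Linux-specific; report instead of crashing elsewhere.
        print(f"could not read kernel command line: {exc}")
```

An empty list after a reboot would suggest the bootloader entry was not updated, so the test of that option would not mean much.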

Linux kernel does not support all devices and never will

September 26, 2008 - 6:03pm

Certain companies still are not very cooperative in releasing their so-called "intellectual property" that would allow good Linux kernel modules to be written for their devices. The #1 offender would be a company named "Marvell". However, when purchasing motherboards, (for example, from newegg.com), one can click the "Specifications" tab and find that a particular Asus motherboard contains a Marvell SATA or NIC. So the decision to purchase such a motherboard without doing your homework and then trying to deploy a "mission critical" solution such as a web or database server on it would actually reflect rather badly on the person making said hardware decision. And if said person decided to run a down-level but "mainstream" distro such as RHEL or Suse (to keep the boss "happy"), then a glass mirror is where we can find the "problem".

I'm sorry, I didn't know

October 13, 2008 - 7:44am

I'm sorry, I didn't know Marvell had such a crappy policy. Next time I buy an MB I will account for this.

Nonetheless, I don't appreciate your tone there. I wasn't putting blame on anyone - just asking questions. And just so you know, this is not equipment for some business; I bought it for my own personal use at home. Nor is the distro RHEL or Suse - I'm using Gentoo here (albeit with their kernel patchset, gentoo-sources).

Back to the topic: these problems don't just occur on the disks connected to the Marvell controller. They are also present on the onboard Intel ICH7 (in AHCI mode) and the add-on Promise IDE PCI card. This is why I don't think the problem is particular to the Marvell chip. My guess is that it's something in the BIOS that isn't behaving according to spec, or something in the new SATA/IDE layer in the kernel. On my old machine the Promise controller did not show this kind of problem with the old IDE drivers. I don't have sufficient knowledge to analyze this further myself, but I'm more than willing to take hints and suggestions on how to debug it.

Same problem here.

September 23, 2008 - 9:11pm
Bojan Popovic (not verified)

I have the same problem here. ASUS again. The MB is an ASUS P5KC based on Intel's P35 chipset, with the ICH9 south bridge and JMicron 20360/20363 SATA controllers.

The error I keep getting looks like this:

ata1: exception Emask 0x10 SAct 0x0 SErr 0x4000000 action 0xa frozen
ata1: irq_stat 0x00000040, connection status changed
ata1: SError: { DevExch }
ata1: hard resetting link
ata1: SATA link down (SStatus 0 SControl 300)
ata1: EH complete
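To compare such errors across ports or boots, the exception line above can be pulled apart mechanically. This is a rough Python sketch that only assumes the line layout shown in this log; the regular expression and the function name are illustrative, and it does not try to interpret what the individual bits mean.

```python
import re

# Matches lines such as:
#   ata1: exception Emask 0x10 SAct 0x0 SErr 0x4000000 action 0xa frozen
# search() is used so that a leading dmesg timestamp does not get in the way.
EXCEPTION_RE = re.compile(
    r"(?P<port>ata\d+(?:\.\d+)?): exception "
    r"Emask (?P<emask>0x[0-9a-fA-F]+) "
    r"SAct (?P<sact>0x[0-9a-fA-F]+) "
    r"SErr (?P<serr>0x[0-9a-fA-F]+) "
    r"action (?P<action>0x[0-9a-fA-F]+)"
    r"(?P<frozen> frozen)?"
)


def parse_exception(line):
    """Return the fields of an ata exception line as a dict, or None if no match."""
    m = EXCEPTION_RE.search(line)
    if m is None:
        return None
    return {
        "port": m.group("port"),
        "emask": int(m.group("emask"), 16),
        "sact": int(m.group("sact"), 16),
        "serr": int(m.group("serr"), 16),
        "action": int(m.group("action"), 16),
        "frozen": m.group("frozen") is not None,
    }


if __name__ == "__main__":
    sample = "ata1: exception Emask 0x10 SAct 0x0 SErr 0x4000000 action 0xa frozen"
    print(parse_exception(sample))
```

Feeding it every line of a saved dmesg and counting the results per port would show whether one controller is affected more than the others.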

Hopefully you just solved my problem. ;)

further news

September 26, 2008 - 10:09am

Well, it looks like even the combined force of all three options does not prevent the timeouts :(
My current way of checking for the problem is to wait for the drives to spin down and then issue a "smartctl -a /dev/sdX" command, which triggers a timeout in dmesg. I don't know what more I can do - can anyone give me some clues? One idea that came to mind was to somehow increase the timeout, so the drives have more time to spin up, but I don't know where to change this (somewhere in /proc/...?).
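On the timeout question: for disks handled by the SCSI layer (which includes libata drives showing up as /dev/sdX), the per-device command timeout is exposed in sysfs rather than /proc, at /sys/block/sdX/device/timeout, in seconds. Here is a small Python sketch for reading and changing it. The helper names and the sysfs_root parameter are only for illustration, writing needs root, and the value does not persist across reboots. Whether a longer timeout helps with these particular errors is untested here, and it may not apply to commands that a tool like smartctl sends with its own timeout.

```python
from pathlib import Path


def timeout_path(device, sysfs_root="/sys"):
    """Build the sysfs path of the command timeout for a disk such as 'sda'."""
    return Path(sysfs_root) / "block" / device / "device" / "timeout"


def get_timeout(device, sysfs_root="/sys"):
    """Return the current command timeout in seconds."""
    return int(timeout_path(device, sysfs_root).read_text().strip())


def set_timeout(device, seconds, sysfs_root="/sys"):
    """Write a new command timeout in seconds (requires root on a real system)."""
    if seconds <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    timeout_path(device, sysfs_root).write_text(f"{seconds}\n")
```

For example, running set_timeout("sdb", 60) as root would give a drive called sdb (a made-up name here) sixty seconds, and get_timeout("sdb") would read the value back.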
