Another WD failure – not an April Fools’ joke

In one of my previous posts (titled “It’s official: Western Digital hates me and I hate them too”) I described how 3 of my Western Digital drives crashed within 2 months.

Last week I left the city I live in to go to Athens, Greece, where fosscomm was taking place. When I returned I checked my machines’ logs and found this in one of them:


hdi: lost interrupt
hdi: status error: status=0x51 { DriveReady SeekComplete Error }
hdi: status error: error=0x04 { DriveStatusError }
ide: failed opcode was: unknown
hdi: no DRQ after issuing MULTWRITE_EXT
hdi: status error: status=0x51 { DriveReady SeekComplete Error }
hdi: status error: error=0x04 { DriveStatusError }
ide: failed opcode was: unknown
hdi: no DRQ after issuing MULTWRITE_EXT
hdi: status error: status=0x51 { DriveReady SeekComplete Error }
hdi: status error: error=0x04 { DriveStatusError }
ide: failed opcode was: unknown
hdi: no DRQ after issuing MULTWRITE_EXT
hdi: status error: status=0x51 { DriveReady SeekComplete Error }
hdi: status error: error=0x04 { DriveStatusError }
ide: failed opcode was: unknown
pdc202xx_new: Primary channel reset.
hdi: no DRQ after issuing MULTWRITE_EXT
ide4: reset: success
hdi: dma_timer_expiry: dma status == 0x21
hdi: DMA timeout error
hdi: dma timeout error: status=0x80 { Busy }
ide: failed opcode was: unknown
hdi: DMA disabled
pdc202xx_new: Primary channel reset.
ide4: reset: success
hdi: lost interrupt
md: super_written gets error=-5, uptodate=0
raid5: Disk failure on hdi1, disabling device. Operation continuing on 5 devices
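Messages like these follow a recognizable pattern, so a periodic scan of the kernel log can flag a dying disk before the RAID layer kicks it out. Below is a minimal Python sketch that assumes kernel messages in the old IDE-driver format quoted above. The pattern labels and the function name scan_kernel_log are hypothetical, and the pattern list is illustrative rather than exhaustive.

```python
import re
from collections import Counter
from typing import Iterable

# Illustrative patterns modeled on the old IDE-driver messages quoted above.
# The keys are made-up labels; other kernels may word these messages differently.
PATTERNS = {
    "lost_interrupt": re.compile(r"\b(hd[a-z]+): lost interrupt"),
    "status_error": re.compile(r"\b(hd[a-z]+): status error"),
    "no_drq": re.compile(r"\b(hd[a-z]+): no DRQ"),
    # IGNORECASE covers both "DMA timeout error" and "dma timeout error",
    # plus the "dma_timer_expiry" line.
    "dma_timeout": re.compile(
        r"\b(hd[a-z]+): (?:dma_timer_expiry|dma timeout error)", re.IGNORECASE
    ),
    # md reports the failed partition (e.g. hdi1); capture only the disk name.
    "raid_failure": re.compile(r"raid\d+: Disk failure on (hd[a-z]+)"),
}


def scan_kernel_log(lines: Iterable[str]) -> Counter:
    """Count suspicious kernel messages per (device, kind) pair."""
    counts: Counter = Counter()
    for line in lines:
        # search() rather than match(), so syslog timestamps and prefixes are skipped.
        for kind, pattern in PATTERNS.items():
            found = pattern.search(line)
            if found:
                counts[(found.group(1), kind)] += 1
    return counts
```

To use it on a real log, pass it an open file, for example `with open("/var/log/messages", errors="replace") as log: print(scan_kernel_log(log))`. The log path is an assumption and depends on the syslog setup.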

This is the fourth WD drive to crash in 2 months! And it’s not an April Fools’ joke… it’s still the 31st of March…


Model Family: Western Digital Caviar SE family
Device Model: WDC WD2000JB-55GVA0
Serial Number: WD-WCALL1025118

Of course it’s out of warranty. Again.

As Fuzz said, this whole thing must be a logic time bomb planted inside WD disks years ago to force us to move to SSDs.

I’m getting pretty tired of it though…

My current desktop

Since sotiris asked, here’s a recent desktop screenshot.

OK, it’s not that recent (12/Nov/2007)… but it hasn’t changed at all since then 😛

It’s Fluxbox with ROX Desktop and of course it’s Gentoo! 😀

I’d like to see what comzeradd, agorf and Charmed[] have on their desktops.

It’s official: Western Digital hates me and I hate them too

About a month ago one of the hard disks in my PC started showing DMA errors in syslog. It was a Western Digital WD1200JB with a manufacture date of 13 MAR 2002. Luckily, I only kept temporary data on that disk, like downloads, some music and videos, and some pretty old backups. As soon as I saw the DMA errors in syslog I put a spare 200GB drive in the box and tried to rsync all the data to it. I saved most of what I needed but lost some of my old backups. The thing is, I didn’t really know what was inside them; there were directories with names like “/Backups/OLD/foobar/backup_older/random_crap”. I guess it was crap after all. I hadn’t needed anything from those directories in at least the last couple of years.

2 weeks ago I returned from a trip to Athens. I checked my mail, where I get reports from ossec on various servers I manage. One of these reports said that a RAID5 array of 6x200GB disks was degraded due to a hard disk failure. Yes, it was a Western Digital, again. Model number: WD2000JB, manufacture date: 26 AUG 2004. I had another 200GB drive at home where I keep my backups. Since I couldn’t risk being left without a disk for my home backups, I bought a Seagate ST3500320AS. Because the new disk was 500GB, I copied all the data from the “spare” 200GB disk onto it and also made a full backup of my 120GB boot disk. I then replaced the faulty 200GB disk in the server with the “spare” 200GB drive from home.

On Thursday I came back from a one-week trip, this time to my hometown. All was fine until Friday noon. Then I tried to open a text file in my home dir (a separate partition on my boot disk) where I keep some random notes, and the machine slowed to a crawl. I couldn’t open the file, and copying it to another disk failed too. All I got were some beautiful I/O errors on the terminal and DMA errors in syslog. Guess what! The disk was a Western Digital 1200JB with a manufacture date of 14 DEC 2001. Under different circumstances I would have cried at my bad luck… but all I could do was laugh. I couldn’t stop laughing at this mess. I put the 500GB Seagate in an external USB enclosure and started rsyncing the root dir on top of the copy I had made two weeks earlier. A couple of files couldn’t be read from the boot disk, but they were already on the “backup”, so I saved everything. Since I had no spare disk left at home, I went out and bought another one. I couldn’t find any 250 or 320GB Seagate drives, so I bought another 500GB Seagate ST3500320AS. The funny part was that the salesman at the local store tried to convince me to buy a 320GB Western Digital, without success of course. I wonder why…
I put the new 500GB disk in my box, booted iloog, partitioned the disk and rsync-ed my data from the “old” 500GB disk to the new one.

YES, I use smartctl/smartd on all of my boxes, even at home. Smartctl was not showing ANY errors at all before the first DMA errors appeared in syslog. I regularly test all my disks with smartctl’s short, long and conveyance self-tests (where supported).

The first disk is completely unusable right now. I tried partitioning and formatting it, but it moans painfully whenever it is accessed. It currently shows more than 100 S.M.A.R.T. errors. It’s dead.
The second one has 4-5 S.M.A.R.T. errors logged. It doesn’t make any strange noises while running, but I haven’t tested it extensively yet. It certainly can’t be trusted…
The third disk has bad sectors and about 20 S.M.A.R.T. errors. Most of them were “created” during the bad-block check, and every time a bad area is accessed more errors are added to the log. While running it makes an annoying sound, like metal parts scraping against each other.
The funny thing is what smartctl reports for all of the disks, even the first one:

SMART overall-health self-assessment test result: PASSED
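Because the overall verdict can read PASSED on a disk that is clearly failing, it may be safer to look at the logged error count as well. Below is a minimal Python sketch. It assumes that the text output of smartctl -a contains a line of the form "SMART overall-health self-assessment test result: PASSED" and, when the error log is not empty, a line "ATA Error Count: N" (smartctl prints "No Errors Logged" otherwise). The function name assess is hypothetical.

```python
import re

# Assumed line formats from `smartctl -a` output (smartmontools).
HEALTH_RE = re.compile(r"self-assessment test result:\s*(\S+)")
ERROR_COUNT_RE = re.compile(r"ATA Error Count:\s*(\d+)")


def assess(smartctl_output: str) -> dict:
    """Combine the overall verdict with the logged ATA error count."""
    health_match = HEALTH_RE.search(smartctl_output)
    health = health_match.group(1) if health_match else None

    count_match = ERROR_COUNT_RE.search(smartctl_output)
    # "No Errors Logged" has no count line, so treat a missing count as zero.
    ata_errors = int(count_match.group(1)) if count_match else 0

    # Flag the disk if the verdict is not PASSED *or* any error was logged,
    # since the verdict can be PASSED even with a non-empty error log.
    suspicious = health != "PASSED" or ata_errors > 0
    return {"health": health, "ata_errors": ata_errors, "suspicious": suspicious}
```

The input could come from something like `subprocess.run(["smartctl", "-a", "/dev/hdi"], capture_output=True, text=True).stdout`. The device path there is only an example.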

I am well aware that all the disks were past their warranty (3 years); that’s why I kept backups (of the important stuff) on separate disks. Still, I don’t think I’ll be buying any Western Digital drives in the near future… I need some time to get over this month of crashes…

Any other Western Digital haters out there?

How to standardize an error

All software companies make errors.

Many of those companies correct these errors as soon as someone finds them.
A few companies correct them as soon as they can, but that can sometimes take months.
One company not only fails to correct the errors it makes, it tries to standardize them.

It’s not about how big or small an error is…it’s about the attitude.