did I just kill my server with this command ?!?
Posted by papi, 11-15-2010, 08:15 AM |
Ok I'm running centos 5 x64 and doing some testing on new servers that just had all software installed.
I was reading a topic somewhere and someone mentioned this command that they used to test the speed of their hdd raid:
dd if=/dev/zero of=/dev/sda bs=1M count=1000
So I thought it can't do any harm if someone else is running it, so I ran it (except I used count=10000, because with 1000 it finished too quickly, as it's a 15k SAS RAID10).
Anyway, a little while later I go to the DC and connect to this server to run some tests, one of them being "arcconf getconfig 1" to retrieve details about the RAID...
It doesn't react ... so I run it again ... nothing. Then suddenly it says it can't open some log file in /var/log and still doesn't run (I was able to open the same file myself using pico without problems).
So I go to the CLI and run "shutdown -r now" and BOOOM, tons of errors all over the screen, incl. ext3-related stuff. After a few minutes of pages and pages of errors, I power the box off and on and it comes back. I enter the Adaptec BIOS and it shows the array as "Optimal", but then when I go to boot ... nothing, it doesn't recognize the boot media.
So I run linux rescue from the CentOS CD and it tells me that the partition table is UNREADABLE and that the only thing it can do is re-initialize the drives, causing loss of all data.
Apologies for the long description I just wanted to be detailed.
Anyway, so here I am ... reinstalling CentOS on this box. Now I want to know if the command that I ran above would/could have caused damage to the partition table, or if it's more likely to be a hardware (RAID controller) problem?
Scary thing is I ran the SAME command on 5 other IDENTICAL servers and they all seem ok (so far), but I haven't tried rebooting them yet.
|
Posted by papi, 11-15-2010, 08:27 AM |
Yep, looks like I've killed them all ... just rebooted another one and same issue, partition table ****ed.
Wow, what A DOUCHE. I feel like such a spastic now.
You'd think after 10 years in the biz I'd know better than to simply run a command like that without first making 100% sure it can't do damage.
Oh well, 2 days of work lost. But I'm happy, believe it or not. It's just software and I can reinstall that easily since these boxes are not live yet... I was worried it was the RAID controller that ****ed up (due to some previous issues with it in relation to its BIOS). Happy to know the RAID controller didn't do anything wrong.
|
Posted by drspliff, 11-15-2010, 08:57 AM |
Oh this is too funny.
I guess we live and learn. Care to name & shame where this 'advice' came from?
|
Posted by afmatt, 11-15-2010, 09:06 AM |
Ouch.... Sounds like one of those days. Good thing it's nothing serious other than the software
|
Posted by papi, 11-15-2010, 09:09 AM |
It wasn't advice really ... just someone mentioning they used that to test the speed of their HDDs (think it was one of those "vibrating fans killed my hdd" threads).
But yeah ... it's friggin insane and funny at the same time. But I am REEEALLL happy it wasn't the controller that did something, as my initial assumption was, and I was already starting to consider which controller to replace it with and how much of a huge hassle that was going to be.
As it stands it's just a bunch of OS/cPanel reinstalls ... pain in the arse, but I AM in the process of stress testing these boxes before they go live at the end of the month, so this is kinda giving them a bit of a workout.
|
Posted by bizness, 11-15-2010, 10:24 AM |
really...
hdparm for the win...
i use dd responsibly though....
sorry to hear that you killed your box....
i hope u were able to sleuth out the data.
|
Posted by aeris, 11-15-2010, 02:34 PM |
Heh, well.. you could indeed use that command to test the sequential write speed of a RAID. But it would be a destructive test, essentially zero-filling the first 10GB of the disk.
|
Posted by Patrick, 11-15-2010, 02:54 PM |
Ugh. Sounds like someone trolled you... next time, if you're not 100% sure what the command does, don't run it as root. :]
|
Posted by trustedurl.com, 11-15-2010, 03:14 PM |
I really hope those are not production servers; you overwrote the first 10GB of raw data... those 5 servers should be toast as well. sorry
if = input file
of = output file
bs = block size
count = number of blocks to copy
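To make the operands above concrete, here is a minimal sketch (not from the thread). The helper name check_dd_target is hypothetical; it simply refuses a block device such as /dev/sda as the output path, and the arithmetic shows how many bytes bs=1M count=10000 would write.

```shell
#!/bin/sh
# Hypothetical helper, not part of dd: fail if the intended
# output path is a block device (e.g. a whole disk like /dev/sda).
check_dd_target() {
    if [ -b "$1" ]; then
        echo "refusing: $1 is a block device" >&2
        return 1
    fi
    return 0
}

# dd writes bs * count bytes; bs=1M means 1048576 bytes per block.
bytes=$((1048576 * 10000))
echo "bs=1M count=10000 writes $bytes bytes"

# A plain file name in the current directory passes the check.
check_dd_target ./foofile && echo "./foofile is not a block device"
```

Only if the check passes would one go on to run dd with of= set to that path.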
|
Posted by TDS-chriss, 11-15-2010, 03:50 PM |
On the off chance you're still keen on running the test, replace the original of= with of=foofile and you should be good to go. Remember to remove the foofile when you're done.
|
Posted by Ronald_Craft, 11-15-2010, 05:32 PM |
That is pretty amusing. Lesson learned. When I was a newbie I once did yum uninstall libstdc++. That caused one helluva night.
|
Posted by Joe262, 11-15-2010, 07:14 PM |
Nice.
Hope proper backups were kept ; )
|
Posted by papi, 11-15-2010, 08:19 PM |
no backups, these were not live boxes!
I'm surprised no one asked yet what the speed result was anyway ... 770 megajigabytespersec!! Haha
|
Posted by Joe262, 11-15-2010, 10:36 PM |
Nah the zeroing over of your disk is far more amusing than any speed result.
|
Posted by CNSERVERS, 11-15-2010, 11:27 PM |
That test goes through the OS buffer cache, so it's not accurate at all.
If you want to test the real disk speed, you should use dd with oflag=dsync.
So you basically killed your OS by running an inaccurate speed test.
|
Posted by papi, 11-16-2010, 04:17 AM |
Damnit man!
So um would this be perfectly safe to run ?
dd if=/dev/zero of=tempfile bs=1M count=10000 oflag=dsync
|
Posted by quad3datwork, 11-16-2010, 02:20 PM |
Yes, above is safe. Just make sure your PWD is where you want to be and have more than 10GB free.
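The free-space advice above can be checked before writing anything. A rough sketch, assuming a POSIX df; the function name has_room_for is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: succeed if the filesystem holding the current
# directory has at least the given number of MiB available.
has_room_for() {
    need_kb=$(($1 * 1024))
    # With -P and -k, df prints one line per filesystem and
    # column 4 is the available space in KiB.
    avail_kb=$(df -Pk . | awk 'NR==2 {print $4}')
    [ "$avail_kb" -ge "$need_kb" ]
}

pwd   # confirm where the test file would be created
if has_room_for 10000; then
    echo "enough room for a 10000 MiB test file"
else
    echo "not enough free space here"
fi
```

This only reports; it does not run dd itself, so it is harmless to try first.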
|
Posted by TDS-chriss, 11-16-2010, 02:23 PM |
Yup.
|
Posted by houkouonchi, 11-17-2010, 02:17 AM |
I usually use direct I/O as it gives the lowest CPU usage and bypasses the OS's buffers. I have seen dd get bottlenecked by CPU usage when not using direct:
Writes:
root@dekabutsu: 10:16 PM :/data# dd bs=1M count=5000 oflag=direct if=/dev/zero of=./5gb.bin
5000+0 records in
5000+0 records out
5242880000 bytes (5.2 GB) copied, 7.20372 s, 728 MB/s
Reads:
root@dekabutsu: 10:16 PM :/data# dd bs=1M iflag=direct if=./5gb.bin of=/dev/null
5000+0 records in
5000+0 records out
5242880000 bytes (5.2 GB) copied, 6.41051 s, 818 MB/s
|
Posted by mugo, 11-17-2010, 03:19 AM |
Yup, dd can kill. I only use it in dev environments to test I/O, etc. I've never had the guts to run it on a good HDD on a production server. I have on a PFed HDD, but hey, it was dying anyway...
You live, you learn. That's why God invented backups.
|
Posted by papi, 11-17-2010, 07:25 AM |
Hmm, with the same commands I'm getting 280 MB/s write and 300 MB/s read. This is 4 x 15k7 Cheetah SAS in RAID10 (Adaptec 5405 / default stripe size of 256 KB)
How are you getting so much faster speeds .. what's your setup?
|
Posted by houkouonchi, 11-17-2010, 10:31 AM |
I would expect the read speeds to be a bit better than that, as with raid10 ur read speeds should be about double the write speeds unless the controller is not reading from all disks (I know some that don't but I thought adaptec did). This is on 20x2 TB hitachi disks (7200 RPM) in raid6 on an ARC-1280ML controller. It's a pretty dated controller. The new ARC-1880 gets about 1.4 gigabytes/sec read and 900 megabytes/sec write with the same number of drives in raid6.
|
Posted by papi, 11-17-2010, 11:08 AM |
ah ok 5 times as many spindles ..sweet, makes sense now
Anyway, just out of curiosity I upped the size of the blocks to 10M and got these crazy speeds .. hmm ?
[/home]# dd bs=10M if=/dev/zero of=tmpfile count=100 iflag=dsync
100+0 records in
100+0 records out
1048576000 bytes (1.0 GB) copied, 1.00776 seconds, 1.0 GB/s
Woo fast but ok probably not big enough..so here we go again
[/home]# dd bs=10M if=/dev/zero of=tmpfile2 count=1000 iflag=dsync
1000+0 records in
1000+0 records out
10485760000 bytes (10 GB) copied, 20.9969 seconds, 499 MB/s
[/home]# dd bs=10M if=tmpfile2 of=/dev/null count=1000 iflag=dsync
1000+0 records in
1000+0 records out
10485760000 bytes (10 GB) copied, 2.6184 seconds, 4.0 GB/s
that looks kinda crazy fast but I'm sure it's just something to do with the much larger block size (duhh). Mind doing that on yours?
|
Posted by houkouonchi, 11-17-2010, 02:17 PM |
How much RAM does ur system have? This is likely the OS cache. I doubt you will ever get results like that if you use iflag/oflag direct instead of dsync. If you test with very small data (<512 MB) then u will get really high speeds as it will just be hitting the raid controller's cache.
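To put a number on the cache question, one can compare the test size with installed RAM. A minimal sketch, assuming a Linux /proc/meminfo; the 10000 MiB figure is the size of tmpfile2 from the runs above. Also worth noting: in GNU dd, iflag= applies to the input file and oflag= to the output file, so iflag=dsync on a write from /dev/zero does not force synchronous writes to the output.

```shell
#!/bin/sh
# Total RAM in KiB, as reported by the Linux kernel.
mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
# Size of the 10000 MiB test file, in KiB.
test_kb=$((10000 * 1024))

echo "RAM: $mem_kb KiB, test file: $test_kb KiB"
if [ "$test_kb" -le "$mem_kb" ]; then
    echo "test file fits in RAM, so buffered reads may come from the page cache"
else
    echo "test file is larger than RAM"
fi
```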
|
Posted by ZenMonk, 11-18-2010, 03:23 AM |
The I/O rates do not look pretty for us. Maybe because it's running RAID6.
|
Posted by johnmarsh110, 11-18-2010, 05:18 AM |
yes i understand thanks for sharing this
|
Posted by houkouonchi, 11-18-2010, 11:48 AM |
I was also running raid6. It depends a lot on what controller, file-system and how many drives you are running off. What controller/drives/fs are you using?
|
Posted by ZenMonk, 11-19-2010, 03:26 AM |
Running Seagate 500 GB disks in raid6 using an Adaptec 5805Z raid card, striped to 1 TB. Filesystem is ext3. What about you?
|
Posted by houkouonchi, 11-19-2010, 06:55 AM |
Ok, from that I assume you only have 4 disks in raid6? 4x500 GB for 1TB useable? If that is the case, that is why: for sequential reads/writes on raid6 you will only get the speed of n-2 disks (n being ur total disks), so that read/write seems about right for only two disks' worth of I/O.
I am 20x2TB hitachi 7200 RPM disks, ARC1280ML raid controller, JFS file-system. Also ext3 isn't known for its speed.
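The n-2 rule above works out as follows. A small sketch with made-up function names; it covers only the data-disk count and usable capacity, not measured throughput:

```shell
#!/bin/sh
# RAID6 keeps two disks' worth of parity, so n disks leave n-2 for data.
raid6_data_disks() { echo $(( $1 - 2 )); }
# Usable capacity in GB for n disks of size_gb each.
raid6_usable_gb() { echo $(( ($1 - 2) * $2 )); }

echo "4 x 500 GB:   $(raid6_data_disks 4) data disks, $(raid6_usable_gb 4 500) GB usable"
echo "20 x 2000 GB: $(raid6_data_disks 20) data disks, $(raid6_usable_gb 20 2000) GB usable"
```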
|
Posted by ZenMonk, 11-20-2010, 02:58 AM |
Yup 4 drives for 1TB useable. 20x2TB is a beast what is its purpose? backups?
|
Posted by houkouonchi, 11-20-2010, 01:34 PM |
It's actually my personal machine in the rack at my house. I also have a 20x1TB backup machine as well as a couple of other ones. I am a data packrat, I never delete anything I download.
|
Posted by CoderJosh, 11-20-2010, 11:29 PM |
I just hope nobody will read your post here, just copy and paste the command without reading what happened.
|
Posted by hkrental, 11-24-2010, 07:06 AM |
I suggest you test your harddisk before booting the OS (e.g. Salvation, etc.)
|