Netriplex/Uberbandwidth Asheville, NC power outage
Posted by pubcrawler, 02-24-2011, 03:20 PM |
Netriplex/Uberbandwidth in Asheville, North Carolina, had a power outage affecting some lucky customers this morning.
Looks like approximately 20 minutes, from 9:50 AM to 10:10 AM.
All seems back to normal.
Dealing with the joyous experience of rebuilding the MySQL tables that crashed in the process.
|
Posted by FastServ, 02-24-2011, 03:29 PM |
Unless you crashed extremely hard (filesystem damage, etc.), this might help speed up your recovery:
http://djlab.com/2009/06/mysql-how-t...all-databases/
|
Posted by pubcrawler, 02-24-2011, 03:59 PM |
Thanks for the link FastServ.
Good MySQL command there... running it across our many large tables... slow, but easier, I think, than how we were doing the checks and repairs before.
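For anyone hitting the same thing later: since the link above is truncated here, this is only a rough sketch of that kind of all-databases check-and-repair pass (not necessarily the exact command from the article), done via MySQL Connector/Python with placeholder connection details:

# Rough sketch only: walk every database/table, CHECK it, and attempt a
# REPAIR on anything that doesn't come back OK. REPAIR TABLE only applies
# to MyISAM/ARCHIVE/CSV tables; InnoDB does its own crash recovery.
# Host, user and password are placeholders.
import mysql.connector

conn = mysql.connector.connect(host="localhost", user="root", password="secret")
cur = conn.cursor()

cur.execute("SHOW DATABASES")
skip = {"information_schema", "performance_schema", "mysql"}
databases = [row[0] for row in cur.fetchall() if row[0] not in skip]

for db in databases:
    cur.execute(f"SHOW TABLES FROM `{db}`")
    for (table,) in cur.fetchall():
        cur.execute(f"CHECK TABLE `{db}`.`{table}`")
        rows = cur.fetchall()  # columns: Table, Op, Msg_type, Msg_text
        if not any(r[2] == "status" and r[3] == "OK" for r in rows):
            print(f"{db}.{table} needs attention, attempting repair")
            cur.execute(f"REPAIR TABLE `{db}`.`{table}`")
            print(cur.fetchall())

cur.close()
conn.close()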
|
Posted by TQ Mark, 02-26-2011, 08:13 PM |
What was the nature and extent of the outage? We have servers there and didn't notice any issue.
|
Posted by pubcrawler, 02-26-2011, 08:40 PM |
There was an email circulated about the power outage to some customers --- saw it online elsewhere. It said:
"Please be advised that the Netriplex AVL01 data center facility team has reported that the UPS on Power Bus 1 momentarily dropped the critical load impacting some customers in our AVL01 datacenter. Our records indicate that some or all of your infrastructure is being powered by this Bus. This power Bus currently serves approximately one fifth of the customers in our AVL01 data center.
Facilities management is currently investigating this issue and has called our UPS service & support vendor to come onsite to investigate this issue with our electrical team. Further information is not known at this time...."
Unsure if there was a follow-up email, as Netriplex has been rather poor about coming clean and wrapping things up after an issue.
(I like the hands-on staff @ Netriplex, but have lost patience with their monthly issues)
My opinion is that their top-tier-provider claim is bogus, no matter how much press and PR they push saying otherwise. This was another outage outside a maintenance window --- actually during East Coast prime time (9-10 AM). Unacceptable.
For the record, I never received an email about the outage, even though my server there was affected (and I am known to them, as per other posts and a small pile of tickets).
I'm actually within someone else's rack, and they were totally unaware of the outage (unsure why).
Just too much trouble at Uberbandwidth now for me to run a business out of their facility. A shame, because the rates there can be affordable and the facility has lots of promise.
Unsure where in the leadership stack the problem is. But if it were my company, I'd be looking for someone to own up to the issues and figure out how to stop having them at this pace, or be shown the door.
|
Posted by Rogean, 02-27-2011, 03:17 AM |
They sent 4 emails; this was the last one.
|
Posted by pubcrawler, 02-27-2011, 03:44 AM |
Thanks Rogean.
Well, this power outage was totally preventable: an engineering deficiency.
Cells go bad *all the time* in today's batteries. That tends not to totally kill a battery, and a single bad battery typically doesn't drag down your power either.
This facility, UberCenter/UberBandwidth, isn't old enough to be having failing batteries.
This may be part of the problem:
"Uber Center exercises its generators biweekly and for 1-2 hours per exercise at full load in real-time tests using a 50+ item checklist. Yes, it costs us more time and money, but it provides our customers with 100% uptime."
If they are doing that and also testing on battery power, then they are shocking the batteries with 1-2 hours of draw-down and rapid recharging every exercise.
" but it provides our customers with 100% uptime. And that’s what the Uber Center is known for."
You can read the rest of their blog boasting here:
http://blog.uberbandwidth.com/?p=17
Shame, cause Uber looks good on paper.
|
Posted by dotHostel, 02-27-2011, 05:21 AM |
They exercise the generators, not the UPS.
|
Posted by pubcrawler, 02-27-2011, 05:38 AM |
Nothing would surprise me with their facility at this point.
Fascinating that a comprehensive 50-point, 1-2 hour generator test wouldn't be testing the transfer switches and actual load, including the UPS systems.
So they crank up the gensets and set them to run at an 80% load setting? Doesn't seem like much of a test really, certainly not an actual test of the entire power system.
They should have been doing comprehensive monitoring of their UPS systems; they would have detected unequal cells and been able to spot-replace one-off batteries.
Batteries are far more likely to fail prematurely than an industrial diesel generator. There are 80-100 year old diesel generators out there running 24/7, making power in communities all over the world.
|
Posted by dotHostel, 02-27-2011, 05:44 AM |
The issues with data center generators usually happen because they are not running 24x7.
|
Posted by pubcrawler, 02-27-2011, 05:53 AM |
Yes, stop-and-start cycles *can* be hard on diesels when not properly maintained. There are million-mile diesels out there - some with well over 500k miles between oil changes (tractors and tractor-trailers).
Diesels can be hard to get started but are rather idiot-proof to maintain.
Someone needs to incorporate on-site power generation within the datacenter compound. There are universities, hospitals, etc. with their own generation facilities.
Uber makes me sad. They have the potential and some smart folks. Just way too much preventable breakage. Someone should take 10 of those generator test steps and apply them to the UPS systems.
|
Posted by dotHostel, 02-27-2011, 06:17 AM |
Not that simple. Data centers must spend a lot of money on diesel and preventive maintenance just to keep the generators ready.
There are some initiatives in this direction, such as the Syracuse University data center. http://www.syr.edu/greendatacenter/GDC_facts.pdf
|
Posted by dotHostel, 02-27-2011, 06:48 AM |
OFF-TOPIC - Interesting read:
http://www.debugamericalatina.com/ba...in-diesel.html
and
http://www.dieselcraft.co.uk/test_kits.htm
|
Posted by pubcrawler, 02-27-2011, 02:24 PM |
Diesel isn't as big a problem as many of the reports out there suggest, including the links above.
While everything posted is indeed true, the problems with diesel vary greatly. The fuel issues depend on:
1. Low-sulphur seasonal blends (more prone to going bad quickly)
2. Storage temperature of the fuel
3. Tank construction materials and their interaction with the diesel
4. Water drain-off and tank cleaning maintenance
I routinely fire up diesel-powered gear on 5+ year old fuel without any problem (aside from, say, a bad battery on the unit's starter).
Bringing 24/7 gensets to the datacenter is the next obvious step. The infrastructure is already in place, notably for natural gas generation. On-site generation should eliminate the need for the massive and costly UPS units that are a very weak link in overall operations. Smaller, higher-quality UPS designs could be implemented, or perhaps almost avoided entirely with redundant gensets. Storage mechanisms like ultracapacitors are slowly eroding some use of batteries, and I suspect they will show up (more widely) in datacenters soon.
Good to see the shared piece from Syracuse University. That sort of implementation is where the industry leaders are headed --- especially in smaller tier markets and where nearby space is plentiful.
|
Posted by TQ Mark, 02-27-2011, 03:17 PM |
pubcrawler, have you considered getting A+B power feeds? There is a reason why most datacenters offer it.
|
Posted by pubcrawler, 02-27-2011, 03:26 PM |
A+B power feeds sound reasonable, but I've always thought of them as a feature for failures at the server level (i.e. a bad power supply) --- not as a workaround for bad power from a provider.
It would certainly be a prudent decision (A+B power), but considerably more costly (higher-end servers with dual PSUs, typically 2U and larger - more space rental for less density, additional power cost, another long-term item to monitor).
Is anyone aware of, say, a 1U A+B-feed power aggregation/distribution unit? Bring A+B power into the PDU, then just a single power cable out to the existing servers? Obviously, the idea there is to work around facility power issues, not the occasional server-level PSU failure.
|
Posted by Dougy, 02-27-2011, 03:27 PM |
Even though they always break down, I love our Ford E-450's in the ambulances.. diesel powahhhhhhh
For what it is worth, dirty diesel is a bad excuse. Dupont Fabros here in NJ circulates their fuel through some filtration setup every week to make sure their fuel is nice and clean.
|
Posted by Ed-Freethought, 02-27-2011, 06:14 PM |
Any decent provider should be able to give you A+B feeds that are at least diverse at the distribution level, if not also at the UPS level.
If you want to feed a server with a single PSU off dual feeds then APC have a range of 1U ATS boxes for various voltage types that can take diverse feeds and then switch over between them fast enough that it won't drop the critical load if there is a problem on one of the feeds: http://www.apc.com/products/family/i...CountryCode=us
|
Posted by dotHostel, 02-27-2011, 06:23 PM |
Bad excuse for what?
|
Posted by pubcrawler, 02-27-2011, 07:45 PM |
@freethought
Thanks for the link to the APC transfer switches. I *figured* such a thing existed, just had never used them before or had customer requirements for such a device.
Sounds like a solution, especially when we roll more gear out in a single rack.
Wondering what sort of additional amp draw these devices will add (if anything significant). Obviously, there is also the monthly cost of a second drop.
|
Posted by Ed-Freethought, 02-27-2011, 07:54 PM |
No problem, happy to help
I don't have any data on how reliable these things are, as you are introducing a potential single point of failure onto either path. APC probably have a whitepaper on it, though.
|
Posted by pubcrawler, 02-27-2011, 08:06 PM |
@freethought,
It's typically cost-prohibitive to eliminate every point of failure. This is a good band-aid for unreliable power, though. *VERY APPRECIATED*
This is the second time in less than a year that we've been bitten by a random power drop, in two different data centers.
The big deal for us is that MySQL isn't happy about being dumped by a power outage. It requires checks and rebuilds, and we have some huge tables that take more time than I have to wait.
We run multiple locations in a hot-hot mode, but that brings its own data synchronization issues. It means we have to keep engineering ever more complex replication, checksums and other hacks to deal with such an outage. Nice to be able to pull it off, but it's very prone to failure.
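To give an idea of the kind of checksum hack I mean, here is only a rough illustration (the hostnames, credentials and table names are made up for the example): after an outage, compare table checksums between the two sites and flag anything that diverged.

# Rough illustration only: compare CHECKSUM TABLE results between two sites
# and flag tables that diverged. Hostnames, credentials and table names are
# placeholders. CHECKSUM TABLE reads the whole table, so this is slow on
# huge tables like ours.
import mysql.connector

TABLES = ["app.orders", "app.customers"]   # hypothetical table list
SITES = {
    "site_a": {"host": "db1.example.net", "user": "check", "password": "secret"},
    "site_b": {"host": "db2.example.net", "user": "check", "password": "secret"},
}

def site_checksums(cfg):
    conn = mysql.connector.connect(**cfg)
    cur = conn.cursor()
    sums = {}
    for name in TABLES:
        db, table = name.split(".")
        cur.execute(f"CHECKSUM TABLE `{db}`.`{table}`")
        sums[name] = cur.fetchall()[0][1]   # one row: (table_name, checksum)
    cur.close()
    conn.close()
    return sums

a, b = (site_checksums(cfg) for cfg in SITES.values())
for name in TABLES:
    status = "OK" if a[name] == b[name] else "MISMATCH, needs resync"
    print(f"{name}: {status}")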
It's been a long year since we rolled out to multiple colocation facilities. Based on our experience so far, we preferred having the servers at our office; uptime there has been far better than at the remote colocation facilities.
It seems to me that there are more failures industry-wide each year.
The industry needs some independent uptime auditing with public reporting. It would really sort the good facilities from the mediocre ones, and perhaps show a pricing correlation (i.e. better define what you get for the dollars spent).
|
Posted by Dougy, 02-27-2011, 10:12 PM |
It was mentioned before that sludge buildup in diesel can cause issues.
|
Posted by Henrik, 03-01-2011, 03:10 PM |
It would be more the engines themselves, then. Diesel doesn't go sludgy that easily. You also have issues with condensed water and such, but that's a separate matter.
(Talking from "farm experience" here.)
|
Posted by ChrisGragtmans, 03-03-2011, 02:07 PM |
Hello WHT Community,
Although we usually try to maintain our focus on providing the best possible service to our customers, and avoid entering into discussions on online forums, I believe that it is appropriate to respond to this thread. Like any datacenter in the industry, our business is a work in progress, and we are constantly striving to better ourselves and avoid events such as the subject of this thread. We are working with each of our customers on an individual basis to make sure that any issues that arose as a result of this event are rectified.
The tone of this thread is rather unfortunate, because we match and work to exceed what the best in our industry do. We are a SAS 70 Type II certified facility, and all critical systems have third party service contracts with minimum semi-annual preventative maintenance. I won't speak to the specific criticisms outlined above, but I would like to request that current clients speak with us directly rather than immediately jumping to public forums. We do everything in our power to respect our clients' privacy, and if you reference the confidentiality disclaimer at the bottom of Netriplex emails, you'll see that we ask the same of you.
Rather than speculating online, I'd like to ask you to please contact us with your questions. The information is here, and we would be happy to share it with you. Thank you all for your business, and we will continue to take every step possible to be the best in the industry.
Chris Gragtmans
Interactive Marketing Manager, Netriplex
|
Posted by pubcrawler, 03-03-2011, 02:32 PM |
Welcome to the community Chris.
I've posted most of this and other commentary on the various outages and other snafus at your facility in recent times.
It's important that customers and potential customers know ahead of time how a facility operates and what its issues are, and, in the case of an outage, what is going on. It's a great cost and aggravation to move into a facility, run into many issues, inevitably pay for massive redundancy workarounds, and eventually pay to move the gear again to another facility.
Your company hasn't been forthcoming about problems (in Asheville) and, in the case of outages, has been less than responsive (like a few months ago, when your network was blackholed and totally offline, including your own website). The reference to the confidentiality disclaimer is a tad chilling, and I have some real questions about why you mentioned it.
Please read my posts about Netriplex on here, and let's discuss the various matters offline over email. There is also another matter you should be aware of that I haven't posted publicly.
|
Posted by xbxbxc, 03-03-2011, 06:23 PM |
I can't believe a provider would insinuate that its clients shouldn't express their feelings about service levels on this forum. That is a very threatening remark to make, and I would have to consider removing my equipment from there.
I am referring to "We do everything in our power to respect our clients’ privacy, and if you reference the confidentiality disclaimer at the bottom of Netriplex emails, you’ll see that we ask the same of you." posted above by ChrisGragtmans, a relatively new member with next to no posts.
|
Posted by andrewipv4, 03-14-2011, 02:38 AM |
You missed the part where they said it was a shorted cell, not just that it was bad. If it were bad, the whole string would drop a couple of volts - not a life or death issue. If it shorts, it could very feasibly break the entire link, as UPS batteries are typically series'd together to form 400 to 500 volts of DC. I understand that you're ready to throw them under the bus, but just know that this failure scenario as depicted is quite possible.
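For a sense of scale, here is some back-of-the-envelope arithmetic using generic assumed numbers (not the actual Netriplex UPS spec):

# Back-of-the-envelope only, with assumed generic numbers: a lead-acid cell
# is ~2 V nominal and a 12 V jar is 6 cells, so a string in the 400-500 V
# range is roughly 40 jars in series.
CELL_V = 2.0
CELLS_PER_JAR = 6
JARS_PER_STRING = 40

string_v = CELL_V * CELLS_PER_JAR * JARS_PER_STRING   # ~480 V DC
weak_cell_v = string_v - CELL_V                       # one sagging cell: ~478 V
print(f"nominal string: {string_v:.0f} V, with one weak cell: {weak_cell_v:.0f} V")
# A weak cell costs the string only a couple of volts; a *shorted* cell can
# fail open under load and break the whole series path, dropping the load.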
|
Posted by andrewipv4, 03-14-2011, 02:44 AM |
Do you have any threads or online resources that reference this blackholing?
I agree. I think every company would prefer to keep outages completely quiet and out of public view. But to chastise a customer for talking about them publicly is quite absurd.
That's pretty easy to type, but much more difficult to do. It's a rather bold claim, and since you said so publicly, perhaps you'd care to back that up by publicly outlining your testing procedure for UPS battery strings, if any.
|