ExchangeDefender Redundancy: Technical Implementation Details
In August, our core data center in Dallas, Texas suffered its worst outage in the past 8 years after an automatic transfer switch (ATS) failure. Because the ATS itself failed, backup power could not be routed to the redundant feeds for our servers, and critical services were knocked offline. ExchangeDefender’s core focus is on-time delivery of clean mail; inbound mail delivery slowed during the outage, but it remained available and continued processing mail for all customers. One of the most beneficial features of ExchangeDefender is LiveArchive, which acts as a continuity solution for ExchangeDefender clients by keeping an additional copy of all client inbound and outbound mail. During the outage, LiveArchive was knocked completely offline for the first time since its release. The full-scale outage of LiveArchive prevented clients on Hosted Exchange and ExchangeDefender from reaching their continuity solution, which created many issues for partners and their clients. The outage caused us to reevaluate the redundancy of our solutions across the board, starting with additional ExchangeDefender nodes and an implementation of LiveArchive in our Los Angeles data center.
Today, I would like to update you on what we’ve done and announce the immediate availability of a new solution that will keep working even if one of our data centers is completely offline for an extended period of time.
Complexity & Size Problems
The biggest complexity of the additional LiveArchive network was accepting mail in multiple data centers, routing it to the nearest LiveArchive network, and finally synchronizing that mail between the remote networks. Due to the amount of mail ExchangeDefender processes on a daily basis, we could not use the redundancy built into Exchange 2010, such as Database Availability Groups and Microsoft software load balancers for OWA and SMTP.
On average, our Dallas LiveArchive network processes and stores over 115 million messages a day. Adding a redundant LiveArchive network automatically doubles the message count on our Exchange transport servers, because every message must also be synchronized to the other network. Database Availability Groups would add immense load on top of that, as our Dallas network already stores over 382 TB of data. Neither a desynchronized network nor slower LiveArchive mail delivery was acceptable, because LiveArchive would then fail as a continuity solution.
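To put the doubling in numbers, here is a minimal back-of-the-envelope sketch. It assumes every message is copied exactly once to each remote network and it only computes a daily average, not peak load; the function names are made up for illustration.

```python
# Daily volume quoted above for the Dallas LiveArchive network.
MESSAGES_PER_DAY = 115_000_000
SECONDS_PER_DAY = 24 * 60 * 60


def transport_load(messages_per_day: int, remote_networks: int) -> int:
    """Messages handled per day by transport servers when each message is
    also copied once to every remote network (assumption of this sketch)."""
    return messages_per_day * (1 + remote_networks)


def average_rate(messages_per_day: int) -> float:
    """Average messages per second over a full day (ignores peaks)."""
    return messages_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    for remotes in (0, 1):
        load = transport_load(MESSAGES_PER_DAY, remotes)
        print(f"{remotes} remote network(s): {load:,} msgs/day, "
              f"about {average_rate(load):,.0f} msgs/s on average")
```

With one remote network the transport count goes from 115 million to 230 million messages a day, which is why synchronization had to be cheap per message.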
The Solution
To overcome these obstacles and limitations, we developed custom solutions that tie into our LiveArchive networks, redesigned our network layout, and made one minor change to LiveArchive storage: the Los Angeles network keeps mail for one month.
The first issue to tackle was the Active Directory layout, as we could not tie the Los Angeles Active Directory into our already established and, quite frankly, rock-solid Dallas Active Directory network. In the event of another catastrophic failure in Dallas, the inability to contact the primary Active Directory network would render the Los Angeles copy useless unless we created them as unique sites. Although this was possible, we felt that the automation between ExchangeDefender and LiveArchive was better suited for the job, as the unique-sites approach would cause too many changes during a failure, including many DNS changes. We decided to use a separate Active Directory domain at each site, and we hooked user account creation for Los Angeles into our ExchangeDefender provisioning.
The second issue was mail delivery design between sites. In the original design, the LiveArchive copy of inbound mail was always routed to Dallas for delivery to LiveArchive. The Achilles heel of that design was mail delivered to data centers outside of Dallas: the original delivery was delayed a few seconds while a copy was dispatched to Dallas. Until a few months ago the original design was acceptable and only caused minor delays about twice a year. As the amount of mail processed increased, the delays became more noticeable. Mail delivered to networks outside of Dallas began to see increased processing time and eventually took twice as long as mail accepted by Dallas. By adding a LiveArchive network in Los Angeles, we were able to route mail that arrives in Los Angeles to our Los Angeles LiveArchive network.
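The idea of provisioning each site independently can be sketched as follows. This is an illustration only: `SiteDirectory` is an in-memory stand-in rather than a real Active Directory client, and `provision_user` and the site names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SiteDirectory:
    """In-memory stand-in for one site's independent directory (hypothetical)."""
    name: str
    online: bool = True
    accounts: set = field(default_factory=set)

    def create_account(self, address: str) -> None:
        if not self.online:
            raise ConnectionError(f"{self.name} directory unreachable")
        self.accounts.add(address.lower())


def provision_user(address: str, sites: list) -> list:
    """Create the account in every site's directory.

    Each site is handled on its own, so an unreachable site does not block
    the others. Returns the names of sites that still need a retry."""
    pending = []
    for site in sites:
        try:
            site.create_account(address)
        except ConnectionError:
            pending.append(site.name)
    return pending
```

Because neither directory depends on the other, a Dallas outage leaves Los Angeles accounts usable, and the returned list tells the provisioning job what to replay later.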
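The routing change amounts to a lookup keyed on the data center that accepted the message. In this sketch the site names are placeholders, and the fallback to Dallas for data centers without a local LiveArchive network is an assumption carried over from the original design.

```python
# Data centers that host their own LiveArchive network (Dallas and Los Angeles).
LOCAL_ARCHIVE = {
    "dallas": "dallas",
    "los-angeles": "los-angeles",
}

# Assumption: sites without a local network keep the original behaviour.
DEFAULT_ARCHIVE = "dallas"


def archive_target(accepting_dc: str) -> str:
    """Return the LiveArchive network that should receive the archive copy
    of a message accepted at `accepting_dc`."""
    return LOCAL_ARCHIVE.get(accepting_dc, DEFAULT_ARCHIVE)
```

Mail accepted in Los Angeles no longer waits on a cross-country hop before its archive copy is written.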
The final issue was mail synchronization between sites. We took advantage of the powerful extension points in Exchange by creating a custom Transport Agent that copies submitted messages from each ‘local’ LiveArchive network to the remote network for processing. By utilizing custom routing and Edge servers, we were able to copy mail successfully and provide real-time processing and delivery between both LiveArchive networks.
In the event of a failure at either site, ExchangeDefender mail will automatically fail over to another data center, and all LiveArchive mail will route to the network with available heartbeats. Upon creating an outage notification, our team will modify the DNS records for https://livearchive.exchangedefender.com to route to Los Angeles. To access the Los Angeles LiveArchive network directly, users can log in at https://la.livearchive.exchangedefender.com
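One thing any two-way copy like this has to handle is not re-copying a message that is itself a copy. The sketch below shows that idea in Python using the standard library's `EmailMessage`. It is not the Exchange Transport Agent API (real agents are .NET code), and the marker header name and the `on_submitted` function are hypothetical.

```python
import copy
from email.message import EmailMessage

# Hypothetical marker header that tags messages which arrived via sync.
SYNC_HEADER = "X-Archive-Synced-From"


def on_submitted(message: EmailMessage, local_site: str, remote_outbox: list) -> bool:
    """Queue a copy of a locally submitted message for the remote network.

    Returns True if a copy was queued, False if the message was already a
    synchronized copy (forwarding it again would bounce it back and forth)."""
    if message[SYNC_HEADER] is not None:
        return False
    duplicate = copy.deepcopy(message)   # leave the local message untouched
    duplicate[SYNC_HEADER] = local_site  # tag the copy with its origin site
    remote_outbox.append(duplicate)
    return True
```

The local message is delivered unchanged, while the tagged copy is what travels to the other network.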
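Heartbeat-based selection of a LiveArchive network can be sketched like this. The 30-second timeout, the site names, and the `pick_network` function are assumptions for illustration, not our production values.

```python
from typing import Optional


def pick_network(preferred: str, last_heartbeat: dict, now: float,
                 timeout: float = 30.0) -> Optional[str]:
    """Return the network that should receive LiveArchive mail.

    `last_heartbeat` maps a network name to the time (in seconds) its last
    heartbeat was seen. The preferred network wins if its heartbeat is
    fresh; otherwise the first other network with a fresh heartbeat is
    used; None means no network is currently reachable."""
    def alive(name: str) -> bool:
        seen = last_heartbeat.get(name)
        return seen is not None and now - seen <= timeout

    if alive(preferred):
        return preferred
    for name in sorted(last_heartbeat):
        if alive(name):
            return name
    return None
```

For example, if Dallas last reported 900 seconds ago and Los Angeles 5 seconds ago, mail preferring Dallas is routed to Los Angeles until Dallas heartbeats resume.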
Bookmarks:
LiveArchive: https://livearchive.exchangedefender.com
Los Angeles LiveArchive Cluster: https://la.livearchive.exchangedefender.com
Please make sure your clients are
Sincerely,
Travis Sheldon
VP Network Operations, ExchangeDefender
(877) 546-0316 x757
travis@ownwebnow.com