Exchange 2013 Datacenter Failover and Disaster Recovery


Many articles on the internet cover datacenter failover and site resilience, but I wanted to summarize in a short note what is needed during a failover, instead of spending two to three hours reading to grasp the concepts.

Exchange 2013 Terminology

Every Exchange administrator should know a few terms that apply to their environment:

Primary Active Manager (PAM): runs inside the Microsoft Exchange Replication service and is notified of, and reacts to, server failures. The PAM is held by the DAG member that owns the cluster quorum resource, and it holds the information about which database copies are active, passive and mounted.

Standby Active Manager (SAM): tells the Client Access and Transport services which server hosts the active copy of a mailbox database.

Datacenter Activation Coordination (DAC) mode uses the Datacenter Activation Coordination Protocol (DACP) to avoid split brain. When a DAG is running in DAC mode and a server reboots, its Active Manager starts with the DACP bit set to 0 (databases stay dismounted). It then contacts the other members of the DAG; once a member responds with its bit set to 1, the server sets its own bit to 1 and is allowed to mount databases.
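To make the DACP rule easier to picture, here is a minimal sketch of the decision described above. The function name Test-DacpMountAllowed and its PeerBits parameter are made up for illustration; Exchange does not expose the DACP bit this way, and the real protocol has more cases than this.

```powershell
# Illustrative model only: not an Exchange cmdlet and not how DACP is exposed.
function Test-DacpMountAllowed {
    param(
        # DACP bits (0 or 1) reported by the DAG members that answered.
        [int[]]$PeerBits = @()
    )
    # A freshly started Active Manager holds bit 0 and may not mount.
    # It may mount once any reachable member reports a bit of 1.
    return ($PeerBits -contains 1)
}

Test-DacpMountAllowed -PeerBits @(0, 0)   # False: nobody has permission yet
Test-DacpMountAllowed -PeerBits @(0, 1)   # True: one member already holds bit 1
```

If no member answers at all, PeerBits is empty and the function returns False, which matches the idea that an isolated site must not mount databases on its own.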

Quorum Details

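Odd number of nodes -> Node Majority

Even number of nodes (but not a multi-site cluster) -> Node and Disk Majority

Even number of nodes, multi-site cluster -> Node and File Share Majority

Even number of nodes, no shared storage -> Node and File Share Majority

A DAG does not use shared storage, so in practice it runs with Node Majority (odd number of members) or Node and File Share Majority (even number of members).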
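The quorum rules above can be written as a small helper. This is only a sketch of the table, with hypothetical names (Get-RecommendedQuorumModel, MultiSite, SharedStorage); it does not query the cluster.

```powershell
# Hypothetical helper that encodes the quorum table above.
function Get-RecommendedQuorumModel {
    param(
        [Parameter(Mandatory = $true)][int]$NodeCount,
        [bool]$MultiSite = $false,
        # Assumption: defaults to $false because a DAG has no shared storage.
        [bool]$SharedStorage = $false
    )
    # Odd node count: the nodes alone can always form a majority.
    if ($NodeCount % 2 -eq 1) { return 'Node Majority' }
    # Even node count: a tie-breaker is needed. Without shared storage,
    # or across sites, that tie-breaker is a file share witness.
    if ($MultiSite -or -not $SharedStorage) { return 'Node and File Share Majority' }
    return 'Node and Disk Majority'
}

Get-RecommendedQuorumModel -NodeCount 3                        # Node Majority
Get-RecommendedQuorumModel -NodeCount 4 -MultiSite $true       # Node and File Share Majority
Get-RecommendedQuorumModel -NodeCount 4 -SharedStorage $true   # Node and Disk Majority
```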

Continuous replication starts in file mode, where each closed 1 MB transaction log file is copied to the passive database copy. Once the passive copy has caught up in file mode, replication switches to block mode, where updates are shipped to the passive copy immediately.

Port 3343 is used by the cluster nodes to listen for incoming connections from the other DAG members.

I believe that is more than enough on definitions, so let us move on to what we do in practice in our Exchange infrastructure. It is always good to keep documentation of the component information below, which will help if our servers are hit by a disaster.

Verification of Exchange 2013 DAG Components:

  •  Primary Active Manager (PAM)

          To verify the PAM:

          Get-DatabaseAvailabilityGroup <DAGName> -Status | FL Name, PrimaryActiveManager

          To move the PAM to a different DAG member:

          cluster group "Cluster Group" /moveto:<DAGServerName>

  • AutoDatabaseMountDial:

          Get-MailboxServer <MailboxServerName> | FL Name, AutoDatabaseMountDial

          BestAvailability: copy queue length of 12 logs or fewer
          GoodAvailability (default): copy queue length of 6 logs or fewer
          Lossless: copy queue length of 0 logs

  •  Datacenter Activation Coordination (DAC)

            Get-DatabaseAvailabilityGroup -Identity <DAGName> | FL Name, DatacenterActivationMode

  •  To verify Quorum

            cluster /quorum

  •  To verify Continuous Replication Mode

             Get-Counter -ComputerName <ServerName> -Counter "\MSExchange Replication(*)\Continuous replication - block mode Active"

  • To check replication network

              Get-MailboxDatabaseCopyStatus -Server <ServerName> -ConnectionStatus | FL Name, IncomingLogCopyingNetwork, SeedingNetwork

  • To Check DagNetworkConfiguration

            Get-DatabaseAvailabilityGroup | FL Name, ManualDagNetworkConfiguration

  • Check the Exchange server location in AD site

            Get-ExchangeServer -Identity <server_name> -Status | FL
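To see what the AutoDatabaseMountDial setting in the list above means in practice, here is a small sketch that compares a copy queue length against the three thresholds. Test-AutoMountAllowed is a made-up helper for illustration, not an Exchange cmdlet, and it only models the log-count check.

```powershell
# Hypothetical helper: models only the copy queue length thresholds.
function Test-AutoMountAllowed {
    param(
        [Parameter(Mandatory = $true)]
        [ValidateSet('BestAvailability', 'GoodAvailability', 'Lossless')]
        [string]$Dial,
        [Parameter(Mandatory = $true)]
        [int]$CopyQueueLength
    )
    # Maximum number of missing logs tolerated for an automatic mount.
    $maxLogs = @{ BestAvailability = 12; GoodAvailability = 6; Lossless = 0 }
    return ($CopyQueueLength -le $maxLogs[$Dial])
}

Test-AutoMountAllowed -Dial GoodAvailability -CopyQueueLength 8   # False
Test-AutoMountAllowed -Dial BestAvailability -CopyQueueLength 8   # True
Test-AutoMountAllowed -Dial Lossless -CopyQueueLength 0           # True
```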

Exchange 2013 Datacenter Switchover

When the primary site (the site holding the majority of the DAG nodes) fails because of a power outage or server failure, follow the steps below.

  • Verify the started and stopped servers in the DAG

           Get-DatabaseAvailabilityGroup <DAGName>  -Status | FL Name, *Servers

  • Use Stop-DatabaseAvailabilityGroup to mark the DAG members in the primary site as stopped. If those servers cannot be reached, add the -ConfigurationOnly switch so that only Active Directory is updated.

           Stop-DatabaseAvailabilityGroup -Identity <DAGName> -ActiveDirectorySite PrimarySite

  •  Verify the started and stopped servers in the DAG

          Get-DatabaseAvailabilityGroup <DAGName>  -Status | FL Name, *Servers

  • Stop the Cluster service on every DAG member in the secondary site

          Stop-Service ClusSvc

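  • Use Restore-DatabaseAvailabilityGroup to remove the stopped Mailbox servers from the DAG and re-establish quorum using the alternate witness server. If no alternate witness was configured in advance, specify one with the -AlternateWitnessServer and -AlternateWitnessDirectory parameters.

          Restore-DatabaseAvailabilityGroup <DAGName> -ActiveDirectorySite DR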

  •  When service or power is restored in the primary site, run Start-DatabaseAvailabilityGroup to bring its servers back into the DAG

          Start-DatabaseAvailabilityGroup <DAGName> -ActiveDirectorySite PrimarySite

  • Check the quorum model

          Get-ClusterQuorum | fl

  • If it still shows the older quorum model, run the PowerShell cmdlet below

           Set-DatabaseAvailabilityGroup -Identity DAG01
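Why is the Restore-DatabaseAvailabilityGroup step needed at all? A rough way to see it is to count votes: the cluster stays up only while more than half of the configured votes are reachable. The sketch below assumes a hypothetical DAG with two nodes per site plus a witness in the primary site; Test-HasQuorum is a made-up helper, not a cluster cmdlet.

```powershell
# Hypothetical helper: simple majority check over cluster votes.
function Test-HasQuorum {
    param(
        [Parameter(Mandatory = $true)][int]$TotalVotes,
        [Parameter(Mandatory = $true)][int]$SurvivingVotes
    )
    # Quorum requires strictly more than half of all configured votes.
    return (($SurvivingVotes * 2) -gt $TotalVotes)
}

# Before the restore: 4 nodes + 1 witness = 5 votes, only the 2 DR nodes survive.
Test-HasQuorum -TotalVotes 5 -SurvivingVotes 2   # False

# After the restore: the stopped servers are evicted, leaving
# 2 DR nodes + the alternate witness = 3 votes, all reachable.
Test-HasQuorum -TotalVotes 3 -SurvivingVotes 3   # True
```

That is why the secondary site cannot simply keep running after the primary site is lost: it has to shrink the cluster membership first.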


About Raji Subramanian

Nothing great to say about me...Just want to share my knowledge for others that will be useful at any moment of time when they stuck in critical issue....

3 Responses to Exchange 2013 Datacenter Failover and Disaster Recovery

  1. Kman says:

    Hi.
    Last line is “Set-DatabaseAvailabilityGroup -Identity DAG01”

  2. Arun says:

    Nice practical way to explain the SwitchOver procedure. Very helpful. Thanks a lot Raji!!
