13 October, 2010

BC6703 - How to be successful with SRM implementation

Session by Michael White, VMware specialist in BC/DR.

In a DR situation there are some challenges:
re-IPing virtual machines
storage guys not understanding or not present at the time of DR

SRM helps out in facing these challenges.

SRM simplifies and automates the DR workflow (no manual runbook), gives you a centralized management console for all the processes, and lets you test whenever you want without disrupting the production environment.

Storage replication is a MUST for SRM to work, and a supported SRA (Storage Replication Adapter, provided by the third-party storage vendor) must exist to connect SRM to the storage.

Recovery plan:

SRM is used for:
   Datacenter migration
   Disaster avoidance: planned hardware maintenance on the network in the primary datacenter, or a hurricane arriving and you want to be a step ahead
   Moving applications to QA nightly

Application knowledge
   you need to know your apps in order to protect them
   it is difficult to sort out what to protect first (the most important)
   which applications need to run to support the most important ones (AD, DBs, web servers, DNS, DHCP, ...): which component should be online first for core application X to work?
   a business impact assessment (BIA) must be carried out
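
The "which component should be online first" question can be treated as a dependency-ordering problem. Below is a minimal sketch in Python using the standard library's graphlib module (Python 3.9+). The dependency map is hypothetical and only illustrates the idea; the real one has to come out of your own application inventory.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map (an assumption, not from the session):
# each key needs the listed services to be running before it starts.
DEPENDS_ON = {
    "DNS": [],
    "AD": ["DNS"],
    "DB": ["AD"],
    "App": ["DB", "AD"],
    "WEB": ["App", "DNS"],
}

def startup_order(depends_on):
    """Return components so that every dependency comes before its dependents."""
    return list(TopologicalSorter(depends_on).static_order())

order = startup_order(DEPENDS_ON)
print(" -> ".join(order))
```

If the map contains a circular dependency, static_order() raises graphlib.CycleError, which is a useful early warning that the startup order cannot be satisfied as written.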

Multitier applications are harder to protect, so they need deep planning: common dependencies are DNS/WEB/App/DB/AD.

Start by protecting one application to understand the whole underlying environment: an application is the sum of its parts.

BIA is critical for SRM project success.

Naming conventions are VERY important so that you never get confused during a real disaster. Plan a vCenter folder design that helps you find protected and non-protected machines at a glance.

Storage Organization
The LUN is the granularity: everything on a single LUN is failed over together, so one LUN equals one Protection Group. Organize the storage so that you know which VMs are on each LUN.
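
To make the "one LUN equals one Protection Group" rule concrete, here is a small Python sketch that groups a VM inventory by LUN. The VM names, LUN names and the inventory format are all made up for illustration; in practice the list would come from your own inventory export.

```python
from collections import defaultdict

# Hypothetical inventory (assumption): pairs of (VM name, LUN it lives on).
VM_PLACEMENT = [
    ("web01", "LUN_A"),
    ("app01", "LUN_A"),
    ("db01", "LUN_B"),
    ("ad01", "LUN_C"),
    ("db02", "LUN_B"),
]

def protection_groups(placement):
    """Group VMs by LUN; every VM on a LUN fails over with that LUN."""
    groups = defaultdict(list)
    for vm, lun in placement:
        groups[lun].append(vm)
    return {lun: sorted(vms) for lun, vms in groups.items()}

groups = protection_groups(VM_PLACEMENT)
for lun, vms in sorted(groups.items()):
    print(f"{lun}: {', '.join(vms)}")
```

A listing like this makes it easy to spot a LUN that mixes machines you want to protect with machines you do not.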

Storage Replication Adapter
    Always read the release notes and whitepapers from the storage vendors
    MirrorView requires SnapView for testing and can only run one recovery plan at a time
    RecoverPoint supports only 40 GB of changes during a failover test
    some SRAs need gatekeeper LUNs or extra software
    does the SRA support multiple recovery plans (RPs) running at once?
    Hitachi does an automatic reversal after failover, without asking...

   ESX 4.0U1 performs better than prior versions
   VMware Tools must be installed and up to date on protected machines
   with NFS, fewer and bigger are better than more and smaller
   Recovery Plans need to be fine-tuned
        Multiple simultaneous RPs may help improve RTO
DO COMPREHENSIVE TESTING! Testing is done on an isolated VLAN.
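
Why simultaneous recovery plans can shorten recovery time is simple arithmetic: run one after another, the durations add up; run together, the total is roughly the longest plan. The Python sketch below uses invented plan names and durations, and assumes the plans are independent, the SRA allows them to run at once, and there is no resource contention, so treat the parallel figure as a best case.

```python
# Hypothetical recovery plan durations in minutes (illustrative numbers only).
PLAN_MINUTES = {"tier1-db": 30, "tier1-web": 20, "tier2-apps": 45}

def sequential_minutes(plans):
    """Total time when the plans run one after another."""
    return sum(plans.values())

def parallel_minutes(plans):
    """Best-case time when all plans run at once (assumes no contention)."""
    return max(plans.values())

seq = sequential_minutes(PLAN_MINUTES)
par = parallel_minutes(PLAN_MINUTES)
print(f"sequential: {seq} min, parallel (best case): {par} min")
```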

Some definitions
RTO (Recovery Time Objective): how quickly can I be working again?
RPO (Recovery Point Objective): how much data did I lose?
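
As a rough illustration of the two definitions, the Python sketch below measures what a test actually achieved and compares it with the objectives. The timestamps and the target values are invented for the example.

```python
from datetime import datetime, timedelta

# Hypothetical objectives (assumptions for the example).
RTO_TARGET = timedelta(hours=4)      # how quickly we must be working again
RPO_TARGET = timedelta(minutes=30)   # how much data we can afford to lose

def data_loss_window(last_replication, disaster):
    """Time between the last replicated state and the disaster."""
    return disaster - last_replication

def downtime(disaster, service_restored):
    """Time between the disaster and being back in service."""
    return service_restored - disaster

# Invented timestamps from an imaginary failover test.
last_replication = datetime(2010, 10, 13, 9, 45)
disaster = datetime(2010, 10, 13, 10, 0)
restored = datetime(2010, 10, 13, 12, 30)

loss = data_loss_window(last_replication, disaster)
down = downtime(disaster, restored)
print(f"data loss window {loss}, RPO met: {loss <= RPO_TARGET}")
print(f"downtime {down}, RTO met: {down <= RTO_TARGET}")
```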