PS Series Asynchronous Replication Best Practices and Sizing Guide | BP1012
7 Best practices for planning and design
7.1 Recovery time objective (RTO) and recovery point objective (RPO)
In simple terms, as it applies to replication and disaster recovery, RTO is how long a business can get by
without a particular system or application in the event of a disaster. An RTO of 24 hours implies that after a
disaster, the system or data needs to be online and available again within 24 hours.
RPO describes how much data, measured as a window of time, a business can afford to lose in the event of
a disaster. A business must decide if it is acceptable to lose any data in the event of a disaster, and if so, how
much can be lost without a significant impact to the business. For some applications or systems, this may be
as long as 24 hours, while for others this may be as little as 15 minutes, or even zero.
The RPO and RTO requirements of a business must be met when implementing a disaster recovery plan.
The asynchronous replication feature included with PS Series arrays allows data to be replicated across a SAN
or WAN link to another PS Series array. How often a replica is synchronized must align with the RPO. In other
words, if a volume is replicated and the changes are synchronized every four hours, then the most data
that would be lost if a disaster occurred at the primary site is just under four hours of changes, because that is
the interval between replicas. The time it would take to recover the data and make it available again should
align with the RTO.
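The relationship between the replication interval and the RPO can be sketched as a simple check. This is a minimal illustration, not part of any PS Series tooling: the function names are hypothetical, and the optional transfer_time parameter is an added assumption that a replica only becomes usable once its transfer has finished.

```python
from datetime import timedelta


def worst_case_data_loss(replication_interval: timedelta,
                         transfer_time: timedelta = timedelta(0)) -> timedelta:
    """Upper bound on how far behind the newest complete replica can be.

    transfer_time is an assumption added for illustration: time needed to
    finish copying a replica across the link. It defaults to zero, which
    matches the simplified reasoning in the text.
    """
    return replication_interval + transfer_time


def meets_rpo(replication_interval: timedelta,
              rpo: timedelta,
              transfer_time: timedelta = timedelta(0)) -> bool:
    """Return True if the worst-case data loss stays within the RPO."""
    return worst_case_data_loss(replication_interval, transfer_time) <= rpo


if __name__ == "__main__":
    interval = timedelta(hours=4)
    # A four-hour schedule satisfies a 24-hour RPO but not a 15-minute one.
    print(meets_rpo(interval, rpo=timedelta(hours=24)))
    print(meets_rpo(interval, rpo=timedelta(minutes=15)))
```

A schedule that fails this check would need a shorter replication interval, or the RPO would need to be met by another part of the solution.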
It is not uncommon to have different RPOs for the same data or system. One RPO requirement may apply to
disaster recovery in which an entire site is offline, while there may be a different RPO requirement for local
recovery. In such cases, it is common to use a combination of replication and array-based volume snapshots
or volume clones to meet those requirements. In other cases, a combination of asynchronous replication and
a host-based snapshot or replication management software product may be used. In either case,
asynchronous replication serves as part of the total solution design required to meet the recovery objectives
of the business.
7.2 The network
Replication uses ICMP and TCP port 3260 (the standard iSCSI port). Both must remain permitted across the WAN
link for replication to perform properly. Any switches, routers, and firewalls between the two sites must be
configured to allow the arrays to communicate. The network should also be secured by using
firewalls, VPN, encryption, or other means. However, firewalls must not use NAT or PAT between the sites for
replication traffic.
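One way to confirm that TCP port 3260 is reachable across the link is a basic connection test run from a host at the primary site. The sketch below uses only the Python standard library. The function name and the example address are hypothetical, and the check covers TCP only. It does not verify ICMP, which typically requires raw sockets or the operating system's ping utility.

```python
import socket

ISCSI_PORT = 3260  # standard iSCSI port, as stated in the text


def iscsi_port_reachable(host: str, port: int = ISCSI_PORT,
                         timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # 192.0.2.10 is a documentation address used here as a placeholder
    # for the partner group's IP address.
    print(iscsi_port_reachable("192.0.2.10"))
```

A successful connection shows only that the port is open along the path. It does not confirm that replication is configured or that NAT or PAT is absent.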
A slow, underperforming network can greatly affect the speed of replication. If multiple replicas are scheduled
simultaneously, then the arrays will attempt multiple streams through that slow link, which could increase
congestion and cause abnormally slow replication performance. To prevent overloading the WAN link, the
storage administrator must understand how much data will need to be transmitted through the WAN link as
well as the condition of the link (such as its latency and packet loss).
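As a rough planning aid, the time needed to send one replication cycle can be estimated from the amount of changed data and the usable link bandwidth. The sketch below is a back-of-the-envelope calculation under stated assumptions: the function names are hypothetical, sizes use decimal gigabytes, and the efficiency factor (0.7 by default) is a placeholder for protocol overhead, latency, and packet loss, not a measured or vendor-supplied value.

```python
def estimated_transfer_hours(changed_gb: float, link_mbps: float,
                             efficiency: float = 0.7) -> float:
    """Estimate hours needed to send changed_gb over a link of link_mbps.

    efficiency is an assumed fraction (0 < efficiency <= 1) of the nominal
    bandwidth that replication can actually use; measure the real link
    before relying on any particular value.
    """
    if changed_gb < 0:
        raise ValueError("changed_gb must not be negative")
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_mbps must be positive and efficiency in (0, 1]")
    megabits = changed_gb * 8 * 1000          # decimal GB to megabits
    seconds = megabits / (link_mbps * efficiency)
    return seconds / 3600


def fits_in_interval(changed_gb: float, link_mbps: float,
                     interval_hours: float, efficiency: float = 0.7) -> bool:
    """Return True if one cycle's changes can be sent before the next cycle."""
    return estimated_transfer_hours(changed_gb, link_mbps, efficiency) <= interval_hours


if __name__ == "__main__":
    # 50 GB of changes over a 100 Mbps link at the assumed 70% efficiency
    # works out to roughly 1.6 hours.
    print(round(estimated_transfer_hours(50, 100), 2))
    print(fits_in_interval(50, 100, interval_hours=4))
```

If several volumes replicate at the same time, their changed data should be summed before applying the estimate, since the streams share the same link.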