HP P6000 Continuous Access Implementation Guide (T3680-96431, August 2012)
rate based on the RPO (or over the RPO interval). Insufficient replication bandwidth impacts user
response time, RPO, or both.
Determining the critical sample period
Working with large measurement samples can be tedious and problematic. A two-stage approach
to data collection generally helps to reduce the effort. In the first stage, the historical write byte
rate trends are analyzed to determine peak periods that can occur during monthly or yearly business
cycles and daily usage cycles. Once a peak period is identified, a more granular measurement
(the second stage in the analysis) can be made to collect detailed one-second measurements of
the I/O write profile. A 1- to 8-hour interval is ideal because the measurements can be easily
imported into a Microsoft Excel worksheet and charted for reduction and analysis.
If you have a good understanding of your organization's business cycles, the critical sample period
can be selected with very little additional data collection or reduction. If the write profile is unknown,
then the critical sample period can generally be identified from daily incremental backup volumes
or transaction rates from application logs. Setting up a long-term collection for trending is generally
impractical because it can delay the sizing process by several weeks or more.
It is imperative that measurement data for all volumes sharing the intersite replication bandwidth
be collected over a common time frame so that the aggregate peak can be determined. This is
especially important when selecting the critical sample period.
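For illustration only (the volume names and sample values below are hypothetical, not from the guide), the aggregate peak across volumes measured over a common time frame might be computed like this:

```python
# Sketch: sum per-volume write-rate samples that were collected over a
# common, time-aligned interval, then take the maximum of the totals.

def aggregate_peak(samples_by_volume):
    """samples_by_volume maps volume name -> list of write rates (MB/s),
    one entry per sample interval, all aligned to the same time frame."""
    lengths = {len(s) for s in samples_by_volume.values()}
    if len(lengths) != 1:
        raise ValueError("samples must cover a common time frame")
    n = lengths.pop()
    # Aggregate write rate at each sample point, then find the peak.
    totals = [sum(s[i] for s in samples_by_volume.values()) for i in range(n)]
    return max(totals)

# Example: two volumes whose individual peaks do not coincide, so the
# aggregate peak differs from the sum of the individual peaks.
samples = {
    "vol1": [10, 40, 15, 20],   # MB/s per interval (hypothetical)
    "vol2": [30, 10, 35, 25],
}
print(aggregate_peak(samples))  # 50
```

Note that summing each volume's individual peak (40 + 35 = 75 MB/s here) would overstate the required bandwidth; only time-aligned samples reveal the true aggregate peak.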
Table 3 (page 19) shows recommended sample rate intervals for various RPOs. Remember that
the shorter the sample rate interval, the closer the solution will be to meeting your desired RPO.
Table 3 RPO sample rate intervals
Sample rate interval    Desired RPO
1 second                0–60 minutes
30 seconds              1–2 hours
1 minute                2–3 hours
2 minutes               3–4 hours
Up to 5 minutes         > 4 hours
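The mapping in Table 3 can be expressed as a simple lookup helper; the function name and Python form are illustrative, not part of the guide:

```python
# Sketch: Table 3's recommended sample rate interval for a desired RPO.

def sample_interval_seconds(rpo_minutes):
    """Return the recommended sample rate interval (in seconds) for a
    desired RPO given in minutes, following Table 3."""
    if rpo_minutes <= 60:
        return 1        # 0-60 minutes -> 1-second samples
    if rpo_minutes <= 120:
        return 30       # 1-2 hours -> 30-second samples
    if rpo_minutes <= 180:
        return 60       # 2-3 hours -> 1-minute samples
    if rpo_minutes <= 240:
        return 120      # 3-4 hours -> 2-minute samples
    return 300          # beyond 4 hours -> up to 5-minute samples

print(sample_interval_seconds(45))    # 1
print(sample_interval_seconds(150))   # 60
```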
Sizing bandwidth for synchronous replication
Application data that is replicated synchronously is highly dependent on link latency because write
requests must be received at the recovery site before the application receives a completion. Write
response time in a replicated environment is greatly affected by propagation delays and queuing
effects. Latencies due to propagation delays can generally be measured and are typically fixed
for a given configuration. Latencies due to queuing effects at the link are more difficult to estimate.
Propagation delays due to distance can be estimated at 1 millisecond per 100 kilometers to account
for the round-trip exchange through dark fiber. Most applications easily accommodate an additional
1 millisecond of latency when DR sites are separated by a distance of 100 kilometers, a typical
metro-replication distance. At 500 to 1,000 kilometers, the 5- to 10-millisecond propagation latency
accounts for 25% to 50% of the 20-millisecond average latency budgeted to applications such as
email. This puts a practical cap for synchronous replication at about 100 kilometers. This is also
the distance that Fibre Channel data can be transmitted on a single-mode 9-µm fiber at 1 Gb/s
with long-distance SFPs.
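The propagation estimate above (about 1 millisecond of round-trip latency per 100 kilometers of fiber) can be sketched as a quick calculation; the helper name is illustrative:

```python
# Sketch: round-trip propagation latency at ~1 ms per 100 km of fiber
# (roughly 5 microseconds per kilometer each way), per the estimate
# in the text.

def round_trip_latency_ms(distance_km):
    return distance_km / 100.0  # 1 ms per 100 km, round trip

for km in (100, 500, 1000):
    print(km, round_trip_latency_ms(km))  # 1.0, 5.0, and 10.0 ms
```

At 1,000 km the 10 ms of propagation delay alone consumes half of a 20 ms application latency budget, which is why the practical cap for synchronous replication sits near 100 km.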
Congestion delays on the interconnect are another source of replication latency. For example, a
4-KB write packet routed onto an IP link operating at 44 Mb/s (T3) incurs approximately 1
millisecond of latency as the Fibre Channel packets are serialized onto the slower link. A burst of
10 writes means the last write queued to the IP link experiences a 10-millisecond delay as it waits
for the previous 9 writes to be transmitted. This sample congestion delay also consumes 50% of a
Choosing the intersite link 19
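The serialization arithmetic in the congestion example can be checked with a short sketch (the 44.736 Mb/s T3 line rate and the helper name are assumptions for illustration; the guide rounds the results to about 1 ms per write and 10 ms for the last write in a burst of 10):

```python
# Sketch: serialization delay of a 4-KB write on a T3-speed IP link, and
# the queuing delay seen by the last write in a burst of 10.

T3_BITS_PER_SEC = 44.736e6  # assumed T3 line rate

def serialization_delay_ms(payload_bytes, link_bps=T3_BITS_PER_SEC):
    """Time (ms) to clock payload_bytes onto a link of link_bps."""
    return payload_bytes * 8 / link_bps * 1000.0

one_write = serialization_delay_ms(4 * 1024)        # ~0.73 ms per 4-KB write
burst_last = 10 * serialization_delay_ms(4 * 1024)  # last of 10: ~7.3 ms
print(round(one_write, 2), round(burst_last, 2))
```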