HP P6000 Continuous Access Implementation Guide (T3680-96431, August 2012)
rate based on the RPO (or over the RPO interval). Insufficient replication bandwidth impacts user
response time, RPO, or both.
Determining the critical sample period
Working with large measurement samples can be tedious and problematic. A two-stage approach
to data collection generally helps to reduce the effort. In the first stage, the historical write byte
rate trends are analyzed to determine peak periods that can occur during monthly or yearly business
cycles and daily usage cycles. Once a peak period is identified, a more granular measurement
(the second stage in the analysis) can be made to collect detailed one-second measurements of
the I/O write profile. A 1- to 8-hour interval is ideal because the measurements can be easily
imported into a Microsoft Excel worksheet and charted for reduction and analysis.
If you have a good understanding of your organization's business cycles, the critical sample period
can be selected with very little additional data collection or reduction. If the write profile is unknown,
then the critical sample period can generally be identified from daily incremental backup volumes
or transaction rates from application logs. Setting up a long-term collection for trending is generally
impractical because it can delay the sizing process by several weeks or more.
It is imperative that measurement data for all volumes sharing the intersite replication bandwidth
be collected over a common time frame so that the aggregate peak can be determined. This is
especially important when selecting the critical sample period.
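For illustration only (the volume names and sample values below are hypothetical, not from the guide), the aggregate peak across volumes measured over a common time frame might be computed like this:

```python
# Sketch: sum per-volume write-rate samples that were collected over a
# common, time-aligned interval, then take the maximum of the totals.

def aggregate_peak(samples_by_volume):
    """samples_by_volume maps volume name -> list of write rates (MB/s),
    one entry per sample interval, all aligned to the same time frame."""
    lengths = {len(s) for s in samples_by_volume.values()}
    if len(lengths) != 1:
        raise ValueError("samples must cover a common time frame")
    n = lengths.pop()
    # Aggregate write rate at each sample point, then find the peak.
    totals = [sum(s[i] for s in samples_by_volume.values()) for i in range(n)]
    return max(totals)

# Example: two volumes whose individual peaks do not coincide, so the
# aggregate peak differs from the sum of the individual peaks.
samples = {
    "vol1": [10, 40, 15, 20],   # MB/s per interval (hypothetical)
    "vol2": [30, 10, 35, 25],
}
print(aggregate_peak(samples))  # 50
```

Note that summing each volume's individual peak (40 + 35 = 75 MB/s here) would overstate the required bandwidth; only time-aligned samples reveal the true aggregate peak.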
Table 3 (page 19) shows recommended sample rate intervals for various RPOs. Remember that
the shorter the sample rate interval, the closer the solution will be to meeting your desired RPO.
Table 3 RPO sample rate intervals
Sample rate interval    Desired RPO
1 second                0–60 minutes
30 seconds              1–2 hours
1 minute                2–3 hours
2 minutes               3–4 hours
Up to 5 minutes         > 4 hours
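The mapping in Table 3 can be expressed as a simple lookup helper; the function name and Python form are illustrative, not part of the guide:

```python
# Sketch: Table 3's recommended sample rate interval for a desired RPO.

def sample_interval_seconds(rpo_minutes):
    """Return the recommended sample rate interval (in seconds) for a
    desired RPO given in minutes, following Table 3."""
    if rpo_minutes <= 60:
        return 1        # 0-60 minutes -> 1-second samples
    if rpo_minutes <= 120:
        return 30       # 1-2 hours -> 30-second samples
    if rpo_minutes <= 180:
        return 60       # 2-3 hours -> 1-minute samples
    if rpo_minutes <= 240:
        return 120      # 3-4 hours -> 2-minute samples
    return 300          # beyond 4 hours -> up to 5-minute samples

print(sample_interval_seconds(45))    # 1
print(sample_interval_seconds(150))   # 60
```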
Sizing bandwidth for synchronous replication
Application data that is replicated synchronously is highly dependent on link latency because write
requests must be received at the recovery site before the application receives a completion. Write
response time in a replicated environment is greatly affected by propagation delays and queuing
effects. Latencies due to propagation delays can generally be measured and are typically fixed
for a given configuration. Latencies due to queuing effects at the link are more difficult to estimate.
Propagation delays due to distance can be estimated at 1 millisecond per 100 kilometers to account
for the round-trip exchange through dark fiber. Most applications easily accommodate an additional
1 millisecond of latency when DR sites are separated by a distance of 100 kilometers, a typical
metro-replication distance. At 500 to 1,000 kilometers, the 5- to 10-millisecond propagation latency
accounts for 25% to 50% of the 20-millisecond average latency budgeted to applications such as
email. This puts a practical cap for synchronous replication at about 100 kilometers. This is also
the distance that Fibre Channel data can be transmitted on a single-mode 9-µm fiber at 1 Gb/s
with long-distance SFPs.
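The propagation estimate above (about 1 millisecond of round-trip latency per 100 kilometers of fiber) can be sketched as a quick calculation; the helper name is illustrative:

```python
# Sketch: round-trip propagation latency at ~1 ms per 100 km of fiber
# (roughly 5 microseconds per kilometer each way), per the estimate
# in the text.

def round_trip_latency_ms(distance_km):
    return distance_km / 100.0  # 1 ms per 100 km, round trip

for km in (100, 500, 1000):
    print(km, round_trip_latency_ms(km))  # 1.0, 5.0, and 10.0 ms
```

At 1,000 km the 10 ms of propagation delay alone consumes half of a 20 ms application latency budget, which is why the practical cap for synchronous replication sits near 100 km.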
Congestion delays on the interconnect are another source of replication latency. For example, a
4-KB write packet routed onto an IP link operating at 44 Mb/s (T3) incurs approximately 1
millisecond of latency as the Fibre Channel packets are serialized onto the slower link. A burst of
10 writes means the last write queued to the IP link experiences a 10-millisecond delay as it waits
for the previous 9 writes to be transmitted. This sample congestion delay also consumes 50% of a
Choosing the intersite link 19
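The serialization arithmetic in the congestion example can be checked with a short sketch (the 44.736 Mb/s T3 line rate and the helper name are assumptions for illustration; the guide rounds the results to about 1 ms per write and 10 ms for the last write in a burst of 10):

```python
# Sketch: serialization delay of a 4-KB write on a T3-speed IP link, and
# the queuing delay seen by the last write in a burst of 10.

T3_BITS_PER_SEC = 44.736e6  # assumed T3 line rate

def serialization_delay_ms(payload_bytes, link_bps=T3_BITS_PER_SEC):
    """Time (ms) to clock payload_bytes onto a link of link_bps."""
    return payload_bytes * 8 / link_bps * 1000.0

one_write = serialization_delay_ms(4 * 1024)        # ~0.73 ms per 4-KB write
burst_last = 10 * serialization_delay_ms(4 * 1024)  # last of 10: ~7.3 ms
print(round(one_write, 2), round(burst_last, 2))
```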