User's guide

Transfer Data to or from a Cloud Cluster

In this section...

“Transfer Data from Amazon S3 Account” on page 1-21

“Transfer Data with Job Methods and Properties” on page 1-21

“Download SSH Key Identity File” on page 1-22

“Transfer Data with Standard Utilities” on page 1-22

“Transfer Data with the remotecopy Utility” on page 1-24

“Retrieve Data from Persisted Storage Without Starting a Cluster” on page 1-25

Transfer Data from Amazon S3 Account

When creating your cluster, the advanced options provide access to your Amazon S3 account files. Click Add Files to specify which files you want to make available to your cluster nodes. (This option is not available after you have created a cluster.) When the cluster starts up, before the mdce process starts, the specified S3 files are copied into the folder /shared/imported on the cluster’s shared file system. If any of the files have the extension .gz, .gzip, .tar, or .zip, they are automatically expanded.

Note Transferring a large amount of data from your Amazon S3 account can cause the cluster to time out during its startup. If your data size exceeds approximately 5 GB, start your cluster without the S3 data transfer, then upload the necessary data from a local drive to the cluster’s /shared/persisted folder, as described in either “Transfer Data with Standard Utilities” on page 1-22 or “Transfer Data with the remotecopy Utility” on page 1-24.

Transfer Data with Job Methods and Properties

To transfer data to the cloud cluster, you can use the AttachedFiles or JobData property of a job, in the same way you use these properties for other clusters. For example:

1. Place all required executable and data files in the same folder.

2. Specify that folder in the AttachedFiles property of the job.

3. When you submit your job, the files are transferred to the cloud and made available to the workers running on the cloud cluster.
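
The steps above might look like the following in MATLAB. This is only a minimal sketch: the profile name MyCloudCluster, the folder C:\work\myproject, and the task function processData (assumed to be stored in that folder) are hypothetical placeholders, not names defined by this guide.

```matlab
% Hypothetical names: 'MyCloudCluster', the folder path, and processData
% are placeholders for illustration only.
c = parcluster('MyCloudCluster');         % cluster object from a saved profile

j = createJob(c);                         % independent job on the cloud cluster
j.AttachedFiles = {'C:\work\myproject'};  % folder holding code and data files

% One task that calls processData with no inputs and returns one output.
createTask(j, @processData, 1, {});

submit(j);                  % attached files are transferred when the job is submitted
wait(j);                    % block until the job finishes
results = fetchOutputs(j);  % cell array of task outputs
```

For data that every task needs but that is not stored in files, the JobData property of the job can be set in the same way before calling submit.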