Requirements and Restrictions for Data Migration between CDH 4 and CDH 5

  1. The CDH 5 cluster must have a MapReduce service running on it (either MRv1 or YARN/MRv2).
  2. Every MapReduce node in the CDH 5 cluster should have full network access to every node in the source cluster, so that the copy can run in a distributed manner.
  3. To copy data between a secure and an insecure cluster, you must run the distcp command on the secure cluster.
  4. To copy data from a CDH 4 to a CDH 5 cluster, you can do one of the following:
The following restrictions currently apply (see Apache Hadoop Known Issues):
  • DistCp does not work between a secure cluster and an insecure cluster in some cases.

    As of CDH 5.1.3, DistCp does work between a secure and an insecure cluster, provided that you use the WebHDFS protocol, run the command from the secure cluster, and first set ipc.client.fallback-to-simple-auth-allowed to true. See Copying Data between a Secure and an Insecure Cluster using DistCp and WebHDFS.

  • To run DistCp over Hftp from a secure cluster that uses SPNEGO, you must set the dfs.https.port property on the client to the HTTP port (50070 by default).
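
The two workarounds above both come down to passing a configuration property to the distcp command. The following is a minimal sketch of how such command lines could be assembled. It only builds and prints them, and does not run Hadoop. The helper build_distcp_command, the host names, the /data paths, and port 8020 are hypothetical. Supplying dfs.https.port as a -D option, rather than setting it in the client configuration files, is likewise an assumption.

```python
import shlex


def build_distcp_command(source_uri, dest_uri, properties=None):
    """Return the argv list for a `hadoop distcp` run; nothing is executed."""
    argv = ["hadoop", "distcp"]
    # Generic -D property=value options go before the source and destination.
    for name, value in (properties or {}).items():
        argv += ["-D", f"{name}={value}"]
    argv += [source_uri, dest_uri]
    return argv


# Secure/insecure copy over WebHDFS, to be run from the secure cluster.
# Host names and paths are placeholders, not real endpoints.
webhdfs_copy = build_distcp_command(
    "webhdfs://insecure-nn.example.com:50070/data",
    "webhdfs://secure-nn.example.com:50070/data",
    {"ipc.client.fallback-to-simple-auth-allowed": "true"},
)

# Hftp read from a secure cluster that uses SPNEGO: point dfs.https.port
# at the HTTP port (assumption: passed here as a -D option on the client).
hftp_copy = build_distcp_command(
    "hftp://secure-nn.example.com:50070/data",
    "hdfs://dest-nn.example.com:8020/data",
    {"dfs.https.port": "50070"},
)

for argv in (webhdfs_copy, hftp_copy):
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Keeping the command as an argument list rather than a single string avoids quoting problems if it is later handed to a process launcher. Adapt the URIs and ports to your own clusters before use.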