Parcels

Minimum Required Role: Cluster Administrator (also provided by Full Administrator)

A parcel is a binary distribution format containing the program files, along with additional metadata used by Cloudera Manager. There are a few notable differences between parcels and packages:
  • Parcels are self-contained and installed in a versioned directory, which means that multiple versions of a given parcel can be installed side-by-side. You can then designate one of these installed versions as the active one. With packages, only one version of a package can be installed at a time, so there is no distinction between what is installed and what is active.
  • Parcels can be installed at any location in the filesystem and by default are installed in /opt/cloudera/parcels. In contrast, packages are installed in /usr/lib.

Parcels are available for CDH 4.1.3 or later, and for Impala, Search, Spark, Accumulo, Kafka, Key Trustee KMS, and Sqoop Connectors.

Advantages of Parcels

As a consequence of their unique properties, parcels offer a number of advantages over packages:
  • CDH is distributed as a single object - In contrast to having a separate package for each part of CDH, when using parcels there is just a single object to install. This is especially useful when managing a cluster that isn't connected to the Internet.
  • Internal consistency - All CDH components are matched so there isn't a danger of different parts coming from different versions of CDH.
  • Installation outside of /usr - In some environments, Hadoop administrators do not have privileges to install system packages. In the past, these administrators had to fall back to CDH tarballs, which deprived them of a lot of infrastructure that packages provide. With parcels, administrators can install to /opt or anywhere else without having to step through all the additional manual steps of regular tarballs.
  • Installation of CDH without sudo - Parcel installation is handled by the Cloudera Manager Agent running as root so it's possible to install CDH without needing sudo.
  • Decouples distribution from activation - Due to side-by-side install capabilities, it is possible to stage a new version of CDH across the cluster in advance of switching over to it. This allows the longest running part of an upgrade to be done ahead of time without affecting cluster operations, consequently reducing the downtime associated with upgrade.
  • Rolling upgrades - These are only possible with parcels, due to their side-by-side nature. Packages require shutting down the old process, upgrading the package, and then starting the new process. This can be hard to recover from in the event of errors and requires extensive integration with the package management system to function seamlessly. When a new version is staged side-by-side, switching to a new minor version is simply a matter of changing which version of CDH is used when restarting each process. It then becomes practical to do upgrades with rolling restarts, where service roles are restarted in the right order to switch over to the new version with minimal service interruption. Your cluster can continue to run on the existing installed components while you stage a new version across your cluster, without impacting your current operations. Note that major version upgrades (for example, CDH 4 to CDH 5) require full service restarts due to the substantial changes between the versions. Finally, you can upgrade individual parcels, or multiple parcels at the same time.
  • Upgrade management - Cloudera Manager can fully manage all the steps involved in a CDH version upgrade. In contrast, with packages, Cloudera Manager can only help with initial installation.
  • Distributing additional components - Parcels are not limited to CDH. Cloudera Impala, Cloudera Search, LZO, and add-on service parcels are also available.
  • Compatibility with other distribution tools - If there are specific reasons to use other tools for download and/or distribution, you can do so, and Cloudera Manager will work alongside your other tools. For example, you can handle distribution with Puppet. Or, you can download the parcel to Cloudera Manager Server manually (perhaps because your cluster has no Internet connectivity) and then have Cloudera Manager distribute the parcel to the cluster.

Parcel Life Cycle

To enable upgrades and additions with minimal disruption, parcels participate in six phases: download, distribute, activate, deactivate, remove, and delete.

  • Downloading a parcel copies the appropriate software to a local parcel repository on the Cloudera Manager Server, where it is available for distribution to the other hosts in any of your clusters managed by this Cloudera Manager Server. You can have multiple parcels for a given product downloaded to your Cloudera Manager Server. Once a parcel has been downloaded to the Server, it will be available for distribution on all clusters managed by the Server. A downloaded parcel will appear in the cluster-specific section for every cluster managed by this Cloudera Manager Server.
  • Distributing a parcel copies the parcel to the member hosts of a cluster and unpacks it. Distributing a parcel does not actually upgrade the components running on your cluster; the current services continue to run unchanged. You can have multiple parcels distributed on your cluster.
  • Activating a parcel causes Cloudera Manager to link to the new components, ready to run the new version upon the next restart. Activation does not automatically stop the current services or perform a restart. You can restart the services after activation, or leave it to the system administrator to determine the appropriate time to perform those operations.
  • Deactivating a parcel causes Cloudera Manager to unlink from the parcel components. A parcel cannot be deactivated while it is still in use on one or more hosts.
  • Removing a parcel causes Cloudera Manager to remove the parcel components from the hosts.
  • Deleting a parcel causes Cloudera Manager to remove the parcel components from the local parcel repository.
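
The six phases above can be read as transitions between four parcel stages. The following is a minimal sketch of that reading, not Cloudera Manager's implementation: the stage names, the `TRANSITIONS` table, and `apply_action` are all hypothetical names introduced here for illustration.

```python
from enum import Enum


class Stage(Enum):
    """Illustrative parcel stages (hypothetical names, chosen for this sketch)."""
    AVAILABLE = "listed in a remote parcel repository"
    DOWNLOADED = "stored in the local parcel repository"
    DISTRIBUTED = "unpacked on the cluster hosts"
    ACTIVATED = "linked for use at the next restart"


# Each life-cycle action is valid from exactly one stage and leads to one stage.
TRANSITIONS = {
    ("download", Stage.AVAILABLE): Stage.DOWNLOADED,
    ("distribute", Stage.DOWNLOADED): Stage.DISTRIBUTED,
    ("activate", Stage.DISTRIBUTED): Stage.ACTIVATED,
    ("deactivate", Stage.ACTIVATED): Stage.DISTRIBUTED,
    ("remove", Stage.DISTRIBUTED): Stage.DOWNLOADED,
    ("delete", Stage.DOWNLOADED): Stage.AVAILABLE,
}


def apply_action(stage: Stage, action: str) -> Stage:
    """Return the stage reached by `action`, or raise ValueError if it is not allowed."""
    try:
        return TRANSITIONS[(action, stage)]
    except KeyError:
        raise ValueError(f"cannot {action} a parcel in stage {stage.name}") from None
```

In this model, a parcel must be deactivated and removed from the hosts before it can be deleted from the local repository, which mirrors the order in which the phases are listed.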

For example, a screenshot of the Parcels page (not reproduced here) shows:
  • One activated CDH parcel
  • One SOLR parcel distributed and ready to activate
  • One Impala parcel being downloaded
  • One CDH parcel being distributed

Cloudera Manager detects when new parcels are available. The parcel indicator in the Admin Console navigation bar shows how many parcels are eligible for downloading or distribution. For example, CDH parcels older than the active one do not contribute to the count if you are already using the latest version. If no parcels are eligible, or if all parcels have been activated, then the indicator will not have a number badge. You can configure Cloudera Manager to download and distribute parcels automatically, if desired.

Parcel Locations

The default location for the local parcel directory on the Cloudera Manager Server host is /opt/cloudera/parcel-repo. To change this location, follow the instructions in Configuring Cloudera Manager Server Parcel Settings.

The default location for the distributed parcels on the managed hosts is /opt/cloudera/parcels. To change this location, either set the parcel_dir property in the Cloudera Manager Agent configuration file, /etc/cloudera-scm-agent/config.ini, and restart the Cloudera Manager Agent, or follow the instructions in Configuring the Host Parcel Directory.
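
To see which parcel directory an Agent configuration would select, you could parse the file with Python's standard configparser module. This is a rough sketch under one stated assumption: that parcel_dir lives in a section named General. The source does not name the section, so check it against your own config.ini. The function name `read_parcel_dir` is hypothetical.

```python
import configparser

# Default stated in this document for distributed parcels on managed hosts.
DEFAULT_PARCEL_DIR = "/opt/cloudera/parcels"


def read_parcel_dir(config_text: str, section: str = "General") -> str:
    """Return parcel_dir from Agent config text, or the default if it is not set.

    The section name "General" is an assumption; adjust it to match your file.
    """
    # Disable interpolation so a literal '%' in any value does not raise an error.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    return parser.get(section, "parcel_dir", fallback=DEFAULT_PARCEL_DIR)
```

To use it against a real host, pass the contents of /etc/cloudera-scm-agent/config.ini as `config_text`. A commented-out parcel_dir line is treated as unset, so the default is returned.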

Managing Parcels

Through the Parcels interface in Cloudera Manager, you can determine what software versions are running across your clusters. You access the Parcels page by doing one of the following:
  • Clicking the parcel indicator in the Admin Console navigation bar.
  • Clicking Hosts in the top navigation bar, then the Parcels tab.

The Parcels page is divided into several sections. The top section, labeled Downloadable, shows you all the parcels that are available for download from the configured parcel repositories.

Below the Downloadable section, each cluster managed by this Cloudera Manager Server has a section that shows the parcels that have been downloaded, distributed, or activated on that cluster.

When you download a parcel, it appears under every cluster, if you are managing more than one. However, this just indicates that the parcel is available for distribution on those clusters — in fact there is only one copy of the downloaded parcel, residing on the Cloudera Manager Server. Only after you distribute the parcel to a cluster will copies of it be placed on the hosts in that cluster.

Downloading a Parcel

  1. Click the parcel indicator in the top navigation bar. This takes you to the Hosts page, Parcels tab. By default, any parcels available for download are shown in the Downloadable section of the Parcels page. Parcels available for download will display a Download button.

    If the parcel you want is not shown here (for example, you want to upgrade to a version of CDH that is not the most current version), you can make additional remote parcel repositories available through the Administration Settings page. You can also configure the location of the local parcel repository and other settings. See Parcel Configuration Settings.

  2. Click Download to initiate the download of the parcel from the remote parcel repository to your local repository.

When the parcel has been downloaded, the button label changes to Distribute.

Distributing a Parcel

Parcels that have been downloaded can be distributed to the hosts in your cluster, where they become available for activation.

From the Parcels tab, click the Distribute button for the parcel you want to distribute. This starts the distribution process to the hosts in the cluster.

Distribution does not require Internet access; rather the Cloudera Manager Agent on each cluster member downloads the parcel from the local parcel repository hosted on the Cloudera Manager Server.

If you have a large number of hosts to which the parcels should be distributed, you can control how many concurrent uploads Cloudera Manager will perform. You can configure this setting on the Administration page, Properties tab under the Parcels section.

You can delete a parcel that is ready to be distributed; click the triangle at the right end of the Distribute button to access the Delete command. This will delete the downloaded parcel from the local parcel repository.

Distributing parcels to the hosts in the cluster does not affect the current running services.

Activating a Parcel

Parcels that have been distributed to the hosts in a cluster are ready to be activated.

  1. From the Parcels tab, click the Activate button for the parcel you want to activate. This will update Cloudera Manager to point to the new software, ready to be run the next time a service is restarted.
  2. A pop-up warns you that your currently running process will not be affected until you restart, and gives you the option to perform a restart. If you do not want to restart at this time, click Close.

If you elect not to restart services as part of the Activation process, you can instead go to the Clusters tab and restart your services at a later time. Until you restart services, the current software will continue to run. This allows you to restart your services at a time that is convenient based on your maintenance schedules or other considerations.

Activating a new parcel also deactivates the previously active parcel (if any) for the product you've just upgraded. However, until you restart the services, the previously active parcel will have the link Still in use and you will not be able to remove the parcel until it is no longer being used.

Deactivating a Parcel

You can deactivate an active parcel; this will update Cloudera Manager to point to the previous software version, ready to be run the next time a service is restarted. To deactivate a parcel, click Actions on an activated parcel and select Deactivate.

To use the previous version of the software, go to the Clusters tab and restart your services.

Removing a Parcel

To remove a parcel, click the down arrow to the right of an Activate button and select Remove from Hosts.

Deleting a Parcel

To delete a parcel, click the down arrow to the right of a Distribute button and select Delete.

Troubleshooting

If you experience an error while performing parcel operations, click on the red 'X' icons on the parcel page to display a message that will identify the source of the error.

If a parcel is distributing but never completes, make sure you have enough free space in the parcel download directories, because Cloudera Manager keeps retrying to download and unpack parcels even if there is insufficient space.
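
One way to check free space before retrying a distribution is the standard shutil.disk_usage call, run on the host in question. This is only a sketch: the function name `has_free_space` is hypothetical, and the amount of space a parcel needs is an assumption you must supply yourself, since the source gives no figure.

```python
import shutil


def has_free_space(path: str, required_bytes: int) -> bool:
    """Return True if the filesystem holding `path` has at least `required_bytes` free."""
    # disk_usage reports total, used, and free bytes for the filesystem containing path.
    return shutil.disk_usage(path).free >= required_bytes
```

For example, on the Cloudera Manager Server host you might call `has_free_space("/opt/cloudera/parcel-repo", 5 * 2**30)`, and on a managed host `has_free_space("/opt/cloudera/parcels", 5 * 2**30)`. The 5 GiB figure is purely illustrative, and the paths are the defaults named under Parcel Locations.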

Viewing Parcel Usage

The Parcel Usage page shows you which parcels are in current use in your clusters. This is particularly useful in a large deployment where it may be difficult to keep track of what versions are installed across the cluster, especially if some hosts were not available when you performed an installation or upgrade, or were added later. To display the Parcel Usage page:
  1. Do one of the following:
    • Click the parcel indicator in the top navigation bar.
    • Click Hosts in the top navigation bar and click the Parcels tab.
  2. Click the Parcel Usage button.

This page only shows the usage of parcels, not components that were installed as packages. If you select a cluster running packages (for example, a CDH 4 cluster) the cluster is not displayed, and instead you will see a message indicating the cluster is not running parcels. If you have individual hosts running components installed as packages, they will appear as "empty."

You can view parcel usage by cluster, or by product.

You can also view just the hosts running only the active parcels, or just hosts running older parcels (not the currently active parcels) or both.

The "host map" at the right shows each host in the cluster with the status of the parcels on that host. If the host is actually running the processes from the currently activated parcels, the host is indicated in blue. A black square indicates that a parcel has been activated, but that all the running processes are from an earlier version of the software. This can happen, for example, if you have not restarted a service or role after activating a new parcel.

Move the cursor over the icon to see the rack to which the hosts are assigned. Hosts on different racks are displayed in separate rows.

To view the exact versions of the software running on a given host, you can click on the square representing the host. This pops up a display showing the parcel versions installed on that host.

For CDH 4.4, Impala 1.1.1, and Solr 0.9.3 or higher, the pop-up lists the roles running on the selected host that are part of the listed parcel. Clicking a role opens the Cloudera Manager page for that role. It also shows whether the parcel is active or not.

If a host is running various software versions, the square representing the host is a four-square icon. When you move the cursor over that host, both the active and inactive components are shown. For example, this happens when the older CDH parcel has been deactivated but only the HDFS service has been restarted.

Parcel Configuration Settings

You can configure where parcels are stored on the Cloudera Manager Server host, the URLs of parcel repositories, the properties of a proxy server through which parcels are downloaded, and where parcels distributed to cluster hosts are stored.

Configuring Cloudera Manager Server Parcel Settings

  1. Use one of the following methods to open the parcel settings page:
    • Navigation bar
      1. Click the parcel indicator in the top navigation bar.
      2. Click the Edit Settings button.
    • Menu
      1. Select Administration > Settings.
      2. Click the Parcels category.
    • Tab
      1. Click the Hosts tab.
      2. Click the Configuration tab.
      3. Click the Parcels category.
      4. Click the Edit Settings button.
  2. Specify a property:
    • Local Parcel Repository Path defines the path on the Cloudera Manager Server host where downloaded parcels are stored.
    • Remote Parcel Repository URLs is a list of repositories that Cloudera Manager should check for parcels. Initially this points to the latest released CDH 4, CDH 5, Impala, and Solr repositories, but you can add other locations to the list. Use this mechanism to add Cloudera repositories that are not listed by default, such as older versions of CDH or the Sentry parcel for CDH 4.3, or to add your own custom repositories. The locations of the Cloudera parcel repositories are https://archive.cloudera.com/product/parcels/version, where product is one of cdh4, cdh5, gplextras5, impala, search, or sentry, and version is a specific product version or latest.
      To add a parcel repository:
      1. In the Remote Parcel Repository URLs list, click to open an additional row.
      2. Enter the path to the repository.
  3. Click Save Changes.
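
The repository URL pattern given for Remote Parcel Repository URLs can be expanded with a small helper like the one below. This is an illustrative sketch: the function name `parcel_repo_url` is hypothetical, and the product list is copied from the pattern described above rather than from any authoritative catalog.

```python
# Product names as listed in this document for the Cloudera parcel repositories.
PRODUCTS = ("cdh4", "cdh5", "gplextras5", "impala", "search", "sentry")


def parcel_repo_url(product: str, version: str = "latest") -> str:
    """Build a repository URL of the form https://archive.cloudera.com/product/parcels/version."""
    if product not in PRODUCTS:
        raise ValueError(f"unknown product {product!r}; expected one of {PRODUCTS}")
    return f"https://archive.cloudera.com/{product}/parcels/{version}"
```

Calling `parcel_repo_url("cdh5")` yields the URL for the latest CDH 5 repository. Pass a specific version string as the second argument to point at an older release.
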
You can also:
  • Set the frequency with which Cloudera Manager will check for new parcels.
  • Configure a proxy to access the remote repositories.
  • Configure whether downloads and distribution of parcels should occur automatically whenever new ones are detected. If automatic downloading/distribution are not enabled (the default), you must go to the Parcels page to initiate these actions.
  • Control which products can be downloaded if automatic downloading is enabled.
  • Control whether to retain downloaded parcels.
  • Control whether to retain old parcel versions and how many parcel versions to retain.
You can configure the bandwidth limits and the number of concurrent uploads, to tune the load that parcel distribution puts on your network. The defaults are up to 50 concurrent parcel uploads and 50 MiB/s aggregate bandwidth.
  • The concurrent upload count (Maximum Parcel Uploads) theoretically does not matter if all hosts have the same Ethernet speed. In general, 50 concurrent uploads is an acceptable setting. However, where the server has more bandwidth than the hosts (say, 10 GbE on the server while the other hosts use 1 GbE), the count is important to maximize bandwidth, and needs to be at least the ratio of the speeds (10x in this case).
  • The bandwidth limit (Parcel Distribution Rate Limit) should be your Ethernet speed (in MiB/s) divided by approximately 16. You can use a higher limit if you have QoS set up to prevent starving other services, or if you are willing to accept the risk of a higher bandwidth load.
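
The two rules of thumb above reduce to simple arithmetic. The sketch below works them through, assuming link speeds are quoted in decimal gigabits per second (10**9 bits/s) and that 1 MiB is 2**20 bytes. The function names are hypothetical.

```python
import math


def ethernet_mib_per_s(gbit_per_s: float) -> float:
    """Convert a link speed in Gbit/s (10**9 bits/s) to MiB/s (2**20 bytes/s)."""
    return gbit_per_s * 1e9 / 8 / 2**20


def suggested_rate_limit(gbit_per_s: float, divisor: float = 16.0) -> float:
    """Parcel Distribution Rate Limit in MiB/s: Ethernet speed divided by about 16."""
    return ethernet_mib_per_s(gbit_per_s) / divisor


def min_concurrent_uploads(server_gbit: float, host_gbit: float) -> int:
    """Smallest Maximum Parcel Uploads that can fill the server's link: the speed ratio."""
    return math.ceil(server_gbit / host_gbit)
```

Under these assumptions, a 1 GbE network gives roughly 119 MiB/s and a suggested limit of about 7.5 MiB/s, while a 10 GbE server feeding 1 GbE hosts needs at least 10 concurrent uploads.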

Configuring a Proxy Server

To configure a proxy server through which parcels are downloaded, follow the instructions in Configuring Network Settings.

Configuring the Host Parcel Directory

To configure the location of distributed parcels:
  1. Click Hosts in the top navigation bar.
  2. Click the Configuration tab.
  3. Configure the value of the Parcel Directory property. The setting of the parcel_dir property in the Cloudera Manager Agent configuration file overrides this setting.
  4. Click Save Changes to commit the changes.
  5. On each host, restart the Cloudera Manager Agent:
    $ sudo service cloudera-scm-agent restart