Accessing External Storage from Spark
Spark can access all storage sources supported by Hadoop, including a local file system, HDFS, HBase, and Amazon S3.
For developer information about working with external storage, see External Storage in the Spark Programming Guide.
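Spark chooses the storage layer from the URI scheme of the path you pass in. The following is a minimal sketch, assuming PySpark is installed and a cluster is reachable. The host name, bucket, and file paths in EXAMPLE_URIS are hypothetical placeholders, as are the helper names scheme_of and count_lines. The s3a:// entry also assumes that the Hadoop S3A connector and credentials are already configured.

```python
from urllib.parse import urlparse

# Hypothetical example locations; replace hosts, buckets, and paths with your own.
EXAMPLE_URIS = [
    "file:///tmp/input.txt",                            # local file system
    "hdfs://namenode.example.com:8020/data/input.txt",  # HDFS
    "s3a://example-bucket/data/input.txt",              # Amazon S3 (S3A connector must be configured)
]


def scheme_of(uri):
    """Return the URI scheme, which Hadoop uses to pick a FileSystem implementation."""
    return urlparse(uri).scheme


def count_lines(uri):
    """Count the lines of a text file at any Hadoop-supported URI using Spark."""
    # Imported lazily so this module can be loaded without PySpark installed.
    from pyspark import SparkContext

    sc = SparkContext.getOrCreate()
    return sc.textFile(uri).count()
```

The same textFile call works for every entry. Only the scheme and authority in the path change.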
Accessing Compressed Files
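Reading compressed files needs no extra options: textFile(path) decompresses input whose file extension matches a known Hadoop codec, such as .gz or .bz2. To save compressed files, use one of the following methods:
- saveAsTextFile(path, compressionCodecClass="codec_class")
- saveAsHadoopFile(path, outputFormatClass, compressionCodecClass="codec_class")
where codec_class is the fully qualified class name of a Hadoop compression codec, such as org.apache.hadoop.io.compress.GzipCodec.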
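The two save methods can be wrapped as shown below. This is a sketch, assuming PySpark and an existing RDD. The HADOOP_CODECS lookup table and the helpers codec_class, save_compressed, and save_compressed_pairs are hypothetical conveniences. The codec classes and org.apache.hadoop.mapred.TextOutputFormat are standard Hadoop classes. SnappyCodec also requires the native Snappy library on the cluster.

```python
# Hypothetical short-name lookup for codec classes that ship with Hadoop.
HADOOP_CODECS = {
    "gzip": "org.apache.hadoop.io.compress.GzipCodec",
    "bzip2": "org.apache.hadoop.io.compress.BZip2Codec",
    "snappy": "org.apache.hadoop.io.compress.SnappyCodec",
}


def codec_class(name):
    """Map a short codec name to its fully qualified Hadoop codec class name."""
    try:
        return HADOOP_CODECS[name.lower()]
    except KeyError:
        raise ValueError("unknown codec: {}".format(name)) from None


def save_compressed(rdd, path, codec="gzip"):
    """Save an RDD of strings as compressed text files."""
    rdd.saveAsTextFile(path, compressionCodecClass=codec_class(codec))


def save_compressed_pairs(pair_rdd, path, codec="gzip"):
    """Save a key-value RDD through the old-API TextOutputFormat, compressed."""
    pair_rdd.saveAsHadoopFile(
        path,
        "org.apache.hadoop.mapred.TextOutputFormat",
        compressionCodecClass=codec_class(codec),
    )
```

For example, given a SparkContext sc, save_compressed(sc.parallelize(["a", "b"]), "hdfs:///tmp/out-gz") should write gzip-compressed part files under the (hypothetical) output directory.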
For examples of accessing Avro and Parquet files, see Spark with Avro and Parquet.
For details on how to access specific types of external storage and files, see the following sections.
Using Spark with Azure Data Lake Store (ADLS)
Microsoft Azure Data Lake Store (ADLS) is a cloud-based file system that you can access through Spark applications. Data files are accessed using the adl:// prefix instead of hdfs://. See Configuring Azure Data Lake Store to Use with CDH for instructions on setting up ADLS as a storage layer for a CDH cluster.
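The following sketch shows how the adl:// prefix is used from PySpark. It assumes ADLS credentials are already configured for the cluster as described in the setup instructions. The account name, paths, and helper names (adl_uri, word_count_on_adls) are hypothetical. ADLS accounts of this type are addressed as <account>.azuredatalakestore.net.

```python
def adl_uri(account, path):
    """Build an adl:// URI for a path in an Azure Data Lake Store account."""
    # Accounts are addressed by host name: <account>.azuredatalakestore.net
    return "adl://{}.azuredatalakestore.net/{}".format(account, path.lstrip("/"))


def word_count_on_adls(account, input_path, output_path):
    """Read text from ADLS, count words, and write the counts back to ADLS."""
    # Imported lazily so the URI helper can be used without PySpark installed.
    from pyspark import SparkContext

    sc = SparkContext.getOrCreate()
    lines = sc.textFile(adl_uri(account, input_path))
    counts = (
        lines.flatMap(lambda line: line.split())
        .map(lambda word: (word, 1))
        .reduceByKey(lambda a, b: a + b)
    )
    counts.saveAsTextFile(adl_uri(account, output_path))
```

Apart from the URI, the code is identical to a job that reads from and writes to HDFS.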