Using Impala Logging
The Impala logs record information about:
- Any errors Impala encountered. If Impala experienced a serious error during startup, you must diagnose and troubleshoot that problem before you can do anything further with Impala.
- How Impala is configured.
- Jobs Impala has completed.
Formerly, the logs contained the query profile for each query, showing low-level details of how the work is distributed among nodes and how intermediate and final results are transmitted across the network. To save space, those query profiles are now stored in zlib-compressed files in /var/log/impala/profiles. You can access them through the Impala web user interface. For example, at http://impalad-node-hostname:25000/queries, each query is followed by a Profile link leading to a page showing extensive analytical data for the query execution.
The auditing feature introduced in Cloudera Impala 1.1.1 produces a separate set of audit log files when enabled. See Auditing Impala Operations for details.
Cloudera recommends installing Impala through the Cloudera Manager administration interface. To assist with troubleshooting, Cloudera Manager collects front-end and back-end logs into a single view and lets you search across the log data for all the managed nodes, rather than requiring you to examine the logs on each node separately. If you installed Impala using Cloudera Manager, refer to the topics on Services Monitoring and Searching Logs in the Cloudera Manager Monitoring and Diagnostics Guide.
If you are using Impala in an environment not managed by Cloudera Manager, review Impala log files on each node:
- By default, the log files are under the directory /var/log/impala. To change log file locations, modify the defaults file described in Starting Impala.
- The significant files for the impalad process are impalad.INFO, impalad.WARNING, and impalad.ERROR. You might also see a file impalad.FATAL, although this is only present in rare conditions.
- The significant files for the statestored process are statestored.INFO, statestored.WARNING, and statestored.ERROR. You might also see a file statestored.FATAL, although this is only present in rare conditions.
- The significant files for the catalogd process are catalogd.INFO, catalogd.WARNING, and catalogd.ERROR. You might also see a file catalogd.FATAL, although this is only present in rare conditions.
- Examine the .INFO files to see configuration settings for the processes.
- Examine the .WARNING files to see problem information, ranging from suboptimal settings to serious runtime errors.
- Examine the .ERROR and .FATAL files to see only the most serious errors, such as when a process crashes or a query fails to complete. These messages also appear in the .WARNING file.
- A new set of log files is produced each time the associated daemon is restarted. These log files have long names including a timestamp. The .INFO, .WARNING, and .ERROR files are physically represented as symbolic links to the latest applicable log files.
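The symlink arrangement can be sketched in a throwaway directory. The timestamped file name below is a made-up example following the glog naming pattern; on a real node, `ls -l /var/log/impala` shows the same structure with actual host names and timestamps:

```shell
# Demonstration in a scratch directory; the long file name is a hypothetical
# example of the timestamped log created at each daemon restart.
logdir=$(mktemp -d)
touch "$logdir/impalad.example.com.impala.log.INFO.20130107-084212.14876"
# The short .INFO name is a symbolic link pointing at the latest timestamped log:
ln -s impalad.example.com.impala.log.INFO.20130107-084212.14876 "$logdir/impalad.INFO"
ls -l "$logdir/impalad.INFO"       # shows the symlink and its target
readlink "$logdir/impalad.INFO"    # prints the timestamped file name
```

After a daemon restart, the symlink is simply repointed at the newest timestamped file, so commands that reference impalad.INFO always see the current log.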
- The init script for the impala-server service also produces a consolidated log file /var/log/impala/impala-server.log, with all the same information as the corresponding .INFO, .WARNING, and .ERROR files.
- The init script for the impala-state-store service also produces a consolidated log file /var/log/impala/impala-state-store.log, with all the same information as the corresponding .INFO, .WARNING, and .ERROR files.
Impala logs information using the GLOG (Google logging) system, so you will see some messages referring to C++ file names. Logging is affected by:
- The GLOG_v environment variable specifies which types of messages are logged. See Setting Logging Levels for details.
- The -logbuflevel startup flag for the impalad daemon specifies how often the log information is written to disk. The default is 0, meaning that the log is flushed to disk immediately when Impala outputs an important message such as a warning or an error, while less important messages such as informational ones are buffered in memory rather than flushed to disk immediately.
- Cloudera Manager has an Impala configuration setting that sets the -logbuflevel startup option.
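As a sketch, assuming the Debian-style defaults file described in Starting Impala (the file location and the IMPALA_SERVER_ARGS variable name may differ in your installation), the -logbuflevel flag could be passed to impalad like this:

```shell
# Hypothetical fragment of the Impala defaults file (assumed location, e.g.
# /etc/default/impala). Setting -logbuflevel=-1 tells glog to flush every
# message to disk immediately, instead of buffering informational messages
# as the default of 0 does.
IMPALA_SERVER_ARGS="${IMPALA_SERVER_ARGS} -logbuflevel=-1"
```

Immediate flushing trades some logging performance for the certainty that the last informational messages before a crash are on disk.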
The main administrative tasks involved with Impala logs are:
- Reviewing Impala Logs
- Understanding Impala Log Contents
- Setting Logging Levels
Reviewing Impala Logs
By default, the Impala log is stored at /var/log/impala. The most comprehensive log, showing informational, warning, and error messages, is in the file named impalad.INFO. View log file contents by using the web interface or by examining the contents of the log file. (When you examine the logs through the file system, you can troubleshoot problems by reading the impalad.WARNING or impalad.ERROR files, which contain the subsets of messages that indicate potential problems.)
On a machine named impala.example.com with default settings, you could view the Impala logs on that machine by using a browser to access http://impala.example.com:25000/logs.
The web interface limits the amount of logging information displayed. To view every log entry, access the log files directly through the file system.
You can view the contents of the impalad.INFO log file in the file system. With the default configuration settings, the start of the log file appears as follows:
[user@example impalad]$ pwd
/var/log/impalad
[user@example impalad]$ more impalad.INFO
Log file created at: 2013/01/07 08:42:12
Running on machine: impala.example.com
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
I0107 08:42:12.292155 14876 daemon.cc:34] impalad version 0.4 RELEASE (build 9d7fadca0461ab40b9e9df8cdb47107ec6b27cff)
Built on Fri, 21 Dec 2012 12:55:19 PST
I0107 08:42:12.292484 14876 daemon.cc:35] Using hostname: impala.example.com
I0107 08:42:12.292706 14876 logging.cc:76] Flags (see also /varz are on debug webserver):
--dump_ir=false
--module_output=
--be_port=22000
--classpath=
--hostname=impala.example.com
Understanding Impala Log Contents
The logs store information about Impala startup options. This information appears once for each time Impala is started and may include:
- Machine name.
- Impala version number.
- Flags used to start Impala.
- CPU information.
- The number of available disks.
The logs also store information about each job Impala has run. Because each Impala job creates an additional set of data about queries, the amount of job-specific data may be very large. Logs may contain detailed information on jobs. These detailed log entries may include:
- The composition of the query.
- The degree of data locality.
- Statistics on data throughput and response times.
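Because each entry follows the glog line format shown earlier ([IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg), ordinary text tools can pull job-related entries out of a log by hand. The sample line below is invented for illustration; the source file name and message text are assumptions, not guaranteed log output:

```shell
# A made-up line in the glog format, illustrating how query text can appear
# in impalad.INFO (real file:line references and wording will differ):
sample='I0107 09:00:00.000000 14900 impala-server.cc:137] query: select count(*) from t'
# Extract just the query portion, as you might against a real log file:
printf '%s\n' "$sample" | grep -o 'query: .*'
```

Against a real log, the same grep pattern applied to /var/log/impala/impalad.INFO would list the queries recorded there.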
Setting Logging Levels
Impala uses the GLOG system, which supports three logging levels. You can adjust the logging levels through the Cloudera Manager Admin Console, or without Cloudera Manager by exporting variable settings. To change logging settings manually, set the GLOG_v environment variable on each node before starting impalad.
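For example, a minimal sketch of setting the level by hand (1 is the default level; 2 and 3 add progressively more detail, as described below):

```shell
# Set the glog verbosity level in the environment impalad will inherit.
export GLOG_v=1
# Confirm the value before starting the daemon:
echo "GLOG_v=$GLOG_v"
```

The setting applies only to daemons started from a shell where the variable is exported, so it must be repeated (or placed in the defaults file) on each node.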
For more information on how to configure GLOG, including how to set variable logging levels for different system components, see How To Use Google Logging Library (glog).
Understanding What is Logged at Different Logging Levels
As logging levels increase, the categories of information logged are cumulative. For example, GLOG_v=2 records everything GLOG_v=1 records, as well as additional information.
Increasing logging levels imposes performance overhead and increases log size. Cloudera recommends using GLOG_v=1 for most cases: this level has minimal performance impact but still captures useful troubleshooting information.
Additional information logged at each level is as follows:
- GLOG_v=1 - The default level. Logs information about each connection and query initiated against an impalad instance, including runtime profiles.
- GLOG_v=2 - Everything from the previous level plus information for each RPC initiated. This level also records query execution progress information, including details on each file that is read.
- GLOG_v=3 - Everything from the previous level plus logging of every row that is read. This level is only applicable for the most serious troubleshooting and tuning scenarios, because it can produce exceptionally large and detailed log files, potentially leading to its own set of performance and capacity problems.