2026 New CCA-500 Exam Dumps with PDF and VCE Free: https://www.2passeasy.com/dumps/CCA-500/

NEW QUESTION 1
You decide to create a cluster which runs HDFS in High Availability mode with automatic failover, using Quorum Storage. What is the purpose of ZooKeeper in such a configuration?

  • A. It only keeps track of which NameNode is Active at any given time
  • B. It monitors an NFS mount point and reports if the mount point disappears
  • C. It both keeps track of which NameNode is Active at any given time, and manages the Edits file, which is a log of changes to the HDFS filesystem
  • D. It only manages the Edits file, which is a log of changes to the HDFS filesystem
  • E. Clients connect to ZooKeeper to determine which NameNode is Active

Answer: A

Explanation:
Reference: http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/latest/PDF/CDH4-High-Availability-Guide.pdf (page 15)

NEW QUESTION 2
You are migrating a cluster from MapReduce version 1 (MRv1) to MapReduce version 2 (MRv2) on YARN. You want to maintain your MRv1 TaskTracker slot capacities when you migrate. What should you do?

  • A. Configure yarn.applicationmaster.resource.memory-mb and yarn.applicationmaster.resource.cpu-vcores so that ApplicationMaster container allocations match the capacity you require.
  • B. You don’t need to configure or balance these properties in YARN as YARN dynamically balances resource management capabilities on your cluster
  • C. Configure mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum in yarn-site.xml to match your cluster’s capacity set by the yarn-scheduler.minimum-allocation
  • D. Configure yarn.nodemanager.resource.memory-mb and yarn.nodemanager.resource.cpu-vcores to match the capacity you require under YARN for each NodeManager

Answer: D

NEW QUESTION 3
Which command does Hadoop offer to discover missing or corrupt HDFS data?

  • A. hdfs fs -du
  • B. hdfs fsck
  • C. dskchk
  • D. The map-only checksum
  • E. Hadoop does not provide any tools to discover missing or corrupt data; there is no need because three replicas are kept for each data block

Answer: B

Explanation:
Reference: https://twiki.grid.iu.edu/bin/view/Storage/HadoopRecovery
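
As a rough illustration of option B, the Python sketch below builds an `hdfs fsck` command line and runs it only when an `hdfs` binary is found on the PATH. The helper name `build_fsck_command` and the default path `/` are assumptions made for this example; `-list-corruptfileblocks` is the fsck option that lists files with corrupt blocks.

```python
import shutil
import subprocess


def build_fsck_command(path="/", list_corrupt=True):
    """Return the argv list for an hdfs fsck run (hypothetical helper)."""
    cmd = ["hdfs", "fsck", path]
    if list_corrupt:
        # Ask fsck to report only the files that have corrupt blocks.
        cmd.append("-list-corruptfileblocks")
    return cmd


if __name__ == "__main__":
    cmd = build_fsck_command("/")
    if shutil.which("hdfs"):
        # Only attempt the call on a machine with the Hadoop client installed.
        result = subprocess.run(cmd, capture_output=True, text=True)
        print(result.stdout)
    else:
        print("hdfs not found; would run:", " ".join(cmd))
```

On a machine without the Hadoop client, the script only prints the command it would have run.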

NEW QUESTION 4
You are configuring your cluster to run HDFS and MapReduce v2 (MRv2) on YARN. Which two daemons need to be installed on your cluster’s master nodes? (Choose two)

  • A. HMaster
  • B. ResourceManager
  • C. TaskManager
  • D. JobTracker
  • E. NameNode
  • F. DataNode

Answer: BE

NEW QUESTION 5
You have a Hadoop cluster running HDFS, and a gateway machine external to the cluster from which clients submit jobs. What do you need to do in order to run Impala on the cluster and submit jobs from the command line of the gateway machine?

  • A. Install the impalad daemon, statestored daemon, and catalogd daemon on each machine in the cluster, and the impala shell on your gateway machine
  • B. Install the impalad daemon, the statestored daemon, the catalogd daemon, and the impala shell on your gateway machine
  • C. Install the impalad daemon and the impala shell on your gateway machine, and the statestored daemon and catalogd daemon on one of the nodes in the cluster
  • D. Install the impalad daemon on each machine in the cluster, the statestored daemon and catalogd daemon on one machine in the cluster, and the impala shell on your gateway machine
  • E. Install the impalad daemon, statestored daemon, and catalogd daemon on each machine in the cluster and on the gateway node

Answer: D

NEW QUESTION 6
You are planning a Hadoop cluster and considering implementing 10 Gigabit Ethernet as the network fabric. Which workloads benefit the most from faster network fabric?

  • A. When your workload generates a large amount of output data, significantly larger than the amount of intermediate data
  • B. When your workload consumes a large amount of input data, relative to the entire capacity of HDFS
  • C. When your workload consists of processor-intensive tasks
  • D. When your workload generates a large amount of intermediate data, on the order of the input data itself

Answer: A

NEW QUESTION 7
You need to analyze 60,000,000 images stored in JPEG format, each of which is approximately 25 KB. Because your Hadoop cluster isn’t optimized for storing and processing many small files, you decide to take the following actions:
1. Group the individual images into a set of larger files.
2. Use the set of larger files as input for a MapReduce job that processes them directly with Python using Hadoop streaming.
Which data serialization system gives the flexibility to do this?

  • A. CSV
  • B. XML
  • C. HTML
  • D. Avro
  • E. SequenceFiles
  • F. JSON

Answer: E

Explanation:
Sequence files are block-compressed and provide direct serialization and deserialization of several arbitrary data types (not just text). Sequence files can be generated as the output of other MapReduce tasks and are an efficient intermediate representation for data that is passing from one MapReduce job to another.
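
To give a sense of scale for the grouping step, here is a minimal Python sketch of the arithmetic. It assumes 1 KB = 1024 bytes and a target container file size of 128 MB; neither figure comes from the question, and the function name is made up for the example.

```python
KB = 1024
MB = 1024 * KB


def container_files_needed(num_images, image_size_bytes, target_file_bytes):
    """Number of larger files needed to hold all the small images."""
    total_bytes = num_images * image_size_bytes
    # Integer ceiling division: round up so the last partial file is counted.
    return -(-total_bytes // target_file_bytes)


if __name__ == "__main__":
    files = container_files_needed(60_000_000, 25 * KB, 128 * MB)
    print(f"{files} files of 128 MB instead of 60,000,000 small files")
```

Under those assumptions, the 60 million images fit into a little over eleven thousand larger files, which is a far smaller number of objects for the NameNode to track.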

NEW QUESTION 8
Each node in your Hadoop cluster, running YARN, has 64GB memory and 24 cores. Your yarn-site.xml has the following configuration:
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>32768</value>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>12</value>
</property>
You want YARN to launch no more than 16 containers per node. What should you do?

  • A. Modify yarn-site.xml with the following property: <name>yarn.scheduler.minimum-allocation-mb</name><value>2048</value>
  • B. Modify yarn-site.xml with the following property: <name>yarn.scheduler.minimum-allocation-mb</name><value>4096</value>
  • C. Modify yarn-site.xml with the following property: <name>yarn.nodemanager.resource.cpu-vcores</name>
  • D. No action is needed: YARN’s dynamic resource allocation automatically optimizes the node memory and cores

Answer: A
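
The arithmetic behind this answer can be sketched in a few lines of Python. The sketch assumes the scheduler limits containers by memory only and that every container is sized at exactly the minimum allocation; the function name is hypothetical.

```python
def max_containers_by_memory(node_memory_mb, min_allocation_mb):
    """Upper bound on containers per node when each uses the minimum allocation."""
    return node_memory_mb // min_allocation_mb


if __name__ == "__main__":
    node_memory_mb = 32768  # yarn.nodemanager.resource.memory-mb from the question
    for min_alloc in (2048, 4096):
        count = max_containers_by_memory(node_memory_mb, min_alloc)
        print(f"minimum-allocation-mb={min_alloc}: at most {count} containers")
```

With 32768 MB per NodeManager, a 2048 MB minimum allocation caps the node at 16 containers, while 4096 MB would cap it at 8.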

NEW QUESTION 9
You are working on a project where you need to chain together MapReduce, Pig jobs. You also need the ability to use forks, decision points, and path joins. Which ecosystem project should you use to perform these actions?

  • A. Oozie
  • B. ZooKeeper
  • C. HBase
  • D. Sqoop
  • E. HUE

Answer: A

NEW QUESTION 10
Identify two features/issues that YARN is designed to address. (Choose two)

  • A. Standardize on a single MapReduce API
  • B. Single point of failure in the NameNode
  • C. Reduce complexity of the MapReduce APIs
  • D. Resource pressure on the JobTracker
  • E. Ability to run frameworks other than MapReduce, such as MPI
  • F. HDFS latency

Answer: DE

Explanation:
Reference: http://www.revelytix.com/?q=content/hadoop-ecosystem (YARN, first para)

NEW QUESTION 11
Which YARN daemon or service monitors a Container’s per-application resource usage (e.g., memory, CPU)?

  • A. ApplicationMaster
  • B. NodeManager
  • C. ApplicationManagerService
  • D. ResourceManager

Answer: B

NEW QUESTION 12
You observed that the number of spilled records from Map tasks far exceeds the number of map output records. Your child heap size is 1GB and your io.sort.mb value is set to 1000MB. How would you tune your io.sort.mb value to achieve maximum memory to disk I/O ratio?

  • A. For a 1GB child heap size an io.sort.mb of 128 MB will always maximize memory to disk I/O
  • B. Increase the io.sort.mb to 1GB
  • C. Decrease the io.sort.mb value to 0
  • D. Tune the io.sort.mb value until you observe that the number of spilled records equals (or is as close as possible to) the number of map output records.

Answer: D
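
The tuning target in option D can be expressed as a ratio of two job counters. The Python sketch below is only an illustration: the counter values are invented, and the function name and the 5% tolerance are assumptions, not Hadoop settings.

```python
def spill_ratio(spilled_records, map_output_records):
    """Ratio of spilled records to map output records; 1.0 means one spill per record."""
    if map_output_records <= 0:
        raise ValueError("map_output_records must be positive")
    return spilled_records / map_output_records


def needs_tuning(spilled_records, map_output_records, tolerance=0.05):
    """True when records are being spilled noticeably more than once each."""
    return spill_ratio(spilled_records, map_output_records) > 1.0 + tolerance


if __name__ == "__main__":
    # Hypothetical counter values read from a finished job.
    print(spill_ratio(3_000_000, 1_000_000))
    print(needs_tuning(3_000_000, 1_000_000))
```

A ratio well above 1.0 means map output is being written to disk more than once, which is the symptom described in the question.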

NEW QUESTION 13
Which YARN daemon or service negotiates map and reduce Containers from the Scheduler, tracking their status and monitoring progress?

  • A. NodeManager
  • B. ApplicationMaster
  • C. ApplicationManager
  • D. ResourceManager

Answer: B

Explanation:
Reference: http://www.devx.com/opensource/intro-to-apache-mapreduce-2-yarn.html (See resource manager)

NEW QUESTION 14
A user comes to you, complaining that when she attempts to submit a Hadoop job, it fails. There is a directory in HDFS named /data/input. The JAR is named j.jar, and the driver class is named DriverClass.
She runs the command:
hadoop jar j.jar DriverClass /data/input /data/output
The error message returned includes the line:
PriviligedActionException as:training (auth:SIMPLE) cause:org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/data/input
What is the cause of the error?

  • A. The user is not authorized to run the job on the cluster
  • B. The output directory already exists
  • C. The name of the driver has been spelled incorrectly on the command line
  • D. The directory name is misspelled in HDFS
  • E. The Hadoop configuration files on the client do not point to the cluster

Answer: E
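
The error message reports the input path as `file:/data/input`. A small sketch using Python's standard `urllib.parse` shows how the scheme prefix distinguishes a local-filesystem path from an HDFS path. The host name `namenode` and port `8020` are hypothetical values chosen for the example.

```python
from urllib.parse import urlparse


def filesystem_scheme(path_uri):
    """Return the URI scheme of a path, or '(none)' when no scheme is present."""
    return urlparse(path_uri).scheme or "(none)"


if __name__ == "__main__":
    for uri in ("file:/data/input", "hdfs://namenode:8020/data/input", "/data/input"):
        print(f"{uri} -> {filesystem_scheme(uri)}")
```

A `file` scheme means the path was resolved against the local filesystem of the machine that submitted the job, not against HDFS.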

NEW QUESTION 15
In CDH4 and later, which file contains a serialized form of all the directory and file inodes in the filesystem, giving the NameNode a persistent checkpoint of the filesystem metadata?

  • A. fstime
  • B. VERSION
  • C. fsimage_N (where N reflects transactions up to transaction ID N)
  • D. edits_N-M (where N-M specifies transactions between transaction ID N and transaction ID M)

Answer: C

Explanation:
Reference: http://mikepluta.com/tag/namenode/

NEW QUESTION 16
Which scheduler would you deploy to ensure that your cluster allows short jobs to finish within a reasonable time without starving long-running jobs?

  • A. Complexity Fair Scheduler (CFS)
  • B. Capacity Scheduler
  • C. Fair Scheduler
  • D. FIFO Scheduler

Answer: C

Explanation:
Reference: http://hadoop.apache.org/docs/r1.2.1/fair_scheduler.html

NEW QUESTION 17
Choose three reasons why you should run the HDFS balancer periodically. (Choose three)

  • A. To ensure that there is capacity in HDFS for additional data
  • B. To ensure that all blocks in the cluster are 128MB in size
  • C. To help HDFS deliver consistent performance under heavy loads
  • D. To ensure that there is consistent disk utilization across the DataNodes
  • E. To improve data locality for MapReduce

Answer: CDE

Explanation:
http://www.quora.com/Apache-Hadoop/It-is-recommended-that-you-run-the-HDFS-balancer-periodically-Why-Choose-3

NEW QUESTION 18
You have a cluster running with a FIFO scheduler enabled. You submit a large job A to the cluster, which you expect to run for one hour. Then, you submit job B to the cluster, which you expect to run for only a couple of minutes.
You submit both jobs with the same priority.
Which two statements best describe how the FIFO Scheduler arbitrates the cluster resources for a job and its tasks? (Choose two)

  • A. Because there is more than a single job on the cluster, the FIFO Scheduler will enforce a limit on the percentage of resources allocated to a particular job at any given time
  • B. Tasks are scheduled in the order of their job submission
  • C. The order of execution of jobs may vary
  • D. Given jobs A and B submitted in that order, all tasks from job A are guaranteed to finish before all tasks from job B
  • E. The FIFO Scheduler will give, on average, an equal share of the cluster resources over the job lifecycle
  • F. The FIFO Scheduler will pass an exception back to the client when job B is submitted, since all slots on the cluster are in use

Answer: AD
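
For intuition about first-in, first-out ordering, the Python sketch below models a queue in which every task of an earlier job is handed out before any task of a later job. It is a deliberately simplified model of task start order only, not Hadoop's scheduler code, and the job sizes are invented.

```python
from collections import deque


def fifo_task_order(jobs):
    """Return task start order for jobs given as (name, num_tasks) in submission order."""
    queue = deque(jobs)
    order = []
    while queue:
        # Always take the oldest job still waiting in the queue.
        name, num_tasks = queue.popleft()
        for i in range(num_tasks):
            order.append(f"{name}-task{i}")
    return order


if __name__ == "__main__":
    # Job A is large, job B is small; both have the same priority.
    print(fifo_task_order([("A", 4), ("B", 2)]))
```

In this model the short job B cannot start any task until all of job A's tasks have been handed out.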

NEW QUESTION 19
Your Hadoop cluster contains nodes in three racks. You have not configured the dfs.hosts property in the NameNode’s configuration file. What results?

  • A. The NameNode will update the dfs.hosts property to include machines running the DataNode daemon on the next NameNode reboot or with the command dfsadmin -refreshNodes
  • B. No new nodes can be added to the cluster until you specify them in the dfs.hosts file
  • C. Any machine running the DataNode daemon can immediately join the cluster
  • D. Presented with a blank dfs.hosts property, the NameNode will permit DataNodes specified in mapred.hosts to join the cluster

Answer: C

NEW QUESTION 20
During the execution of a MapReduce v2 (MRv2) job on YARN, where does the Mapper place the intermediate data of each Map Task?

  • A. The Mapper stores the intermediate data on the node running the Job’s ApplicationMaster so that it is available to YARN ShuffleService before the data is presented to the Reducer
  • B. The Mapper stores the intermediate data in HDFS on the node where the Map tasks ran in the HDFS /usercache/${user}/apache/application_${appid} directory for the user who ran the job
  • C. The Mapper transfers the intermediate data immediately to the reducers as it is generated by the Map Task
  • D. YARN holds the intermediate data in the NodeManager’s memory (a container) until it is transferred to the Reducer
  • E. The Mapper stores the intermediate data on the underlying filesystem of the local disk in the directories specified by yarn.nodemanager.local-dirs

Answer: E

NEW QUESTION 21
......