Big Data Training in Bangalore

Apache Hadoop is a free, Java-based programming framework that supports the processing of large data sets ("big data") in a distributed computing environment. It is an Apache project sponsored by the Apache Software Foundation.

Hadoop makes it possible to run applications on systems with thousands of nodes involving thousands of terabytes. Its distributed file system facilitates rapid data transfer rates among nodes and allows the system to continue operating uninterrupted in case of a node failure. This approach lowers the risk of catastrophic system failure, even if a significant number of nodes become inoperative.

Hadoop was inspired by Google's MapReduce, a software framework in which an application is broken down into numerous small parts. Any of these parts (also called fragments or blocks) can be run on any node in the cluster. Doug Cutting, Hadoop's creator, named the framework after his child's stuffed toy elephant. The current Apache Hadoop ecosystem consists of the Hadoop kernel, MapReduce, the Hadoop Distributed File System (HDFS) and a number of related projects such as Apache Hive, HBase and ZooKeeper.
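The map/shuffle/reduce flow described above can be sketched in a few lines of plain Python. This is a toy word count, not Hadoop code (real Hadoop jobs are typically written in Java); it only illustrates how work is broken into fragments, grouped by key, and aggregated:

```python
from collections import defaultdict

def map_phase(fragments):
    """Mapper: emit a (word, 1) pair for every word in every input fragment."""
    for fragment in fragments:
        for word in fragment.split():
            yield word.lower(), 1

def shuffle_phase(pairs):
    """Shuffle: group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reducer: sum the counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

# Two input fragments, as if stored on two different nodes.
fragments = ["big data", "big hadoop data"]
counts = reduce_phase(shuffle_phase(map_phase(fragments)))
print(counts)  # {'big': 2, 'data': 2, 'hadoop': 1}
```

In a real cluster, the map calls run in parallel on the nodes holding each fragment, and the shuffle moves data across the network; the logic per phase is the same.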

The Hadoop framework is used by major players including Google, Yahoo and IBM, largely for applications involving search engines and advertising. Linux is the preferred operating system, but Hadoop also works with Windows, BSD and OS X.

Training Objectives:

  • Hadoop architecture
  • MapReduce architecture
  • Hadoop developer tasks
  • Hadoop administrative tasks
  • HBase architecture
  • Hive architecture

    Hadoop Architecture

    * Introduction to Hadoop
    * Parallel Computing vs. Distributed Computing
    * How to install Hadoop on your system
    * How to install Hadoop cluster on multiple machines
    * Hadoop Daemons introduction: NameNode, DataNode, JobTracker, TaskTracker
    * Exploring HDFS (Hadoop Distributed File System)
    * Exploring the HDFS Apache Web UI
    * Name Node architecture (Edit Log, FsImage, location of replicas)
    * Secondary Name Node architecture
    * Data Node architecture
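As a rough illustration of the bookkeeping the NameNode does (files split into blocks, each block replicated across DataNodes), here is a toy Python sketch. The block size (64 MB, the Hadoop 1.x default), the replication factor of 3 and the DataNode names are illustrative only; real HDFS placement is rack-aware:

```python
import itertools

BLOCK_SIZE = 64 * 1024 * 1024    # default HDFS block size in Hadoop 1.x
REPLICATION = 3                  # default replication factor

datanodes = ["dn1", "dn2", "dn3", "dn4"]   # hypothetical cluster nodes
node_cycle = itertools.cycle(datanodes)

def split_into_blocks(file_size):
    """NameNode view: a file is just an ordered list of block IDs."""
    n_blocks = -(-file_size // BLOCK_SIZE)  # ceiling division
    return [f"blk_{i}" for i in range(n_blocks)]

def place_replicas(blocks):
    """Toy placement: give each block REPLICATION DataNodes round-robin.
    Real HDFS considers racks and free space; only the bookkeeping idea is shown."""
    return {b: [next(node_cycle) for _ in range(REPLICATION)]
            for b in blocks}

blocks = split_into_blocks(300 * 1024 * 1024)   # a 300 MB file -> 5 blocks
block_map = place_replicas(blocks)
```

If a DataNode fails, the NameNode still has two live replicas of each of its blocks in this map and can schedule re-replication, which is why a node failure does not interrupt the system.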

    MapReduce Architecture

    * Exploring Job Tracker/Task Tracker
    * How a client submits a Map-Reduce job
    * Exploring Mapper/Reducer/Combiner
    * Shuffle: Sort & Partition
    * Input/output formats
    * Job Scheduling (FIFO, Fair Scheduler, Capacity Scheduler)
    * Exploring the Apache Map Reduce Web UI
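The "Shuffle: Sort & Partition" step above can be sketched in Python. Hadoop's default HashPartitioner computes `key.hashCode() % numReduceTasks`; the sketch below substitutes CRC32 as a stable stand-in hash so the routing is deterministic:

```python
import zlib

def partition_for(key, num_reducers):
    """Like Hadoop's HashPartitioner (hashCode % numReduceTasks),
    using CRC32 here as a stable stand-in hash."""
    return zlib.crc32(key.encode()) % num_reducers

def shuffle_and_sort(mapped_pairs, num_reducers):
    """Route every (key, value) pair to a partition, then sort each partition
    by key -- each sorted partition is what one reduce task receives."""
    partitions = [[] for _ in range(num_reducers)]
    for key, value in mapped_pairs:
        partitions[partition_for(key, num_reducers)].append((key, value))
    return [sorted(part) for part in partitions]

pairs = [("data", 1), ("big", 1), ("data", 1), ("hadoop", 1)]
partitions = shuffle_and_sort(pairs, num_reducers=2)
# All occurrences of a given key land in the same partition,
# and each partition arrives at its reducer sorted by key.
```

Because the partition function depends only on the key, both `("data", 1)` pairs are guaranteed to reach the same reduce task; writing a custom partitioner (covered under developer tasks) replaces only `partition_for`.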

    Hadoop Developer Tasks

    * Writing a MapReduce program
    * Reading and writing data using Java
    * Hadoop Eclipse integration
    * Mapper in detail
    * Reducer in detail
    * Using Combiners to reduce intermediate data
    * Writing Partitioners for Better Load Balancing
    * Sorting in HDFS
    * Searching in HDFS
    * Indexing in HDFS
    * Hands-On Exercise
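The Combiner item above is worth a small sketch: a combiner is a map-side mini-reduce that pre-aggregates output before the shuffle, so fewer intermediate pairs cross the network. A toy Python illustration (not Hadoop code; in Hadoop the combiner is often just the Reducer class reused):

```python
from collections import Counter

def mapper(fragment):
    """Map phase: one (word, 1) pair per word in the input split."""
    return [(word, 1) for word in fragment.split()]

def combiner(pairs):
    """Combiner: sum counts per key locally, before the shuffle.
    Safe here because addition is associative and commutative."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())

raw = mapper("big data big data big")   # 5 intermediate pairs
combined = combiner(raw)                # collapsed to 2 pairs
print(len(raw), len(combined))          # 5 2
```

The reducers produce the same final totals either way; the combiner only shrinks the intermediate data, which is exactly the load-reduction role described in the outline.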

    Hadoop Administrative Tasks

    * Routine Administrative Procedures
    * Understanding dfsadmin and mradmin
    * Block Scanner, Balancer
    * Health Check & Safe mode
    * Data Node commissioning/decommissioning
    * Monitoring and Debugging on a production cluster
    * Name Node Backup and Recovery
    * ACL (Access control list)
    * Upgrading Hadoop

    HBase Architecture

    * Introduction to HBase
    * HBase vs. RDBMS
    * Exploring the HBase Master & RegionServer
    * Column Families and Regions
    * Basic HBase shell commands
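The column-family data model that distinguishes HBase from an RDBMS can be sketched with nested Python dictionaries. This is only a conceptual toy: table → row key → column family → qualifier → timestamped versions, newest first (real HBase persists each column family in HFiles per region; the names and timestamps below are made up):

```python
from collections import defaultdict

def make_table():
    """table[row_key][family][qualifier] -> list of (timestamp, value)."""
    return defaultdict(lambda: defaultdict(lambda: defaultdict(list)))

def put(table, row, family, qualifier, value, ts):
    """Writes never overwrite: each put adds a new timestamped version."""
    cell = table[row][family][qualifier]
    cell.append((ts, value))
    cell.sort(reverse=True)  # newest version first, as HBase returns by default

def get(table, row, family, qualifier):
    """Read the latest version of a cell, or None if it was never written."""
    versions = table[row][family][qualifier]
    return versions[0][1] if versions else None

users = make_table()
put(users, "row1", "info", "name", "Asha", ts=1)
put(users, "row1", "info", "name", "Asha K", ts=2)
print(get(users, "row1", "info", "name"))  # Asha K
```

Unlike an RDBMS row, a row here can have any qualifiers in any family, columns are sparse, and old versions remain readable, which is the contrast the "HBase vs. RDBMS" topic covers.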

    Hive Architecture

    * Introduction to Hive
    * HBase vs. Hive
    * Installation of Hive
    * HQL (Hive query language)
    * Basic Hive commands
  • Prerequisite: knowledge of Core Java is an advantage when learning Hadoop.
  • Weekdays: 8.00 AM to 9.30 AM
  • Weekend: 11.00 AM to 2.00 PM