Big data tools are applications that provide efficient distributed processing of very large data sets.

Redshift and Spark take care of: distribution, load processing, and ensuring a task completes if a node fails, by assigning the same task to another node.
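The reassignment idea can be sketched in plain Python. This is only a toy simulation, not how Spark or Redshift schedule work internally; the function `run_tasks`, the node names, and the `failing_nodes` set are all hypothetical names made up for illustration.

```python
from collections import deque


def run_tasks(tasks, nodes, failing_nodes):
    """Assign tasks round-robin; if a node fails, give the same task to another node."""
    healthy = list(nodes)      # nodes still believed to be alive
    pending = deque(tasks)     # tasks waiting to be run
    results = {}               # task -> node that completed it
    turn = 0
    while pending:
        if not healthy:
            raise RuntimeError("no healthy nodes left")
        task = pending.popleft()
        node = healthy[turn % len(healthy)]
        turn += 1
        if node in failing_nodes:
            healthy.remove(node)   # stop scheduling on the failed node
            pending.append(task)   # requeue the same task for another node
            continue
        results[task] = node
    return results


# Hypothetical cluster: node-b fails, so its task is retried elsewhere.
assignments = run_tasks(
    ["t1", "t2", "t3", "t4"],
    ["node-a", "node-b", "node-c"],
    failing_nodes={"node-b"},
)
print(assignments)
```

Every task still ends up in `assignments`, and none of them is recorded against the failed node.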

Ex: To combine data from different databases and different tables, Spark has connectors for different types of databases (SQL/NoSQL) and can sync the data.
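A small stand-in for this idea, using only the Python standard library rather than Spark itself: an in-memory SQLite table plays the SQL database and a list of dictionaries plays the NoSQL document store. The table, field names, and sample rows are assumptions made up for the sketch.

```python
import sqlite3

# SQL side: an in-memory SQLite table standing in for a relational database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Asha"), (2, "Ben")],
)

# NoSQL side: document-style records standing in for a document store.
orders = [
    {"customer_id": 1, "item": "laptop"},
    {"customer_id": 2, "item": "phone"},
    {"customer_id": 1, "item": "mouse"},
]

# Build an id -> name lookup from the SQL rows, then join it with the documents.
names = dict(conn.execute("SELECT id, name FROM customers"))
combined = [
    {"name": names[order["customer_id"]], "item": order["item"]}
    for order in orders
]
conn.close()

print(combined)
```

In Spark the same join would run across the cluster, with a connector reading each source; here everything runs in a single process.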

Ex: Apache open-source big data tools

  1. Plathora
  2. Pinot
  3. NiFi
  4. MapReduce
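
MapReduce, the last item in the list, is a programming model as well as a tool: a map step emits key/value pairs, the pairs are grouped by key, and a reduce step combines each group. A minimal single-process word count sketch, with hypothetical function names and sample input (a real Hadoop job would run these phases across many nodes):

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in the line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine the values for one key into a single count."""
    return key, sum(values)


lines = ["big data tools", "big data"]  # sample input, made up for the sketch
mapped = [pair for line in lines for pair in map_phase(line)]
word_counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(word_counts)
```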