Many students are looking for big data training, and Apache Spark is one of the most popular frameworks in big data. So most knowledge seekers look for Spark training, and quite a few self-learners plan to learn Spark on their own. In this post I explain how to approach learning Spark and what the prerequisites are.
Every framework is built on a programming language, so to work with a framework you need some experience in that language. For example:
To work with the Zend framework, you need PHP experience.
To work with the pandas library, you need Python experience.
To work with the Spring framework, you need Java experience.
To work with the Hadoop framework, you need Java experience.
To work with the Spark framework, you need Scala experience.
To work with the Flink framework, you need Java or Scala experience.
This means that to learn Spark you need at least basic knowledge of Scala. It is a relatively young programming language, but a very powerful one. If you already know a language such as C, C++, core Java, PHP, or Python, you can pick up Scala without much trouble. Both Java and Scala run on top of the JVM, so if you know Java, learning Scala is easier still. Put another way, you can informally think of Scala as "Java++".
For the last 10 years Java and Oracle have dominated much of the software industry, so most students pick up basic core Java and SQL at college level. If you know for loops, while loops, switch statements, functions, classes, objects, and interfaces, that is enough to start learning Scala. Almost all of these topics exist in every programming language. Please note that knowledge is enough; you do not need work experience to learn Scala. You can implement Spark applications in Scala, Java, or Python, but Scala is recommended.
Within big data, the most popular older framework is Hadoop. Hadoop knowledge is highly recommended for learning Spark, but there is no need to learn MapReduce. The reason is that Hadoop and Spark commonly share the same backend: HDFS for storage and YARN for resource management. Both topics are covered in Hadoop. So if you know HDFS, YARN, and Hive, it is a huge plus for learning Spark, but it is not mandatory.
Similarly, most Spark projects use Spark SQL, so SQL knowledge is highly recommended for implementing real-world Spark projects. In SQL, three things matter most: SELECT * FROM, joins, and GROUP BY.
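If you want to practise those three SQL commands before installing Spark, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in (an assumption for convenience, not Spark SQL itself), and the customers and orders tables and their rows are made up for illustration. Queries this basic should carry over to Spark SQL with little or no change.

```python
import sqlite3

# In-memory SQLite database used only as a practice stand-in for Spark SQL.
# The tables, columns, and rows below are hypothetical sample data.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
cur.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
cur.executemany("INSERT INTO customers VALUES (?, ?)",
                [(1, "Asha"), (2, "Ravi")])
cur.executemany("INSERT INTO orders VALUES (?, ?)",
                [(1, 20.0), (1, 5.0), (2, 12.5)])

# 1. SELECT * FROM: read every column of a table.
all_customers = cur.execute(
    "SELECT * FROM customers ORDER BY id"
).fetchall()

# 2. JOIN: match each order to the customer who placed it.
joined = cur.execute(
    "SELECT c.name, o.amount "
    "FROM customers c JOIN orders o ON o.customer_id = c.id "
    "ORDER BY c.id, o.amount"
).fetchall()

# 3. GROUP BY: total order amount per customer.
totals = cur.execute(
    "SELECT c.name, SUM(o.amount) AS total "
    "FROM customers c JOIN orders o ON o.customer_id = c.id "
    "GROUP BY c.name ORDER BY c.name"
).fetchall()

print(all_customers)
print(joined)
print(totals)
conn.close()
```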
Optionally, knowledge of a cloud platform such as AWS is recommended for implementing production projects, mainly because most big data projects run on a cloud. So AWS is an optional but useful addition when learning Spark.
To learn Spark or Hadoop, a system with at least 4 GB of RAM and at least 30 GB of disk is recommended.
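To check a machine against those numbers, a rough sketch like the following may help. It rests on a few assumptions: the 30 GB is treated as free space in your home directory, the RAM lookup through os.sysconf works on Linux and most Unix-like systems but not on Windows, and the function names are only illustrative.

```python
import os
import shutil

GIB = 1024 ** 3
MIN_RAM_GIB = 4    # minimum RAM suggested in this post
MIN_DISK_GIB = 30  # minimum disk suggested in this post


def meets_minimum(ram_bytes, free_disk_bytes):
    """Return True when both values reach the suggested minimums."""
    return (ram_bytes >= MIN_RAM_GIB * GIB
            and free_disk_bytes >= MIN_DISK_GIB * GIB)


def total_ram_bytes():
    """Total physical memory, or None if the platform does not expose it."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        return None


# Assumption: free space in the home directory is what matters.
free_disk = shutil.disk_usage(os.path.expanduser("~")).free
ram = total_ram_bytes()

print(f"Free disk: {free_disk / GIB:.1f} GiB")
if ram is None:
    print("Could not read RAM size on this platform")
else:
    print(f"RAM: {ram / GIB:.1f} GiB")
    print("Meets suggested minimum:", meets_minimum(ram, free_disk))
```

A machine sold as "4 GB" can report slightly less than 4 GiB, so treat a near miss as a hint rather than a hard failure.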
To sum up the prerequisites for learning Spark: before digging into big data, you need basic knowledge of core Java and SQL. Hadoop knowledge (HDFS and YARN) is highly recommended, and AWS is optional.