• Scala, Python, or Java: which is the best language for learning Apache Spark?
  • Is Java mandatory for learning Spark?
  • I am good at Java. Do I still have to learn Scala?
Most newcomers who have recently started digging into Spark run into questions like these.
In this article I answer them with examples.
First of all, every framework is itself written in some programming language. For example, Zend Framework uses PHP, Spring uses Java, and the pandas library uses Python. Similarly, Hadoop is written in Java and Spark is written in Scala.
The Hadoop API can be used from Java, Scala, Python, and other languages. Spark borrowed many APIs and ideas from Hadoop, Pig, R, and pandas, including the Hadoop API. So Spark also supports Java, Python, and Scala, and additionally offers partial support for R and SQL. In other words, Scala is Spark's native language, and Python and Java are supported alongside it.
Please note that the JVM is king: many programming languages run on top of it, and Scala and Java are among the most popular of them. Scala reuses most of its APIs from Java, which means you can import Java classes directly into Scala code without changing them. So if you know core Java concepts (for loops, if/else, switch, functions, objects), you can learn Scala easily. Core Java is therefore recommended before learning Scala, but it is not mandatory.
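In other words, Scala can be thought of as "Java++": a more advanced language than Java. Scala is big-data friendly and makes code more concise, while the same job in Java is usually more verbose. Another disadvantage of Java is that Spark does not ship an interactive shell (REPL) for it, as it does for Scala and Python; Java itself only gained a REPL (JShell) in Java 9. So Java is the third-preferred language for Spark. If the rest of your project is already implemented in Java, however, Java is the recommended language.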
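Python is evergreen, and you can use it almost anywhere. Python is a dynamically typed language, which means types are checked only at run time and a variable can hold values of different types over its lifetime. That is the main problem with the Dataset API. The Dataset API is type-safe at compile time, so it is only suitable for statically typed languages, and Python is not supported by it. Python users work with the untyped DataFrame API instead.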
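To make the typing point concrete, here is a minimal sketch in plain Python, with no Spark involved. The `Person` class and the `age_next_year` function are made-up names used only for illustration. It shows that Python type hints are not enforced, so a value of the wrong type is discovered only when the code runs. A statically typed Dataset in Scala is designed to reject this kind of mistake at compile time.

```python
from dataclasses import dataclass


@dataclass
class Person:  # hypothetical record type, for illustration only
    name: str
    age: int  # a hint only; Python does not enforce it


def age_next_year(person):
    # Raises TypeError at run time if age is not a number
    return person.age + 1


ok = Person(name="Ann", age=30)
bad = Person(name="Bob", age="thirty")  # accepted without complaint

print(age_next_year(ok))

try:
    age_next_year(bad)
    failed_at_runtime = False
except TypeError as err:
    failed_at_runtime = True
    print("Found only at run time:", err)
```

Nothing complains when `bad` is created. The error appears only when the field is actually used, which in a long-running job may be far from where the bad value entered.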
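If you are implementing machine learning or graph workloads with MLlib, ML Pipelines, or GraphFrames, Python is the recommended language. Note that GraphX itself has no Python API, so graph processing from Python goes through GraphFrames.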
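Scala is a very popular, trending programming language in big data. Scala runs on the JVM, and it is often quoted as being up to 10 times faster than Python. The real gap depends on the workload: it is largest for RDD code and Python UDFs, and much smaller for DataFrame and SQL code, which executes inside the JVM engine whichever language drives it. Python is also used to deliver enterprise-level applications, but Scala is the choice for highly optimized applications.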
A large part of Spark's power is the Dataset API (introduced in Spark 1.6 and unified with DataFrame in Spark 2.0). Scala supports this API, but Python does not. New features usually appear in Scala first, and some, like the Dataset API, are not available in Python at all.
Both Scala and Python are object-oriented languages that support a functional style, and both have a REPL: Spark ships spark-shell for Scala and pyspark for Python.
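The functional style matters because Spark programs are mostly chains of transformations such as map and reduce. The following is a minimal sketch of that style using only the Python standard library. It is not Spark's API. The input data and the helper name `add_pair` are assumptions made for illustration, and the steps only mirror the shape of a classic word count.

```python
from functools import reduce

lines = ["spark scala", "spark python", "java"]

# flatMap-like step: split every line into individual words
words = [word for line in lines for word in line.split()]

# map step: pair each word with a count of 1
pairs = map(lambda word: (word, 1), words)


def add_pair(counts, pair):
    # reduce step: return a new dict instead of mutating the old one
    word, n = pair
    return {**counts, word: counts.get(word, 0) + n}


word_counts = reduce(add_pair, pairs, {})
print(word_counts)
```

Each step takes a collection and returns a new one without modifying its input, which is the same habit Spark transformations expect in both Scala and Python.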
If you are working on data science with technologies like MLlib or GraphFrames, learn Python. If you are working on data engineering with technologies like Spark SQL or Spark Streaming, Scala is the preferred language.