Self-Learn Yourself Apache Spark in 21 Blogs – #5
In Blog 5, we will look at the languages Apache Spark supports, along with a basic hands-on exercise. Click to have a quick read of the other blogs in this Apache Spark learning series.
With our cloud setup of Apache Spark in place, we are now ready to develop big data Spark applications. Before we start building them, let’s review the languages that can be used to develop Apache Spark applications. Spark provides APIs in Scala, Java, Python, and R.
Scala – The language in which Apache Spark itself is written. The name Scala stands for "scalable language".
Java – The most obvious choice, used to develop many big data Spark applications. Spark even supports Java 8.
Python – Spark also provides a Python API, in which many MLlib applications are developed.
R – As of the Spark 1.4 release, Apache Spark also supports an R API. R is a statistical language widely used by data scientists.
So, Apache Spark supports many of the languages in the big data spectrum.
"Hello World, I am a fan of Apache Spark!" is going to be our first hands-on exercise.
Spark comes with an interactive shell, a REPL (Read, Evaluate, Print, Loop). With the help of the REPL, Spark brings interactive querying to big data. It helps you build code quickly and interactively.
Now let’s launch the Spark shell.
The first thing to notice is that the Spark shell creates two values for you: sc and sqlContext. sqlContext is used to run programs that use the Spark SQL library. sc is the SparkContext, the entry point to the core engine for Spark applications. All Spark jobs begin by creating a SparkContext, which delegates and controls the work of the distributed application.
The above command creates an RDD for the README.md file. As soon as we run it, the RDD for the file is created. The RDD, or Resilient Distributed Dataset, is the basic abstraction of Spark.
Spark core operations fall into two groups: transformations and actions. Transformations are lazily evaluated; the results are actually computed only when an action is executed.
Laziness is one of the benefits of Apache Spark. If Spark were not lazy, it would load the entire file even when that is not necessary, so lazy evaluation gives Apache Spark better performance.
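To make the idea of lazy evaluation concrete, here is a minimal plain-Python analogy. It is not Spark code: a generator stands in for a transformation, taking one element stands in for an action, and names such as to_upper and calls are hypothetical and used only for illustration.

```python
# Plain-Python analogy for lazy transformations and eager actions.
# This is NOT Spark code; all names here are hypothetical.

calls = []  # records every time the "transformation" function really runs


def to_upper(line):
    """Upper-case a line and remember that we were called."""
    calls.append(line)
    return line.upper()


lines = ["apache spark", "hello world", "spark is lazy"]

# "Transformation": building the generator runs nothing yet,
# much like a map over an RDD only records what should happen.
upper_lines = (to_upper(line) for line in lines)
calls_before_action = len(calls)

# "Action": asking for the first element forces just enough work
# to produce an answer; the remaining lines are never touched.
first_line = next(upper_lines)
calls_after_action = len(calls)

print(calls_before_action, calls_after_action, first_line)
```

Nothing is computed while the pipeline is being described, and asking for one result processes only one line rather than the whole input. That is the saving lazy evaluation is meant to bring.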
The above commands give us the Apache Spark word count program.
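As a rough illustration of what a word count does, here is a minimal plain-Python sketch of the usual three steps. It is not the Spark commands themselves: the sample lines are made up, and the comments only point to the RDD operations (flatMap, map, reduceByKey) that typically play the same roles in Spark.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical sample input standing in for the lines of a text file.
lines = ["hello world", "hello apache spark", "spark"]

# Step 1 (the role flatMap plays): split each line into words
# and flatten everything into a single stream of words.
words = chain.from_iterable(line.split() for line in lines)

# Step 2 (the role map plays): pair each word with a count of 1.
pairs = ((word, 1) for word in words)

# Step 3 (the role reduceByKey plays): add up the counts per word.
counts = defaultdict(int)
for word, one in pairs:
    counts[word] += one

word_counts = dict(counts)
print(word_counts)
```

Steps 1 and 2 only describe work, and the loop in step 3 is what finally pulls data through, which mirrors the split between transformations and actions.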
In Blog 6, we will cover the RDD, RDD input, and a hands-on exercise.
If you see something here that interests you, we’d love to have you involved.
Please subscribe at www.dataottam.com to stay up to date and for future reads on Big Data, Analytics, and IoT.
And, as always, please feel free to comment via firstname.lastname@example.org to help make it the best.