Top 10 Requirements That Led the Way to YARN
Best wishes for the New Year from the whole team at dataottam.
In this post, let’s discuss why and how Apache Hadoop YARN came about. YARN’s requirements emerged and evolved from the practical needs of long-running Hadoop cluster deployments, both small and large, and we look at how each of these requirements ultimately shaped YARN. Its architecture addresses many of these long-standing requirements, drawing on experience gained from evolving the MapReduce platform.
To really understand the motivation for YARN, we have to take a closer look at each requirement in turn.
Scalability – The next-generation compute platform should scale horizontally to tens of thousands of nodes and concurrent applications.
Serviceability – The next-generation compute platform should enable evolution of cluster software to be completely decoupled from users’ applications.
Multitenancy – The next-generation compute platform should allow multiple tenants to coexist on the same cluster and enable fine-grained sharing of individual nodes among different tenants.
Locality Awareness – The next-generation compute platform should support locality awareness; moving computation to the data is a major win for many applications.
High Cluster Utilization – The next-generation compute platform should enable high utilization of the underlying physical resources.
Secure and Auditable Operation – The next-generation compute platform should continue to enable secure and auditable usage of cluster resources.
Reliability and Availability – The next-generation compute platform should provide highly reliable user interaction and support high availability.
Support for Programming Model Diversity – The next-generation compute platform should enable diverse programming models and evolve beyond just being MapReduce-centric.
Flexible Resource Model – The next-generation compute platform should enable dynamic resource configurations on individual nodes and a flexible resource model.
Backward Compatibility – The next-generation compute platform should maintain complete backward compatibility with existing MapReduce applications.
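To make the Multitenancy, Locality Awareness, and Flexible Resource Model requirements a little more concrete, here is a minimal toy sketch in Python. It is not YARN's scheduler or API. The names Node, place, memory_mb, vcores, and preferred are all hypothetical and chosen only for illustration. The idea it shows: instead of fixed map and reduce slots, each request asks for an arbitrary amount of memory and cores, several tenants can share one node, and the nodes that hold the data are tried first.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical cluster node tracking its free resources."""
    name: str
    memory_mb: int  # free memory left on the node
    vcores: int     # free virtual cores left on the node
    containers: List[Tuple[str, int, int]] = field(default_factory=list)

    def fits(self, memory_mb: int, vcores: int) -> bool:
        return self.memory_mb >= memory_mb and self.vcores >= vcores

    def allocate(self, tenant: str, memory_mb: int, vcores: int) -> None:
        # Carve the requested amount out of the node's free resources.
        self.memory_mb -= memory_mb
        self.vcores -= vcores
        self.containers.append((tenant, memory_mb, vcores))


def place(nodes: List[Node], tenant: str, memory_mb: int, vcores: int,
          preferred: Iterable[str] = ()) -> Optional[str]:
    """Place a request on the first node that fits, preferred nodes first."""
    preferred = set(preferred)
    # Stable sort: nodes holding the data (key False) come before the rest.
    candidates = sorted(nodes, key=lambda n: n.name not in preferred)
    for node in candidates:
        if node.fits(memory_mb, vcores):
            node.allocate(tenant, memory_mb, vcores)
            return node.name
    return None  # nothing fits right now


if __name__ == "__main__":
    cluster = [Node("node1", 8192, 4), Node("node2", 8192, 4)]
    print(place(cluster, "tenant-a", 6144, 2, preferred={"node2"}))  # node2
    print(place(cluster, "tenant-b", 1024, 1, preferred={"node2"}))  # node2
    print(place(cluster, "tenant-b", 4096, 1, preferred={"node2"}))  # node1
```

In this sketch two tenants end up sharing node2, and the third request falls back to node1 only because node2 no longer has enough free memory. A real scheduler also has to handle queues, fairness, and security, which this toy example leaves out on purpose.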
Reference – Big Data Analytics Community, Apache Hadoop YARN: Arun, Vinod, Doug, Joseph, Jeff.
Interesting? Please subscribe to our blog at www.dataottam.com to stay current on Big Data & Analytics.
And as always, please feel free to send suggestions or comments to email@example.com.