Different Types of Data in Hadoop File System
This post looks at the kinds of data organizations want to ingest into Hadoop for business and analytics insight. In general, large volumes of data and unstructured data are strong candidates for Hadoop.
Clickstream data :
Clickstream data is the stream of clicks a person performs when visiting a website. This information can be used for path optimization, next-product-to-buy analysis, customer segmentation, and gaining insight into customers in general. Clickstream data is used to measure visitors’ activity and behavior on the website. Organizations want to quickly identify the intent of website visitors and dynamically adjust their interfaces to influence and trigger customer actions. Clickstream data is key to digital personalization across different channels (smartphones, computers, tablets, email, catalog, TV, radio, smoke signals, and so on).
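As a rough illustration of path analysis, the sketch below counts page-to-page transitions within each visitor session. The event layout (session ID, timestamp, page) and the sample values are assumptions made up for this example, and the code runs on a small in-memory list rather than on data stored in Hadoop.

```python
from collections import Counter, defaultdict

# Hypothetical click events: (session_id, timestamp_seconds, page)
events = [
    ("s1", 10, "/home"),
    ("s1", 25, "/product/42"),
    ("s1", 60, "/cart"),
    ("s2", 5, "/home"),
    ("s2", 30, "/product/42"),
    ("s3", 7, "/home"),
    ("s3", 15, "/search"),
]

def page_transitions(events):
    """Count how often visitors move from one page to the next within a session."""
    by_session = defaultdict(list)
    for session_id, ts, page in events:
        by_session[session_id].append((ts, page))

    transitions = Counter()
    for clicks in by_session.values():
        clicks.sort()  # order each session's clicks by timestamp
        pages = [page for _, page in clicks]
        transitions.update(zip(pages, pages[1:]))  # consecutive page pairs
    return transitions

counts = page_transitions(events)
print(counts.most_common(1))  # [(('/home', '/product/42'), 2)]
```

The most frequent transitions hint at the paths visitors actually take, which is the starting point for path optimization.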
Sentiment data :
Sentiment data from social media, combined with text from server logs and customer data, can be used to understand how to influence behavior on a website. Online retailers can reduce bounce rates and improve sales. Sentiment data enables organizations to respond to user sentiments, positive and negative, from their customers and their competitors. On social media, customers also discuss what products they would like to have as well as products they don’t like. This information is important for measuring brand loyalty and brand recognition.
Sensor/Machine data :
Sensor/machine data is detailed information about parts and components and the environmental conditions around those parts. This information can be used to improve design, reduce failures, and provide notice when components may be wearing out and about to break. Refrigerators, cars, jet engines, washing machines, soda fountain machines, and just about anything with components that can break can have its sensor data analyzed for predictive maintenance.
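One simple way to provide that kind of notice is to watch a moving average of a sensor reading and flag it when it drifts past a limit. The sketch below assumes a hypothetical vibration sensor, a window of three readings, and a limit of 80.0; real predictive-maintenance models are usually more involved.

```python
from collections import deque

def flag_wear(readings, window=3, limit=80.0):
    """Return the indexes at which the moving average of the last
    `window` readings exceeds `limit`."""
    recent = deque(maxlen=window)  # keeps only the newest `window` readings
    flagged = []
    for i, value in enumerate(readings):
        recent.append(value)
        if len(recent) == window and sum(recent) / window > limit:
            flagged.append(i)
    return flagged

# Hypothetical vibration readings from one component
vibration = [70.0, 72.0, 71.0, 79.0, 85.0, 90.0, 88.0]
alerts = flag_wear(vibration)
print(alerts)  # [5, 6]
```

Averaging over a window keeps a single noisy reading from raising an alert while still catching a sustained upward trend.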
Geospatial data :
Geographic data from global positioning systems (GPS), radio-frequency identification (RFID), cellular phones, and tablets provides information on the locations of people and objects, where they are moving, when they are moving, and the volume of the movement. This can be used to design highways, off-ramps, stop signs, neighborhoods, cities, and rapid transit systems. It can also be used to determine the best routes and which drivers have the safest or most unsafe behavior. Geographic information can help decide where to put cell towers, which customers to give bandwidth during high activity, what billboards to dynamically display on a highway, or how much to charge someone for car insurance. Services such as OnStar track where drivers are and what they are doing. OnStar is used for automatic crash response, stolen-vehicle assistance, vehicle diagnostics, and the like. This information can then be used to identify patterns in these events. Stores are also looking at using the Wi-Fi in cell phones to track a person’s movement through a store in order to improve store layouts. RFID is being used in hospitals, manufacturing plants, offices, and shopping malls to optimize layout designs.
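Much of this analysis starts from the distance between consecutive position fixes. The sketch below uses the haversine formula to estimate the length of a trip from a list of (latitude, longitude) points. The sample coordinates are made up, and the mean Earth radius of 6371 km is an approximation.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def trip_length_km(points):
    """Sum the distances between consecutive (lat, lon) points."""
    return sum(haversine_km(*a, *b) for a, b in zip(points, points[1:]))

# Hypothetical GPS fixes along the equator, one degree of longitude apart
trip = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0)]
total = trip_length_km(trip)
print(round(total, 1))
```

Combined with timestamps, per-segment distances like these give speeds, which is one input for judging route quality or driver behavior.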
Log data :
Server logs collect detailed information about customers logging in to websites. It can be important to see when people log in, where they log in from, how long they stay logged in, and which external events (sales advertising, commercials, radio advertisements, weather, holidays, and other activities) affect when someone logs in to a website. Monitoring and analyzing server logs is one of the first steps in security forensics. Server logs can also be mined for patterns that predict future component failures in server machines or software environments, and for patterns that reveal the beginning of a problem before it causes downtime.
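To see when people log in, a first step is often to bucket login events by hour. The sketch below assumes a made-up log format (date, time, event type, then key=value fields); real server logs vary, so the parsing would need to be adapted.

```python
from collections import Counter
from datetime import datetime

# Hypothetical log lines; the format is an assumption for this example
log_lines = [
    "2024-03-01 09:15:02 LOGIN user=alice ip=203.0.113.5",
    "2024-03-01 09:47:31 LOGIN user=bob ip=198.51.100.7",
    "2024-03-01 10:05:10 LOGOUT user=alice ip=203.0.113.5",
    "2024-03-01 14:20:44 LOGIN user=carol ip=192.0.2.9",
]

def logins_per_hour(lines):
    """Count LOGIN events by hour of day (0-23)."""
    counts = Counter()
    for line in lines:
        date_part, time_part, event = line.split()[:3]
        if event != "LOGIN":
            continue  # ignore logouts and any other event types
        ts = datetime.strptime(f"{date_part} {time_part}", "%Y-%m-%d %H:%M:%S")
        counts[ts.hour] += 1
    return counts

hourly = logins_per_hour(log_lines)
print(sorted(hourly.items()))  # [(9, 2), (14, 1)]
```

The same hourly counts can then be compared against external events such as advertising campaigns or holidays.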
Unstructured data :
Unstructured data (text, video, pictures, and so on) can be analyzed to track activities and behaviors and to identify patterns. Newer security methods track patterns in video to identify activities that are likely to happen. This can be used to anticipate criminal activity, traffic issues, and violence patterns, or to study how individuals move in response to visual cues. Facial recognition in malls, airports, and city streets can be used to find lost children, repeat customers, and potential criminal movement.
Source : Virtualizing Hadoop by George J, Charles Kim, Steven Jones