Big Data: Splunk's Best & Better Practices!
|Introduction to Splunk|
We see servers, devices, apps, logs, traffic, and clouds. We see data, big data, and fat data everywhere. Splunk offers the leading platform for Operational Intelligence. It enables the curious to look closely at what others ignore, machine data, and find what others never see: data insights that can help make our company more productive, profitable, competitive, and secure.
- Machine-generated data is one of the fastest growing and complex areas of big data and IoT.
- It’s also one of the most valuable, containing a definitive record of all user transactions, customer behaviour, machine behaviour, security threats, fraudulent activity and more.
- Splunk turns machine data into valuable insights, known as Operational Intelligence, no matter what business we are in.
- Operational Intelligence, enabled by the Splunk platform, gives us a real-time understanding of what's happening across our IT systems and technology infrastructure so we can make informed decisions.
- Splunk reduces mean-time-to-resolution (MTTR) with rapid, data-driven troubleshooting. Proactively monitor infrastructure by correlating events across a variety of big data sources such as GPS, RFID, hypervisors, web servers, email, messaging, clickstreams, mobile, telephones, IVR, databases, sensors, telematics, storage, servers, and security devices.
- It also makes mean-time-to-investigate (MTTI) faster than ever before.
|Splunk Deployment Scaling in 4 Steps|
Deployment scaling is a vital consideration for any Splunk deployment. Below are the four stages of deployment, from zero to an awesome big data / IoT application deployment.
- Trial – for initial evaluation by a handful of users
- Workgroup – a deployment for specific use cases and specific users
- Expansion – the deployment is clustered and grows with more sites, more geographies, more data sources, and more data volume, velocity, and variability
- Enterprise – a deployment at enterprise standard, with a large number of users and a wide variety of use cases
|Splunk Deployment Methodology|
The six phases below provide an organized way to implement Splunk in an enterprise and help identify the right people and resources required.
- Infrastructure – ensure Splunk has properly defined resources: sizing, servers, topology, disk, security, and retention
- Collection – help Splunkers ingest the correct data into Splunk
- Comprehension – tell Splunk how to interpret the big data (and other data too)
- Querying – the vital step of getting insights and information out of Splunk
- Integration – provide search results to a wider audience
- Operations – keep the deployment operating, then grow it, scale it, and evolve it
|Splunk Deployment Plan|
The Splunk deployment plan should cover deployment goals, user roles, a staffing list, the as-is physical environment, as-is logging details, the Splunk deployment topology, a data source inventory, a data policy inventory, data policy definitions, Splunk apps, detailed report definitions, and a deployment schedule.
The current IT environment should be analysed in detail: the overall IT topology, including data centres, network zones, the number and type of servers, network diagrams, and authentication details.
Capture current logging details, such as where the logs are generated (SAN, NAS, syslog, syslog-ng, rsyslog, Kiwi, Snare, and more), along with which log parsing and ticketing tools are used.
We also need to take care of non-functional requirements such as security, regulatory compliance, high availability, and disaster recovery, including data replication details.
Finally, capture the details of each data source in a data source inventory and data policy handout: data source, ownership, file name, server location, directory, API, port details, data volume, legal retention, source format, timestamp, collection method, visibility, and compression.
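As an illustration, one row of such a data source inventory might translate into Splunk configuration like the following sketch (the file path, index name, sourcetype, and retention period are hypothetical examples, not values from the inventory itself):

```ini
# inputs.conf (on the forwarder) -- one hypothetical inventory entry:
# data source = Apache access log, collection method = file monitor
[monitor:///var/log/apache/access.log]
index = web
sourcetype = access_combined
disabled = false

# indexes.conf (on the indexer) -- retention from the data policy inventory
[web]
homePath = $SPLUNK_DB/web/db
coldPath = $SPLUNK_DB/web/colddb
thawedPath = $SPLUNK_DB/web/thaweddb
frozenTimePeriodInSecs = 31536000  # one-year legal retention
```

Keeping the inventory and the configuration in step this way makes it easy to audit which owner, retention policy, and collection method applies to each index.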
|Splunk Best Practices|
Below are the best practices in Splunk deployment methodologies.
- A Splunk indexing review will help optimize for fast indexing
- Optimize for persisting unstructured data
- Keep the schema minimal for persisted data
- Persist data in its raw form
- Splunk uses an inverted index structure for the .tsidx files
- Splunk indexers store two types of files: raw data and index files (.tsidx)
- Leverage the inverted index (*.tsidx), metadata (*.data), and bloom filters, which are very fast
- Manage disk usage with tsidx reduction by setting enableTsidxReduction = true in indexes.conf
- Set a suitable reduction age, for example timePeriodInSecBeforeTsidxReduction = 7776000 (90 days)
- Tsidx reduction replaces full index files with mini index files and removes the merged_lexicon.lex file
- Make careful estimates of indexing volume
- Choose appropriate Splunk apps from splunkbase.com
- While apps can always be added later, it is better to plan them up front
- Apps must be integrated with the overall indexes and inputs
- A low-latency network with 1 Gb bandwidth is the minimum
- Maintain a solid enterprise-wide time and DNS infrastructure, and turn off Transparent Huge Pages (THP)
- Review the docs for each applicable app; for example, the App for Enterprise Security places a heavy load on search heads, and the App for VMware requires a heavy forwarder for data collection
- The Splunk App for IT Service Intelligence (ITSI) needs a dedicated search head cluster
- Splunk UBA requires special hardware: 50 GB for installation, 500 GB for metadata, 16 CPU cores, and 64 GB RAM
- An enterprise deployment comprises the following Splunk Enterprise roles (Indexer/Search Peer, Search Head, Deployment Server, License Master, Heavy Forwarder, Cluster Master, Search Head Cluster Captain, and Deployer) and Universal Forwarders (Deployment Clients)
- Use Splunk's built-in forwarder load balancing and distributed search when using multiple indexers
- When in doubt, the first rule of thumb for scaling is to add another commodity indexer
- Configure multiple pipeline sets instead of a single pipeline set to increase hardware utilization
- To get the most Input/Output Operations Per Second (IOPS), choose drives with high rotational speeds and low average latency and seek times
- Do not use RAID 5, since the extra parity writes make it too slow for high-performance indexing; the recommended RAID setup is RAID 10 (1+0)
- SAN and NAS are not suitable for hot and warm buckets (db)
- SAN and NAS are suitable for cold buckets (colddb)
- Planning for clustering is vital; clustered and non-clustered servers can be mixed
- Have a Splunk forwarder or indexer monitor the log collector's output as a local directory rather than reading it over the network
- Network inputs do not scale beyond a single indexer and are therefore not recommended
- A universal forwarder installed on production servers brings load balancing, buffering, and better performance
- Take advantage of syslog-ng and rsyslog filtering features
- A forwarder installed on the syslog-ng / rsyslog server provides greater resiliency and increases scalability
- Local forwarder deployment is easier if we control the base OS build or have many data sources
- Remote collection via WMI is only good for a limited set of data
- The universal forwarder and the HTTP Event Collector are the best remote collection options
- Heavy forwarders are good remote collection options
- FTP and batch scripts are not recommended remote collection options
- Log in a human-readable key=value text format
- Collect events at multiple levels: infrastructure status, application status, and semantic events
- Splunk works better with smaller log files; use a blacklist to exclude compressed files from monitor inputs
- Do not store configuration in $SPLUNK_HOME/etc/system/local on clients
- Prioritize servers that need more frequent updates using serverclass.conf
- Use a dedicated deployment server for more than 50 deployment clients
- Use other configuration management tools for rare tasks and the deployment server for routine tasks
- Six things must be set correctly at index time: timestamp, event boundary, host, source type, source, and index
- Metadata cannot be changed after the events are written to the index
- Use the four built-in lookup types (file-based, KV Store, external, and geospatial); Splunk DB Connect is the best fit for lookup data in relational databases
- Prefer the Splunk Common Information Model (CIM); CIM-compliant data integrates more easily
- Use appropriate fields and knowledge objects to enrich the data
- As a ground rule, Splunk uses the MapReduce technology component for its search
- Don't forget that indexers can perform event retrieval and additional steps in parallel
- Dense and sparse searches are primarily CPU-bound
- Super-sparse and rare searches are primarily I/O-bound
- Make sure disk I/O is as good as you can get
- Increase CPU hardware only if needed
- Diagnose slow searches by examining resource consumption on both the indexer tier and the search head tier
- If there are unused CPU or memory resources, enable multiple search pipelines by setting batch_search_max_pipeline = 2 in limits.conf
- By default, a 16-CPU search head allows 11 concurrent scheduled searches; calculate the limits in limits.conf accordingly
- Use data model acceleration, which speeds up reporting for the specific set of attributes defined in a data model
- A summary index must be defined in indexes.conf just like a regular index
- Summary indexing does not consume license volume; it is free of cost
- Summary indexes are normally small, but allocate space for them on the indexer tier
- Use the Representational State Transfer (REST) API via HTTP requests to access Splunk
- Always use scheduled searches to insert events into other data systems
- Use Splunk Analytics for Hadoop to get insights from Hadoop or NoSQL data stores
- Hunk is now Splunk Analytics for Hadoop
- Use Hadoop Data Roll to archive indexed data into HDFS or S3
- Use the _introspection index, populated by introspection_generator_addon, in the Distributed Management Console (DMC)
- Use the Splunk on Splunk (S.o.S) monitoring tool for troubleshooting on an as-needed basis
- Prefer the DMC over S.o.S when possible
- Base the data backup strategy on tolerance for data loss and recovery effort
- Always remember to back up configuration files across the environment, including search heads, indexers, cluster master, deployer, deployment server, and any other Splunk instance
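The six index-time settings listed above are pinned down in props.conf. A minimal sketch for a hypothetical sourcetype (the sourcetype name, timestamp format, and timezone are assumptions for illustration):

```ini
# props.conf -- explicit index-time settings for a hypothetical sourcetype
[my_app:log]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)             # event boundary
TIME_PREFIX = ^\[                    # where the timestamp starts
TIME_FORMAT = %Y-%m-%d %H:%M:%S.%3N  # timestamp format
MAX_TIMESTAMP_LOOKAHEAD = 25
TZ = UTC
```

Setting these explicitly avoids the expensive automatic detection Splunk would otherwise perform on every event.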
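The tsidx reduction settings mentioned above live in indexes.conf. A sketch, assuming a hypothetical index named web and the 90-day reduction age used earlier:

```ini
# indexes.conf -- tsidx reduction to manage disk usage on an aging index
[web]
enableTsidxReduction = true
# Reduce tsidx files for buckets older than 90 days (7776000 seconds)
timePeriodInSecBeforeTsidxReduction = 7776000
```

Reduced buckets still support searches, but sparse and rare searches against them become slower, so pick a reduction age beyond the typical search window.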
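Forwarder load balancing across multiple indexers is configured in outputs.conf on the forwarder. A sketch with hypothetical indexer hostnames:

```ini
# outputs.conf (on the universal forwarder) -- built-in load balancing
[tcpout]
defaultGroup = primary_indexers

[tcpout:primary_indexers]
server = idx1.example.com:9997, idx2.example.com:9997, idx3.example.com:9997
autoLBFrequency = 30  # switch to another indexer roughly every 30 seconds
```

The forwarder rotates through the server list automatically, which spreads ingest load and keeps data flowing if one indexer goes down.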
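The pipeline tuning advice above maps to two settings, one on the ingest side and one on the search side. A sketch, assuming the host has spare CPU and memory to absorb the extra parallelism:

```ini
# server.conf (on the indexer) -- multiple ingestion pipeline sets
[general]
parallelIngestionPipelines = 2

# limits.conf (on the search head) -- multiple search pipelines
[search]
batch_search_max_pipeline = 2
```

Each additional pipeline set consumes its own CPU cores and memory, so verify utilization before and after changing these values.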
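Deployment server targeting of client groups is configured in serverclass.conf. A minimal sketch with a hypothetical server class, host pattern, and app name:

```ini
# serverclass.conf (on the deployment server) -- hypothetical server class
[serverClass:web_servers]
whitelist.0 = web*.example.com

[serverClass:web_servers:app:my_web_inputs]
restartSplunkd = true
stateOnClient = enabled
```

Splitting clients into server classes like this is what makes it practical to push more frequent updates to some hosts than others.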
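As a sketch of using the REST API mentioned above, the snippet below builds a POST request to Splunk's standard search jobs endpoint on the management port (8089) using only the Python standard library. The host name and query are hypothetical, and authentication (e.g. a session token header) is omitted for brevity:

```python
# Minimal sketch of preparing a Splunk REST API search request.
# The endpoint /services/search/jobs and port 8089 are Splunk defaults;
# the host and query below are hypothetical.
import urllib.parse
import urllib.request


def build_search_request(base_url: str, query: str) -> urllib.request.Request:
    """Build a POST to the search jobs endpoint, forcing the 'search' prefix
    that the API requires on ad-hoc queries."""
    if not query.strip().startswith("search"):
        query = "search " + query
    body = urllib.parse.urlencode({"search": query, "output_mode": "json"})
    return urllib.request.Request(
        base_url.rstrip("/") + "/services/search/jobs",
        data=body.encode("utf-8"),
        method="POST",
    )


req = build_search_request("https://splunk.example.com:8089", "index=web status=500")
print(req.full_url)  # the request would be sent with urllib.request.urlopen(req)
```

Sending the request returns a search job ID, which is then polled and fetched via the job's results endpoint.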
To conclude: in the world of technology there are hundreds of thousands of different applications, and each usually logs in a different format. As Splunk experts, our job is to make all those logs speak human. That was often an impossible task in the past, but with the help of the Splunk platform it is possible.
Interested? Please subscribe to our blogs at www.dataottam.com to keep yourself current on Big Data, Analytics, and IoT.
Let's have coffee: firstname.lastname@example.org!