Performance Testing
Performance testing for Big Data covers three main areas:
Data Ingestion and Throughput: In this stage, the tester verifies
how fast the system can consume data from various data sources. Testing involves
identifying how many messages the queue can process in a given time frame. It
also covers how quickly data can be inserted into the underlying data store, for
example the insertion rate into a MongoDB or Cassandra database.
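As a rough illustration of measuring an insertion rate, the sketch below times a batch of inserts and reports records per second. The function name `measure_insert_rate` is hypothetical, and a plain Python list stands in for the data store; a real test would pass the database client's insert call instead.

```python
import time

def measure_insert_rate(insert, records):
    """Insert every record via `insert`; return (count, records per second)."""
    start = time.perf_counter()
    count = 0
    for record in records:
        insert(record)
        count += 1
    elapsed = time.perf_counter() - start
    # Guard against a zero elapsed time on very fast runs.
    rate = count / elapsed if elapsed > 0 else float("inf")
    return count, rate

# Stand-in data store (assumption): a plain list instead of a real database.
store = []
records = [{"id": i, "payload": "x" * 100} for i in range(10_000)]

count, rate = measure_insert_rate(store.append, records)
print(f"inserted {count} records at {rate:,.0f} records/s")
```

The same function can be pointed at a message queue's publish call to estimate how many messages are accepted in a given time frame.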
Data Processing: This involves verifying the speed at which
queries or MapReduce jobs are executed. It also includes testing the data
processing in isolation when the underlying data store is populated with the data
sets, for example by running MapReduce jobs on the underlying HDFS.
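One way to picture testing the processing step in isolation is to populate the data set first and then time only the job itself over several runs. The sketch below does this with a small in-memory word count that mimics the map and reduce phases; `word_count` and `time_job` are hypothetical names, and no Hadoop cluster is involved.

```python
import statistics
import time
from collections import Counter

def word_count(lines):
    """A tiny map/reduce-style job: split lines into words, then count them."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

def time_job(job, dataset, runs=5):
    """Run `job` on `dataset` several times; return the last result and all timings."""
    timings = []
    result = None
    for _ in range(runs):
        start = time.perf_counter()
        result = job(dataset)
        timings.append(time.perf_counter() - start)
    return result, timings

# Populate the data set before timing, so loading is not part of the measurement.
dataset = ["big data test", "data test", "test"] * 1000

result, timings = time_job(word_count, dataset)
print(f"median {statistics.median(timings) * 1000:.2f} ms over {len(timings)} runs")
```

Reporting the median over several runs, rather than a single run, reduces the influence of one-off effects such as a cold cache.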
Sub-Component Performance: These systems are made up of multiple
components, and it is essential to test each of these components in isolation, for
example how quickly messages are indexed and consumed, MapReduce job performance,
query performance, search, etc.
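To make per-component measurement concrete, the sketch below records a separate timing for each stage using a small context manager. The stage names and the dictionary-based "index" are illustrative assumptions, not parts of any particular big data stack.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Collected durations in seconds, keyed by component (stage) name.
stage_timings = defaultdict(list)

@contextmanager
def timed(stage):
    """Record how long the enclosed block takes under the given stage name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        stage_timings[stage].append(time.perf_counter() - start)

messages = [f"message {i}" for i in range(1000)]

# Stage 1 (stand-in): index the messages by position.
with timed("index"):
    index = {i: m for i, m in enumerate(messages)}

# Stage 2 (stand-in): query the index for messages ending in "7".
with timed("query"):
    hits = [m for m in index.values() if m.endswith("7")]

for stage, samples in stage_timings.items():
    print(f"{stage}: {sum(samples) * 1000:.3f} ms")
```

Keeping the timings separate per stage makes it easier to see which component is the bottleneck before tuning anything.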
Performance Testing Approach
Performance testing for a big data application involves testing huge volumes of
structured and unstructured data, and it requires a specific testing approach to
handle such massive data.
Performance testing is executed in this order:
- Set up the Big Data cluster that is to be tested for
performance
- Identify and design the corresponding workloads
- Prepare individual clients (custom scripts are created)
- Execute the test and analyze the results (if objectives are not met, tune the
component and re-execute)
- Arrive at the optimum configuration
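The execute, analyze, tune, and re-execute cycle above can be sketched as a simple loop. In this sketch `run_workload` is a hypothetical stand-in for a real test run, and its throughput formula is invented purely so the example is deterministic.

```python
def run_workload(config):
    """Hypothetical stand-in for a real test run; returns throughput in ops/s.

    The formula below is invented for illustration only.
    """
    return min(config["threads"] * 1500, 10_000)

def tune(configs, target_ops_per_sec):
    """Try configurations in order; return the first that meets the target.

    Also returns the history of (config, throughput) pairs that were tried.
    """
    history = []
    for config in configs:
        throughput = run_workload(config)
        history.append((config, throughput))
        if throughput >= target_ops_per_sec:
            return config, history
    # No candidate met the objective.
    return None, history

candidates = [{"threads": n} for n in (1, 2, 4, 8)]
best, history = tune(candidates, target_ops_per_sec=5000)

for config, throughput in history:
    print(config, throughput)
print("selected:", best)
```

In practice each iteration would change one setting at a time and re-run the same workload, so that any difference in the result can be attributed to that change.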
Parameters for Performance Testing
The parameters to be verified for performance testing include:
- Data Storage: How data is stored in different nodes
- Commit logs: How large the commit log is allowed to grow
- Concurrency: How many threads can perform write and read operations
- Caching: Tune the cache settings "row cache" and "key cache"
- Timeouts: Values for connection timeout, query timeout, etc.
- JVM Parameters: Heap size, GC algorithms, etc.
- MapReduce performance: Sort, merge, etc.
- Message queue: Message rate, size, etc.
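When several of these parameters are varied together, it can help to enumerate the combinations up front so that each test run is tied to one explicit configuration. The sketch below expands a small grid; the parameter names and values are illustrative assumptions and do not correspond to the settings of any specific product.

```python
from itertools import product

# Illustrative parameter names and values (assumptions, not real product settings).
parameter_grid = {
    "concurrent_writers": [8, 32],
    "row_cache_mb": [0, 256],
    "query_timeout_ms": [1000, 5000],
}

def expand(grid):
    """Return one dict per combination of the values in `grid`."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]

configs = expand(parameter_grid)
print(f"{len(configs)} configurations to test")
for config in configs:
    print(config)
```

The number of combinations grows multiplicatively with each added parameter, so a full grid is only practical for a handful of settings at a time.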