Automatic Load Balancing
In today's data centers, it is not unusual to find large clusters of Linux computers numbering in the hundreds or even thousands. Scaling the hardware is relatively simple, but scaling software across this hardware infrastructure can be extremely challenging. One of the critical factors limiting the scalability of parallel processing systems is load balancing: the ability to ensure that all nodes perform an equal amount of work. If the load is unbalanced, a few nodes end up doing most of the work while the rest sit idle. This is especially true for large parallel databases, where the load distribution for a particular SQL query depends on the profile of the data. The profiles of intermediate data at each stage of a query execution plan are largely unknown to the database software. The database engine therefore makes rough guesses and attempts to balance load across nodes in a primitive manner. More often than not, these attempts fail, resulting in a highly unbalanced load distribution and poor performance.
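To make the dependence on the data profile concrete, the following is a minimal sketch, not taken from any particular database engine. It assumes a hypothetical four-node cluster that routes each row to a node by taking its integer key modulo the node count, and it compares an evenly spread key set with a made-up intermediate result in which one key dominates.

```python
def hash_partition(keys, num_nodes):
    """Route each row to a node by key modulo node count; return rows per node."""
    load = [0] * num_nodes
    for key in keys:
        load[key % num_nodes] += 1
    return load

NUM_NODES = 4  # hypothetical cluster size

# Illustrative data only: 1000 rows with evenly spread keys,
# versus 1000 rows where key 7 accounts for 900 of them.
uniform_keys = list(range(1000))
skewed_keys = [7] * 900 + list(range(100))

uniform_load = hash_partition(uniform_keys, NUM_NODES)
skewed_load = hash_partition(skewed_keys, NUM_NODES)

print(uniform_load)  # [250, 250, 250, 250]
print(skewed_load)   # [25, 25, 25, 925]
```

With the skewed keys, one node receives 925 of the 1000 rows, so the stage runs at roughly the speed of that single node regardless of how many nodes are added.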
XtremeData implements automatic load balancing by collecting detailed statistics in real time on all data being processed and then using these statistics to distribute the workload dynamically. This is a unique strength of our technology and enables dbX to scale without side effects, providing predictable and consistent performance at all scales.
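The general idea of statistics-driven distribution can be sketched as follows. This is an illustration under stated assumptions and not a description of how dbX works internally: the function name `plan_distribution`, the "fair share" threshold, and the split-heavy-keys strategy are all hypothetical choices made for the example. The sketch counts rows per key, spreads any key larger than one node's fair share across all nodes, and assigns the remaining keys, largest first, to whichever node is currently least loaded.

```python
import heapq
from collections import Counter

def plan_distribution(keys, num_nodes):
    """Return rows per node for a statistics-driven plan (illustrative only)."""
    stats = Counter(keys)                # the collected per-key statistics
    fair_share = len(keys) / num_nodes   # assumed threshold for a "heavy" key
    load = [0] * num_nodes

    light_counts = []
    for key, count in stats.items():
        if count > fair_share:
            # Heavy key: split its rows as evenly as possible over all nodes.
            base, extra = divmod(count, num_nodes)
            for node in range(num_nodes):
                load[node] += base + (1 if node < extra else 0)
        else:
            light_counts.append(count)

    # Light keys: largest first, each placed whole on the least-loaded node.
    heap = [(rows, node) for node, rows in enumerate(load)]
    heapq.heapify(heap)
    for count in sorted(light_counts, reverse=True):
        rows, node = heapq.heappop(heap)
        heapq.heappush(heap, (rows + count, node))
    for rows, node in heap:
        load[node] = rows
    return load

# Illustrative data only: key 7 dominates the 1000 rows.
skewed_keys = [7] * 900 + list(range(100))
balanced_load = plan_distribution(skewed_keys, 4)
print(balanced_load)  # [250, 250, 250, 250]
```

Splitting one key across nodes is only valid when the downstream operator can merge partial results, as a sum or count can; an operator that needs all rows for a key in one place would require a different strategy.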