Pioneering Agile Array Analytics

By [email protected] - 1st August 2016 - 16:57

In our quest for disruptive big data technology we crossed the Atlantic to find Germany-based rasdaman, which has the potential to become a game changer very soon. The name rasdaman stands for “raster data manager”.

Peter Baumann, inventor and CEO of the company, recognized a gap in big data analytics early on: missing support for massive matrices and datacubes, also known as multi-dimensional arrays. We find these arrays everywhere: in business, such as stock risk analysis and OLAP; in the life sciences; in exploration data; and in industrial simulation. Even large graphs, like the Facebook one, can be analyzed through array operations.
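To make the graph remark concrete, here is a minimal sketch of one such array operation: squaring an adjacency matrix to count common friends. It is not rasdaman code; the names, the tiny friendship graph, and the helper functions `matmul` and `common_friends` are invented for illustration, and plain Python lists stand in for a real array engine.

```python
# Illustrative only: a tiny, made-up friendship graph as an adjacency matrix.
people = ["ann", "bob", "cy", "dee"]
adjacency = [
    [0, 1, 1, 0],  # ann knows bob and cy
    [1, 0, 1, 1],  # bob knows ann, cy and dee
    [1, 1, 0, 0],  # cy knows ann and bob
    [0, 1, 0, 0],  # dee knows bob
]


def matmul(a, b):
    """Plain matrix product of two square lists of lists."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


# Entry [i][j] of A * A counts the friends that person i and person j share;
# the diagonal entry [i][i] is simply the number of friends of person i.
common = matmul(adjacency, adjacency)


def common_friends(x, y):
    """Number of friends shared by x and y (hypothetical helper)."""
    return common[people.index(x)][people.index(y)]


print(common_friends("ann", "dee"))
```

The graph question ("how many friends do two people share?") turns into a single matrix product, which is the kind of operation an array engine can partition and parallelize.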

rasdaman's core idea is as simple as it is compelling: combine the flexibility of SQL with the power of array manipulation. The big data engine can handle very large volumes quickly and can combine data sources distributed on a planetary scale. We found installations exceeding 130 terabytes at publicly accessible services, and researchers at superscale data centers, such as the European Space Agency and the European Centre for Medium-Range Weather Forecasts with its 87-petabyte climate archive, are feeding rasdaman to go beyond the petabyte frontier.

With rasdaman’s unique adaptive data partitioning and parallelization, datacubes are analyzed and combined in a straightforward and ultrafast manner. A series of strong optimizations together make rasdaman fast. Adaptive data partitioning and distribution is one element, augmented with effective compression of dense and sparse datacubes. Intelligent processing uses all the silicon it can get hold of, within and across nodes and even data centers, while respecting security. Query optimization and parallelization are done individually for each incoming query, as opposed to static parallelization as in Spark. Baumann then confronts us with a demo in which we see a terabyte analyzed in less than 100 milliseconds.
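The general idea behind partitioning an array and processing the pieces in parallel can be sketched in a few lines. This is only an illustration under simplifying assumptions, not rasdaman's implementation: the tiling here is fixed-size rather than adaptive, everything runs on one machine, and the function names `make_tiles`, `tile_sum` and `parallel_sum` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def make_tiles(height, width, tile):
    """Yield (row_start, row_end, col_start, col_end) for a fixed-size tiling."""
    for r in range(0, height, tile):
        for c in range(0, width, tile):
            yield r, min(r + tile, height), c, min(c + tile, width)


def tile_sum(array, bounds):
    """Partial aggregate: sum of the cells inside one tile."""
    r0, r1, c0, c1 = bounds
    return sum(sum(row[c0:c1]) for row in array[r0:r1])


def parallel_sum(array, tile=2, workers=4):
    """Sum a 2-D array by aggregating tiles independently, then combining."""
    height, width = len(array), len(array[0])
    tiles = list(make_tiles(height, width, tile))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda b: tile_sum(array, b), tiles))
    return sum(partials)


# A small 5 x 5 grid holding the values 0..24.
grid = [[r * 5 + c for c in range(5)] for r in range(5)]
total = parallel_sum(grid)
print(total)
```

Because each tile is aggregated independently, the partial results can be computed on different cores or nodes and merged at the end. In CPython, threads will not speed up this CPU-bound toy; the sketch only shows the split, process, combine structure.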

