“Comparative Performance Analysis of a Big Data NORA Problem on a Variety of Architectures” Paper

In this paper, Dr. Peter Kogge, together with David Bayliss of LexisNexis Risk Solutions, describes a highly concurrent implementation for Non-Obvious Relationship Analysis (NORA).

ABSTRACT
Non-Obvious Relationship Analysis (NORA) is one of the most stressing classes of Big Data Analytics problems. This paper proposes a reference NORA problem that is representative of real problems and can rationally scale to very large sizes. It then develops a highly concurrent implementation that can run on large systems. Each step of this implementation is sized in terms of how much of four different resources (CPU, memory, disk, and network) might be used. From this, a parameterized model projecting both execution time and utilizations is used to identify the “tall poles” in performance. The parameters are then modified to represent several different target systems, from a large cluster typical of today to variations in an advanced architecture where processing has been moved into memory. A “thought experiment” then uses this model to discover the parameters of a system that would provide a near-100X speedup with a balanced design in which no resource is badly over- or underutilized.
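
The kind of parameterized model the abstract describes can be illustrated with a minimal Python sketch. This is not the paper's model: the names `Step` and `project`, the step names, and all numbers are hypothetical. It also assumes that, within a step, work on the four resources overlaps perfectly, so that a step takes as long as its slowest resource.

```python
from dataclasses import dataclass

RESOURCES = ("cpu", "memory", "disk", "network")


@dataclass
class Step:
    name: str
    demand: dict  # work per resource, in arbitrary units (hypothetical)


def project(steps, rates):
    """Project total time, per-resource utilization, and each step's tall pole.

    Assumption (not from the paper): resource usage within a step overlaps
    perfectly, so step time is the maximum of demand / rate over resources.
    """
    total = 0.0
    busy = {r: 0.0 for r in RESOURCES}
    tall_poles = {}
    for step in steps:
        # Time each resource would need if it were the only constraint.
        times = {r: step.demand.get(r, 0.0) / rates[r] for r in RESOURCES}
        pole = max(times, key=times.get)  # the limiting resource
        tall_poles[step.name] = pole
        total += times[pole]
        for r in RESOURCES:
            busy[r] += times[r]
    # Utilization: fraction of the total run during which a resource is busy.
    utilization = {r: (busy[r] / total if total else 0.0) for r in RESOURCES}
    return total, utilization, tall_poles


# Illustrative numbers only; they do not come from the paper.
steps = [
    Step("load", {"cpu": 10, "memory": 50, "disk": 400}),
    Step("join", {"cpu": 200, "memory": 100, "network": 300}),
]
rates = {"cpu": 100, "memory": 100, "disk": 100, "network": 50}

total, utilization, tall_poles = project(steps, rates)
print(total, tall_poles)
print(utilization)
```

Changing the entries of `rates` to describe a different target system and re-running `project` mirrors, in a very simplified way, how such a model can be used to look for a configuration in which no resource is badly over- or underutilized.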