Artificial Intelligence Training
Graph databases have become critical to AI model training, and Lucata provides the high-performance graph processing necessary to train AI models on massive graph datasets.
Incredible progress has been made in AI in the past decade, but the traditional hardware these developments rely on is reaching its limits. According to a study by OpenAI, the compute power used for notable AI achievements doubled every 3.4 months between 2012 and 2020. However, according to Our World in Data, traditional single-core server performance improved by 52% per year from 1986 to 2003, 23% per year from 2003 to 2011, and just 7% per year or less since 2011. This decline in improvement stems from traditional architectures reaching the fundamental limits of Moore’s law.
This dramatic and insurmountable contrast between the progress of traditional architectures and the compute power demanded by AI models requires a new era of hardware that overcomes the limitations of conventional computing.
The Downfalls of Traditional Architectures
As a simple example, we will compare the performance of Random Forest queries on traditional architectures with their performance on Lucata.
Random Forest is highly parallelizable since each decision tree processes every sample independently. The only synchronization occurs when the results of all the decision trees are combined to provide a final classification for a sample. However, it is challenging to apply GPU hardware acceleration when the decision trees within the forest vary significantly in shape and depth. This variability makes pipelining and traditional parallelization techniques difficult because the time to process a sample is data-dependent. Additionally, the irregularity in tree size and shape makes it difficult to provide deterministic memory access into the trees. The presence of very deep trees within the forest makes it prohibitively expensive to apply techniques that improve regularity, such as fully populating all trees so the processing time for each sample is identical. The main limitation of Random Forest is that a large number of trees can make the algorithm too slow for real-time predictions. In general, these algorithms are fast to train but quite slow to make predictions on a conventional system.
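To make this structure concrete, here is a minimal Python sketch (not Lucata code) of Random Forest classification; the Node class and function names are illustrative assumptions, and the thread pool is used only to show that the per-tree evaluations are independent. It illustrates why the work per tree is data-dependent and why the majority vote is the only synchronization point.

```python
from concurrent.futures import ThreadPoolExecutor
from collections import Counter

# Illustrative tree node: leaves carry a class label, internal nodes a split.
class Node:
    def __init__(self, feature=None, threshold=None, left=None, right=None, label=None):
        self.feature, self.threshold = feature, threshold
        self.left, self.right, self.label = left, right, label

def predict_one_tree(root, sample):
    """Walk a single decision tree. The number of steps depends on the
    sample and on the tree's depth, so the work per tree is data-dependent."""
    node = root
    while node.label is None:
        node = node.left if sample[node.feature] <= node.threshold else node.right
    return node.label

def predict_forest(trees, sample):
    """Evaluate every tree independently; the only synchronization point
    is the majority vote that combines the per-tree predictions."""
    with ThreadPoolExecutor() as pool:
        votes = list(pool.map(lambda t: predict_one_tree(t, sample), trees))
    return Counter(votes).most_common(1)[0][0]
```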
The Advantages of Lucata
The Lucata architecture does not suffer from the limitations of conventional CPU and GPU architectures. Its fine-grained memory access and parallelism enable a more accurate prediction based on more trees. Splitting hundreds of trees across hundreds of Lucata compute elements enables the use of Random Forest in situations where run-time performance is important. Using Random Forest in time-critical situations means that a straightforward, easy-to-understand algorithm, one that is also not prone to the machine learning problem of overfitting, opens new opportunities in many production market segments. The dynamic parallelism enabled by on-demand thread spawning on the Lucata architecture matches the dynamic nature of the machine learning training process. Our ability to run many threads simultaneously also lets us exploit all the levels of parallelism inherent to Random Forest training (sketched after the list below):
- Multiple trees in parallel
- Computing the Gini values for each candidate feature at every split
- Testing the different thresholds for a given candidate at every split
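The sketch below shows where the last two levels of parallelism live in training: every (feature, threshold) candidate at a split can be scored independently with the Gini impurity, and the first level comes from building many such trees at once. The function names are illustrative assumptions, and the sequential loops stand in for work that fine-grained threading could spawn in parallel.

```python
import numpy as np

def gini(labels):
    """Gini impurity of a label array: 1 - sum_k p_k^2."""
    if labels.size == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def split_score(X, y, feature, threshold):
    """Weighted Gini impurity of the two partitions induced by one
    (feature, threshold) candidate. Each call is independent, so all
    candidates at a split can be evaluated in parallel."""
    mask = X[:, feature] <= threshold
    left, right = y[mask], y[~mask]
    n = len(y)
    return (len(left) / n) * gini(left) + (len(right) / n) * gini(right)

def best_split(X, y):
    """Exhaustive scan over candidate features and thresholds; on hardware
    with fine-grained threading, both loops can be spawned as threads,
    and many trees can run this search at the same time."""
    best = (None, None, float("inf"))
    for feature in range(X.shape[1]):                  # features in parallel
        for threshold in np.unique(X[:, feature]):     # thresholds in parallel
            score = split_score(X, y, feature, threshold)
            if score < best[2]:
                best = (feature, threshold, score)
    return best
```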
During training, data is accessed in a very dynamic, sparse, irregular pattern, which fits our Migratory Threads architecture. With the help of our proprietary remote and atomic instructions managed by the hardware, it is easy to write an efficient implementation without compromising accuracy. When classifying a single sample, all trees are used in parallel; when classifying multiple samples, the Lucata design also enables parallelism across all the input samples.
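A short sketch of that batch case, reusing predict_one_tree from the earlier example and again using a thread pool only to mark the independent units of work; the classify_batch name is an illustrative assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from collections import Counter

def classify_batch(trees, samples):
    """Classify a batch of samples with a forest.

    Every (tree, sample) pair is an independent unit of work, so parallelism
    can span both the trees and the input samples; the only synchronization
    is the per-sample majority vote at the end."""
    def vote(sample):
        return Counter(predict_one_tree(t, sample) for t in trees).most_common(1)[0][0]
    with ThreadPoolExecutor() as pool:
        return list(pool.map(vote, samples))
```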
Lucata Unlocks New Possibilities for AI
Traditional architectures are seeing their rate of performance improvement decline, while Lucata expects up to 40X performance gains in the next three years. Lucata is already more performant and is not limited by Moore’s law for future advancement. While traditional architectures are reaching their limits, Lucata is just hitting its stride. Lucata’s innovative architecture has the inherent ability to improve any graph-based AI. Reach out to our team to inquire about your AI application.
Lucata Enables AI Training at Massive Scale
The next-generation performance you need to train AI models efficiently.