| ||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||
| In contrast to traditional HPC machines and programming models, the Cray XMT would be a more natural platform for solving data-intensive applications that are characterized by dynamically varying computations and low degrees of spatial locality. The Cray XMT relies on the massive multithreading paradigm of programming to exploit concurrency in applications and tackles the memory-latency issue very differently from traditional HPC architectures. The Cray XMT provides the programmer an illusion of a globally addressable flat memory hierarchy, thus avoiding the need for load-balanced data partitioning and redistribution among processors. Instead, it tolerates latency through hardware support for multiple outstanding memory requests, facilitated by fine-grained threads and fast context switches on each processor. The Cray XMT supports lightweight word-level synchronization primitives for minimizing memory contention among threads. The Cray XMT's massive multithreading approach also supports dynamic load balancing and adaptive parallelism, leading to significant benefits in programmer productivity in the design of algorithms for data-intensive applications. Memory locality and computational intensity of the application have little effect on performance, as long as the programmer identifies and exposes sufficient concurrency at a fine granularity. |
||||||||||||||||||||||||||||||||
| Application Case Studies New applications are being created and existing applications are being mapped to the Cray XMT platform with promising results. Preliminary results indicate that the Cray XMT is able to achieve good performance and scalability on challenging, highly irregular applications. |
||||||||||||||||||||||||||||||||
| Application 1: Anomaly Detection for Multivariate Categorical Data (PDTree) The PDTree application originates in the cyber security domain and involves large sets of network traffic data. Analysis is performed to detect anomalies in network traffic packet headers in order to locate and characterize network attacks, and to help predict and mitigate future attacks. The PDTree application is a special case of a more widely applicable analysis method that uses ideas from conditional probability in conjunction with a novel data structure and algorithm to find relationships and patterns in the data. |
||||||||||||||||||||||||||||||||
| When dealing with categorical data in multiple variables we can ask, for any combination of variables and instantiation of values for those variables, how many times this pattern has occurred. Because multiple variables are being considered simultaneously, the resulting count table, or contingency table, specifies a joint distribution. Efficient algorithms using contingency tables are critical for implementation of Bayesian networks, log-linear models, Markov random fields, various graph representations, and novel algebraic-based approaches. This is especially important when there are a large number of variables and observations or when some variables take on many distinct values. All of these hold with regard to the massive amounts of data prevalent in cyber security analysis. | Preliminary results indicate that the Cray XMT is able to achieve good performance and scalability on challenging, highly irregular applications. | |||||||||||||||||||||||||||||||
| The PDTree data structure stores counts for selected combinations of variables in categorical data. The selection of the combinations, specified as a Guide Tree (figure 2, p39), can be derived from a formal statistical model or from requirements regarding memory space. We have designed and implemented a multithreaded, parallel version of the population of a PDTree from a raw, multiple variable, categorical dataset. The implementation utilizes the features found on the Cray XMT to enable the execution of this dynamic, data-dependent application. We used an n-ary tree with a specialized root level to implement concurrent, fine-grained tree node updates and insertions and took advantage of the low-cost atomic operations and full/empty bit synchronization provided by the Cray XMT. | ||||||||||||||||||||||||||||||||
| For our application, an experiment was designed that streamed a large categorical dataset resident on a parallel Lustre file system—from the Opteron processors on the Cray XMT to the Threadstorm processors—in order to be inserted into the PDTree data structure. The dataset contained 64 million records, and we used a guide tree with nine columns. We streamed the data using 250 MB chunks, which corresponds to approximately four million records. Figure 3 presents the performance results achieved in this experiment. The results show the scalability potential of the Cray XMT on an application with highly irregular, data-dependent memory accesses and high degrees of fine-grained parallelism. |
||||||||||||||||||||||||||||||||
| Application 2: Survey Propagation for Complex Systems in Biology Biology recently has taken a sharp turn into complexity, in ways that are relevant to HPC and statistical physics. Message passing between elements is a common theme in many complex systems, whether they are atoms, economic agents, logical variables, or neurons. Each cell in an organism is an intricate chemical machine that is constructed from a massive number of heterogeneous discrete elements organized into multifaceted, multiscale networks. Large amounts of data about these networks are rapidly accumulating due to high-throughput experimental technologies in the new discipline of systems biology. However, making sense of this deluge of complex data is likely to challenge current paradigms and demand real innovation in advanced algorithms and computing systems. |
||||||||||||||||||||||||||||||||
| Recently, tools designed to study spin glasses, as well as the interaction of atomic ensembles, have proven useful in developing more efficient algorithms for solving hard combinatorial problems. This transfer of insight across domains highlights the intriguing similarity between phase transitions in physical systems and sharp thresholds found in computational complexity. Especially interesting is the role of data exchange across probabilistic graphical models, or message passing, as an algorithmic and conceptual framework. | ||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||
| Survey Propagation (SP) is a message-passing algorithm that is able to solve especially large, hard Boolean satisfiability problems, up to 10 million variables close to the threshold of solvability. SP is a generalization of (loopy) Belief Propagation that transmits a probabilistic "survey" about the variables along the edges of a bipartite factor graph composed of variable and clause nodes. The algorithm exchanges information along the graph until a consensus emerges in a fixed-point solution. | ||||||||||||||||||||||||||||||||
| We are investigating SP as an efficient solver for use against computationally hard problems in biology; for example, graph-theoretic problems on transcriptional, metabolic, protein-protein interaction and protein-homology networks. This strategy is motivated by the extraordinary diversity of algorithms needed against large-scale data in modern biology. A powerful central engine can simplify the challenge of applying HPC to a complex array of problems. Hundreds of NP-complete problems from graph theory, network design, set and partition, storage and retrieval, sequencing and scheduling, mathematical programming, game theory, logic, automata and language theory, and optimization have been identified (see Further Reading). The problems in this complexity class can be solved by reduction to another form; for example, from k-clique to satisfiability (SAT). | ||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||
| Just as cellular complexity is essentially a mystery under intensive investigation, the complexity of these NP-complete problems is a parallel challenge. We have chosen the graph theoretic problem of finding cliques, or fully connected subgraphs, in protein-homology networks as an initial target because we are easily able to interpret results from a biological perspective using a visual analytics toolkit that we have developed. | ||||||||||||||||||||||||||||||||
| Early results have shown some surprises. A simple reduction of a protein-homology network to three-clique unexpectedly provided a bonus by consistently finding much larger cliques. Tuning the properties of these reduction algorithms to a specific purpose will require many computational experiments to define the geometry of their complexity. Thus, an efficient implementation of a general solver such as SP is a workhorse that can produce not only new biological understanding, but theoretical computer science as well. | ||||||||||||||||||||||||||||||||
| We have implemented a parallel version of SP on the Cray XMT and have evaluated its performance in the context of randomly generated Boolean satisfiability problems with up to 10 million variables and 50 million clauses in preparation for understanding its performance for problems derived from biological datasets. Figure 4 presents parallel speedup results for SP on a 64-processor Cray XMT for a problem with one million Boolean variables and five million clauses. Scalability and parallel speedup are almost linear for this very challenging combinatorial problem, which is very difficult to execute at scale on traditional HPC systems. |
Our hope is that the Cray XMT will better meet needs and bring new energy and ideas into the HPC world. | |||||||||||||||||||||||||||||||
| Expanding the Use of HPC The Cray XMT is a disruptive architecture. It supports unprecedented performance for complex, memory-limited applications, and so has the potential to substantially broaden the range of applications benefiting from HPC. A recent workshop sponsored by the DOE Office of Science and the Department of Defense explored possible applications for this machine. In addition to applications in biology and knowledge discovery discussed above, the workshop identified agent-based modeling, cognitive science, combinatorial optimization, and other emerging applications. Scientific communities that depend on these computations have been limited consumers of traditional HPC because their applications do not align well with the capabilities of message-passing clusters. Our hope is that the Cray XMT will better meet their needs and bring new energy and ideas into the HPC world. |
||||||||||||||||||||||||||||||||
| Contributors: Daniel Chavarria-Miranda, Deborah Gracio, Andres Marquez, Jarek Nieplocha, Chad Scherrer, and Heidi Sofia, Pacific Northwest National Laboratory; David A. Bader and Kamesh Madduri, Georgia Institute of Technology; Jonathan Berry and Bruce Hendrickson, Sandia National Laboratories; Kristyn Maschhoff, Cray, Inc. |
||||||||||||||||||||||||||||||||
| Further Reading www.nada.kth.se/~viggo/wwwcompendium |
||||||||||||||||||||||||||||||||
| Published by IOP Publishing in association with Oak Ridge National Laboratory, for the US Department of Energy, Office of Science. Copyright © 2008 by IOP. | ||||||||||||||||||||||||||||||||