Yanwei Zhang, Matthew Wolf, Karsten Schwan, Qing Liu, Greg Eisenhauer, Scott Klasky. Co-Sites: The Autonomous Distributed Dataflows in Collaborative Scientific Discovery, In 10th Workshop on Workflows in Support of Large-Scale Science (WORKS ’15), in conjunction with SC'15, November, 2015.
Xiaocheng Zou, Kesheng Wu, David A. Boyuka II, Daniel F. Martin, Suren Byna, Houjun Tang, Kushal Bansal, Terry J. Ligocki, Hans Johansen, Nagiza F. Samatova. Parallel In Situ Detection of Connected Components in Adaptive Mesh Refinement Data, In Proc. Cluster, Cloud and Grid Computing (CCGrid), May, 2015.
Adaptive Mesh Refinement (AMR) represents a significant advance for scientific simulation codes, greatly reducing memory and compute requirements by dynamically varying simulation resolution over space and time. As simulation codes transition to AMR, existing analysis algorithms must also make this transition. One such algorithm, connected component detection, is of vital importance in many simulation and analysis contexts, with some simulation codes even relying on parallel, in situ connected component detection for correctness. Yet, current detection algorithms designed for uniform meshes are not applicable to hierarchical, non-uniform AMR, and to the best of our knowledge, AMR connected component detection has not been explored in the literature. Therefore, in this paper, we formally define the general problem of connected component detection for AMR, and present a general solution. Beyond solving the general detection problem, achieving viable in situ detection performance is even more challenging. The core issue is the conflict between the communication-intensive nature of connected component detection (in general, and especially for AMR data) and the requirement that in situ processes incur minimal performance impact on the co-located simulation. We address this challenge by presenting the first connected component detection methodology for structured AMR that is applicable in a parallel, in situ context. Our key strategy is the incorporation of a multi-phase AMR-aware communication pattern that synchronizes connectivity information across the AMR hierarchy. In addition, we distill our methodology to a generic framework within the Chombo AMR infrastructure, making connected component detection services available for many existing applications. We demonstrate our method’s efficacy by showing its ability to detect ice calving events in real time within the real-world BISICLES ice sheet modeling code.
Results show up to a 6.8x speedup of our algorithm over the existing specialized BISICLES algorithm. We also show scalability results for our method up to 4,096 cores using a parallel Chombo-based benchmark.
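For context, the uniform-mesh baseline that the abstract contrasts against can be sketched in a few lines. The following is a minimal, serial, single-level illustration using union-find with 4-connectivity. It is not the paper's AMR-aware or parallel algorithm, and the function name `label_components` is hypothetical.

```python
def label_components(mask):
    """Label 4-connected components of truthy cells in a uniform 2D grid.

    Illustrative baseline only: serial, single-level, no AMR hierarchy.
    Returns (labels, count), where labels uses 0 for background and
    1..count for components.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    parent = {}

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Pass 1: make each foreground cell its own set, then merge it with
    # its upper and left foreground neighbors (already visited).
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            parent[(r, c)] = (r, c)
            if r > 0 and mask[r - 1][c]:
                union((r - 1, c), (r, c))
            if c > 0 and mask[r][c - 1]:
                union((r, c - 1), (r, c))

    # Pass 2: assign compact labels, one per root, in scan order.
    labels = [[0] * cols for _ in range(rows)]
    root_to_label = {}
    for r in range(rows):
        for c in range(cols):
            if mask[r][c]:
                root = find((r, c))
                if root not in root_to_label:
                    root_to_label[root] = len(root_to_label) + 1
                labels[r][c] = root_to_label[root]
    return labels, len(root_to_label)
```

On an AMR hierarchy, cells at different refinement levels overlap and neighbor each other across level boundaries, so a single-grid scan like this one does not carry over directly. That gap is what the abstract describes.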
Xiaocheng (Chris) Zou, Suren Byna, Hans Johansen, Daniel Martin, Nagiza F. Samatova, Arie Shoshani, John Wu. Six-fold Speedup of Ice Calving Detection Achieved by AMR-aware Parallel Connected Component Labeling, In SciDAC PI Meeting, July, 2015.
Alexy Agranovsky, David Camp, Christoph Garth, E. Wes Bethel, Kenneth I. Joy, Hank Childs. Improved Post Hoc Flow Analysis via Lagrangian Representations, In Proceedings of the Large Data Analysis and Visualization Symposium (LDAV), Paris, France, Note: Best Paper Award, pp. 67–75. November, 2014.
H. Bhatia, V. Pascucci, R.M. Kirby, P.-T. Bremer. Extracting Features from Time-Dependent Vector Fields Using Internal Reference Frames, In Computer Graphics Forum (Proceedings of EuroVis), Vol. 33, No. 3, pp. 21--30. June, 2014.
Extracting features from complex, time-dependent flow fields remains a significant challenge despite substantial research efforts, especially because most flow features of interest are defined with respect to a given reference frame. Pathline-based techniques, such as the FTLE field, are complex to implement and resource intensive, whereas scalar transforms, such as λ2, often produce artifacts and require somewhat arbitrary thresholds. Both approaches aim to analyze the flow in a more suitable frame, yet neither technique explicitly constructs one.
This paper introduces a new data-driven technique to compute internal reference frames for large-scale complex flows. More general than uniformly moving frames, these frames can transform unsteady fields, which otherwise require substantial processing resources, into a sequence of individual snapshots that can be analyzed using the large body of steady-flow analysis techniques. Our approach is simple, theoretically well-founded, and uses an embarrassingly parallel algorithm for structured as well as unstructured data. Using several case studies from fluid flow and turbulent combustion, we demonstrate that internal frames are distinguished, result in temporally coherent structures, and can extract well-known as well as notoriously elusive features one snapshot at a time.
G.P. Bonneau, H.C. Hege, C.R. Johnson, M.M. Oliveira, K. Potter, P. Rheingans, T. Schultz. Overview and State-of-the-Art of Uncertainty Visualization, In Scientific Visualization: Uncertainty, Multifield, Biomedical, and Scalable Visualization, Ch. 1, Edited by M. Chen and H. Hagen and C.D. Hansen and C.R. Johnson and A. Kauffman, Springer-Verlag, pp. 3--27. 2014.
The goal of visualization is to effectively and accurately communicate data. Visualization research has often overlooked the errors and uncertainty which accompany the scientific process and describe key characteristics used to fully understand the data. The lack of these representations can be attributed, in part, to the inherent difficulty in defining, characterizing, and controlling this uncertainty, and in part, to the difficulty in including additional visual metaphors in a well-designed, potent display. However, the exclusion of this information cripples the use of visualization as a decision-making tool because the display is no longer a true representation of the data. This systematic omission of uncertainty commands fundamental research within the visualization community to address, integrate, and expect uncertainty information. In this chapter, we outline sources and models of uncertainty, give an overview of the state-of-the-art, provide general guidelines, outline small exemplary applications, and finally, discuss open problems in uncertainty visualization.
David A. Boyuka II, Sriram Lakshminarasimhan, Xiaocheng Zou, Zhenhuan Gong, John Jenkins, Eric R. Schendel, Norbert Podhorszki, Qing Liu, Scott Klasky, Nagiza F. Samatova. Transparent in situ data transformations in ADIOS, In Proc. Cluster, Cloud and Grid Computing (CCGrid), May, 2014.
P.-T. Bremer, I. Hotz, V. Pascucci, R. Peikert. Topological Methods in Data Analysis and Visualization III, Mathematics and Visualization, 2014.
H. Bui, V. Vishwanath, H. Finkel, K. Harms, J. Leigh, S. Habib, K. Heitmann, M. E. Papka. Scalable parallel I/O on Blue Gene/Q supercomputer using compression, topology-aware data aggregation, and subfiling, In Proceedings of the 22nd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP 2014), Turin, Italy, February, 2014.
H. Bui, E.S. Jung, V. Vishwanath, J. Leigh, M. Papka. Improving Data Movement Performance for Sparse Data Patterns on Blue Gene/Q Supercomputer, In 7th International Workshop on Parallel Programming Models and Systems Software for High-End Computing (P2S2) held in conjunction with the 43rd International Conference on Parallel Processing, Minneapolis, Minnesota, USA, September, 2014.
Large-scale scientific simulations frequently use streamline-based techniques to visualize flow fields. As the shape of a streamline is often related to some underlying property of the field, it is important to identify streamlines (or their parts) with unique geometric features. In this paper, we introduce a metric, called the box counting ratio, which measures the geometric complexity of streamlines by measuring their space-filling capacity at different scales. We propose a novel interactive visualization framework which utilizes this metric to extract, organize and visualize features of varying density and complexity hidden in large numbers of streamlines. The proposed framework extracts complex regions of varying density from the streamlines, and organizes and presents them on an interactive 2D information space, allowing user selection and visualization of streamlines. We also extend this framework to support exploration using an ensemble of measures including box counting ratio. Our framework allows the user to easily visualize and interact with features otherwise hidden in large vector field data. We strengthen our claims with case studies using combustion and climate simulation data sets.
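The abstract does not give the exact formula for the box counting ratio, so the sketch below only illustrates the underlying idea of box counting: count the grid boxes a streamline passes through at two box sizes and compare the counts. The function names, the resampling-based occupancy test, and the choice of ratio (fine count divided by coarse count) are assumptions made for illustration, not the paper's definition.

```python
import math

def count_boxes(points, box_size):
    """Count distinct axis-aligned boxes of side box_size that contain a point."""
    return len({tuple(math.floor(x / box_size) for x in p) for p in points})

def resample(polyline, step):
    """Densify a polyline so consecutive samples are at most `step` apart."""
    samples = [tuple(polyline[0])]
    for a, b in zip(polyline, polyline[1:]):
        n = max(1, math.ceil(math.dist(a, b) / step))
        for i in range(1, n + 1):
            t = i / n
            samples.append(tuple(pa + t * (pb - pa) for pa, pb in zip(a, b)))
    return samples

def box_count_ratio(polyline, coarse, fine):
    """Assumed ratio: boxes occupied at the fine scale over the coarse scale.

    A straight line roughly doubles its box count when the box side is
    halved, while a curve that fills a 2D region roughly quadruples it.
    """
    samples = resample(polyline, fine / 4)
    return count_boxes(samples, fine) / count_boxes(samples, coarse)

# A straight segment versus a zigzag that sweeps a 4 x 4 square.
straight = [(0.5, 0.5), (7.5, 0.5)]
zigzag = [(0.5, 0.5), (3.5, 0.5), (3.5, 1.5), (0.5, 1.5),
          (0.5, 2.5), (3.5, 2.5), (3.5, 3.5), (0.5, 3.5)]
print(box_count_ratio(straight, coarse=2.0, fine=1.0))  # 2.0
print(box_count_ratio(zigzag, coarse=2.0, fine=1.0))    # 4.0
```

The points may be 2D or 3D tuples. Because occupancy is tested on resampled points, a box that the curve only clips at a corner can be missed. A production version would intersect segments with the grid exactly.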
A. Chaudhuri, T.-H. Wei, T.-Y. Lee, H.-W. Shen, T. Peterka. Efficient Range Distribution Query for Visualizing Scientific Data, In Proceedings of the 2014 IEEE Pacific Visualization Symposium (PacificVis), 2014.
B. Chapman, H. Calandra, S. Crivelli, J. Dongarra, J. Hittinger, C.R. Johnson, S.A. Lathrop, V. Sarkar, E. Stahlberg, J.S. Vetter, D. Williams. ASCAC Workforce Subcommittee Letter, Subtitled DOE ASCAC Committee Report, 2014.
Simulation and computing are essential to much of the research conducted at the DOE national laboratories. Experts in the ASCR-relevant Computing Sciences, which encompass a range of disciplines including Computer Science, Applied Mathematics, Statistics and domain sciences, are an essential element of the workforce in nearly all of the DOE national laboratories. This report seeks to identify the gaps and challenges facing DOE with respect to this workforce.
The DOE laboratories provided the committee with information on disciplines in which they experienced workforce gaps. For the larger laboratories, the majority of the cited workforce gaps were in the Computing Sciences. Since this category spans multiple disciplines, it was difficult to obtain comprehensive information on workforce gaps in the available timeframe. Nevertheless, five multi-purpose laboratories provided additional relevant data on recent hiring and retention.
Data on academic coursework was reviewed. Studies on multidisciplinary education in Computational Science and Engineering (CS&E) revealed that, while the number of CS&E courses offered is growing, the overall availability is low and the coursework fails to provide skills for applying CS&E to real-world applications. The number of graduates in different fields within Computer Science (CS) and Computer Engineering (CE) was also reviewed, which confirmed that specialization in DOE areas of interest is less common than in many other areas.
Projections of industry needs and employment figures (mostly for CS and CE) were examined. They indicate a high and increasing demand for graduates in all areas of computing, with little unemployment. This situation will be exacerbated by large numbers of retirees in the coming decade. Further, relatively few US students study toward higher degrees in the Computing Sciences, and those who do are predominantly white and male. As a result of this demographic imbalance, foreign nationals are an increasing fraction of the graduate population and we fail to benefit from including women and underrepresented minorities.
There is already a program that supports graduate education that is tailored to the needs of the DOE laboratories. The Computational Science Graduate Fellowship (CSGF) enables graduates to pursue a multidisciplinary program of education that is coupled with practical experience at the laboratories. It has been demonstrated to be highly effective in both its educational goals and in its ability to supply talent to the laboratories. However, its current size and scope are too limited to solve the workforce problems identified. The committee felt strongly that this proven program should be extended to increase its ability to support the DOE mission.
Since no single program can eliminate the workforce gap, existing recruitment efforts by the laboratories were examined. It was found that the laboratories already make considerable effort to recruit in this area. Although some challenges, such as the inability to match industry compensation, cannot be directly addressed, DOE could develop a roadmap to increase the impact of individual laboratory efforts, to enhance the suitability of existing educational opportunities, to increase the attractiveness of the laboratories, and to attract and sustain a full spectrum of human talent, which includes women and underrepresented minorities.
Zhengzhang Chen, Seung Woo Son, William Hendrix, Ankit Agrawal, Wei-keng Liao, Alok Choudhary. NUMARCK: Machine Learning Algorithm for Resiliency and Checkpointing, In the International Conference for High Performance Computing, Networking, Storage and Analysis, November, 2014.
Hsuan-Te Chiu, Jerry Chou, Venkat Vishwanath, Suren Byna, Kesheng Wu. Simplifying Index File Structure to Improve I/O Performance of Parallel Indexing, In The 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS 2014), 2014.
Hank Childs, Scott Biersdorff, David Poliakoff, David Camp, Allen D. Malony. Particle Advection Performance Over Varied Architectures and Workloads, In 21st Annual International Conference on High Performance Computing, HiPC 2014, Goa, India, December, 2014.
Dong Dai, Robert B. Ross, Philip Carns, Dries Kimpe, Yong Chen. Using Property Graphs for Rich Metadata Management in HPC Systems, In Proceedings of the 9th Parallel Data Storage Workshop, IEEE, November, 2014.
Ciprian Docan, Fan Zhang, Tong Jin, Hoang Bui, Qian Sun, Julian Cummings, Norbert Podhorszki, Scott Klasky, Manish Parashar. ActiveSpaces: Exploring dynamic code deployment for extreme scale data processing, In Concurrency and Computation: Practice and Experience, 2014.
Managing the large volumes of data produced by emerging scientific and engineering simulations running on leadership-class resources has become a critical challenge. The data have to be extracted off the computing nodes and transported to consumer nodes so that it can be processed, analyzed, visualized, archived, and so on. Several recent research efforts have addressed data-related challenges at different levels. One attractive approach is to offload expensive input/output operations to a smaller set of dedicated computing nodes known as a staging area. However, even using this approach, the data still have to be moved from the staging area to consumer nodes for processing, which continues to be a bottleneck. In this paper, we investigate an alternate approach, namely moving the data-processing code to the staging area instead of moving the data to the data-processing code. Specifically, we describe the ActiveSpaces framework, which provides (1) programming support for defining the data-processing routines to be downloaded to the staging area and (2) runtime mechanisms for transporting codes associated with these routines to the staging area, executing the routines on the nodes that are part of the staging area, and returning the results. We also present an experimental performance evaluation of ActiveSpaces using applications running on the Cray XT5 at Oak Ridge National Laboratory. Finally, we use a coupled fusion application workflow to explore the trade-offs between transporting data and transporting the code required for data processing during coupling, and we characterize sweet spots for each option.
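The trade-off between moving data and moving code can be illustrated with a toy model. The sketch below is a hypothetical in-memory stand-in, not the ActiveSpaces API: the `StagingArea` class, its method names, and the 8-bytes-per-value accounting are all assumptions chosen to make the contrast visible.

```python
class StagingArea:
    """Hypothetical in-memory stand-in for a staging area.

    Not the ActiveSpaces API. It only tallies how much payload would be
    sent back to a consumer under each access style.
    """

    def __init__(self):
        self._store = {}
        self.bytes_returned = 0  # rough tally, assuming 8-byte values

    def put(self, key, values):
        self._store[key] = list(values)

    def get(self, key):
        """Move the data to the code: the full array is returned."""
        data = list(self._store[key])
        self.bytes_returned += 8 * len(data)
        return data

    def run(self, key, routine):
        """Move the code to the data: only the routine's result is returned.

        Assumes the routine reduces the array to a single 8-byte scalar.
        """
        result = routine(self._store[key])
        self.bytes_returned += 8
        return result


# Same answer, very different amounts of data returned.
pull = StagingArea()
pull.put("temperature", range(1000))
max_by_pull = max(pull.get("temperature"))

push = StagingArea()
push.put("temperature", range(1000))
max_by_push = push.run("temperature", max)

print(max_by_pull == max_by_push, pull.bytes_returned, push.bytes_returned)
```

The toy ignores the cost of shipping and executing the routine itself. The abstract says the real trade-off is characterized experimentally rather than assumed.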
Steffen Frey, Filip Sadlo, Kwan-Liu Ma,, Thomas Ertl. Interactive Progressive Visualization with Space-Time Error Control, In Proceedings of IEEE SciVis 2014 (also IEEE TVCG 20(12)), November, 2014.
M.G. Genton, C.R. Johnson, K. Potter, G. Stenchikov, Y. Sun. Surface boxplots, In Stat Journal, Vol. 3, No. 1, pp. 1--11. 2014.
In this paper, we introduce a surface boxplot as a tool for visualization and exploratory analysis of samples of images. First, we use the notion of volume depth to order the images viewed as surfaces. In particular, we define the median image. We use an exact and fast algorithm for the ranking of the images. This allows us to detect potential outlying images that often contain interesting features not present in most of the images. Second, we build a graphical tool to visualize the surface boxplot and its various characteristics. A graph and histogram of the volume depth values allow us to identify images of interest. The code is available in the supporting information of this paper. We apply our surface boxplot to a sample of brain images and to a sample of climate model outputs.
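The abstract does not spell out the volume depth formula or the ranking algorithm. The sketch below assumes a pairwise-band form of depth: for each pair of sample images, it measures the fraction of pixels at which a given image lies between the pair, averaged over all pairs and computed through per-pixel ranks. The function names are hypothetical, ties are not treated specially, and the code illustrates depth-based ordering rather than the authors' released implementation.

```python
from math import comb

def pairwise_band_depth(images):
    """Depth in [0, 1] for each image under the assumed pairwise-band definition.

    images: list of n >= 2 equally sized flat lists of pixel values.
    At a pixel where an image has `below` images under it and `above`
    images over it, it lies inside below * above bands formed by other
    pairs, plus the n - 1 bands in which it is itself one of the pair.
    """
    n = len(images)
    n_pixels = len(images[0])
    totals = [0] * n
    for p in range(n_pixels):
        order = sorted(range(n), key=lambda i: images[i][p])
        for rank0, i in enumerate(order):
            below = rank0
            above = n - 1 - rank0
            totals[i] += below * above + (n - 1)
    return [t / (n_pixels * comb(n, 2)) for t in totals]

def order_by_depth(images):
    """Return image indices from deepest (median) to shallowest, plus depths."""
    depths = pairwise_band_depth(images)
    order = sorted(range(len(images)), key=lambda i: depths[i], reverse=True)
    return order, depths

# Five constant 3-pixel "images"; the middle one should be the median.
sample = [[0, 0, 0], [1, 1, 1], [2, 2, 2], [3, 3, 3], [10, 10, 10]]
order, depths = order_by_depth(sample)
print(order[0], depths)
```

Under this assumption the deepest image plays the role of the median image, and the shallowest images are candidates for closer inspection as potential outliers.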