SE Practices

The Sustain project is a collaboration between Colorado State University; Arizona State University; the University of California, Irvine; and the University of Maryland, Baltimore County. The project has an advisory board with representation from academia, industry, and citizen science. Through discussion and collaboration, Colorado State University has so far broken ground on the Aperture, Galileo, and Synopsis projects.

Components: Components are being developed for ingesting different types of datasets, providing a metadata catalog, implementing clustering algorithms, creating models, and visualizing and charting the data. These components are being designed and refactored to provide consistent support for arbitrary datasets, and their APIs allow the different components to be integrated seamlessly.
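
To illustrate how a shared API can let components handle arbitrary datasets, here is a minimal Python sketch. It is an assumption for illustration only, not the project's actual API: the names `Ingester`, `CsvIngester`, `MetadataCatalog`, `DatasetMetadata`, and `load_dataset` are hypothetical. Each ingestion component implements one common interface, and the catalog records metadata for whatever an ingester produces, so downstream components (clustering, modeling, visualization) need to depend only on the interface and the catalog.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, Iterator, List

# All names below are hypothetical; they sketch the idea of a common
# component API and are not taken from the Sustain code base.


@dataclass
class DatasetMetadata:
    """Catalog entry describing one ingested dataset."""
    name: str
    fields: List[str]
    record_count: int = 0


class Ingester(ABC):
    """Common interface that every ingestion component implements."""

    @abstractmethod
    def ingest(self, source: Iterable[str]) -> Iterator[Dict[str, Any]]:
        """Turn raw input lines into uniform dictionary records."""


class CsvIngester(Ingester):
    """Example ingester for comma-separated text with a header row."""

    def ingest(self, source: Iterable[str]) -> Iterator[Dict[str, Any]]:
        lines = iter(source)
        header_line = next(lines, None)
        if header_line is None:  # empty input yields no records
            return
        header = header_line.strip().split(",")
        for line in lines:
            yield dict(zip(header, line.strip().split(",")))


@dataclass
class MetadataCatalog:
    """Catalog that any downstream component can query by dataset name."""
    entries: Dict[str, DatasetMetadata] = field(default_factory=dict)

    def register(self, name: str, records: List[Dict[str, Any]]) -> DatasetMetadata:
        fields = sorted(records[0].keys()) if records else []
        entry = DatasetMetadata(name, fields, len(records))
        self.entries[name] = entry
        return entry


def load_dataset(name: str, source: Iterable[str],
                 ingester: Ingester, catalog: MetadataCatalog) -> List[Dict[str, Any]]:
    """Ingest a dataset through the shared interface and record it in the catalog."""
    records = list(ingester.ingest(source))
    catalog.register(name, records)
    return records


if __name__ == "__main__":
    catalog = MetadataCatalog()
    load_dataset("stations", ["id,temp", "a,21.5", "b,19.0"], CsvIngester(), catalog)
    print(catalog.entries["stations"])
    # DatasetMetadata(name='stations', fields=['id', 'temp'], record_count=2)
```

Supporting a new data format in this sketch means adding another `Ingester` subclass; the catalog and the consumers of `load_dataset` do not change.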

Programming Languages: The components are being developed in multiple languages, with the most appropriate language selected for each task. For example, the web-based front end uses webpack to bundle the necessary JavaScript components and capture their dependencies, which enables the software to run on multiple browser platforms without code bloat. Overall, the code base spans C++, Java, JavaScript, Python, and Rust and includes hundreds of packages with a very large number of classes.

Platforms: The software runs on major operating systems, including various flavors of Unix and Linux. Deployments make use of both physical and virtual machines, and current versions also leverage orchestration frameworks such as YARN and Kubernetes. Preliminary experiments have also been performed using Podman.

Software development: The team uses rigorous software design, implementation, and testing techniques, and the developers are trained in best practices and principles for writing clean code. We leverage a rich set of open-source libraries, including data-encoding libraries and database connectivity drivers.

Frameworks for unit testing (e.g., JUnit), continuous integration (e.g., Travis CI), and mocking (e.g., Mockito) are used. Design and code quality metrics and test coverage data are captured using IntelliJ and CodeClimate, and these metrics are used to determine which parts of the code need refactoring to improve code quality. The software is maintained in a GitHub organization, with a separate repository for each component.
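
JUnit and Mockito are Java tools. As a rough analog of the same unit-testing-with-mocks pattern, the sketch below uses Python's standard `unittest` and `unittest.mock` modules. The `CatalogService` class, its SQL query, and the dataset names are hypothetical and invented for illustration; the point is that the component's database connection is replaced by a mock, so the test runs without a real database and can also check how the component used its dependency.

```python
import unittest
from unittest import mock


class CatalogService:
    """Hypothetical component that reads dataset names through a DB connection."""

    def __init__(self, connection):
        self.connection = connection  # any object with an execute(query) method

    def dataset_names(self):
        rows = self.connection.execute("SELECT name FROM datasets")
        return sorted(row[0] for row in rows)


class CatalogServiceTest(unittest.TestCase):
    def test_dataset_names_are_sorted(self):
        # Stub the connection instead of opening a real database.
        connection = mock.Mock()
        connection.execute.return_value = [("stations",), ("air_quality",)]

        service = CatalogService(connection)

        self.assertEqual(service.dataset_names(), ["air_quality", "stations"])
        # Check the interaction with the dependency.
        connection.execute.assert_called_once_with("SELECT name FROM datasets")
```

Setting `return_value` on a mock plays roughly the role of Mockito's `when(...).thenReturn(...)`, and `assert_called_once_with` plays roughly the role of `verify(...)`. The file can be run with `python -m unittest`.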

Process management: The developers follow an agile, iterative software development process. The entire team meets every two weeks to discuss updates, roadblocks, and plans, while smaller groups hold Scrum meetings twice a week. Trello is used to visualize high-level tasks and timelines, and ZenHub is used to track issues in the backlog and in progress. On GitHub, a pull request must be approved by another developer on the team before its changes can be merged.

User Involvement: Once our community engagement workshops get underway in May 2021, we will use our interactions with users to add new features, improve usability and the user experience, and finally set up the development site for user contributions.