As the volume of data in the EOSDIS archive grows, the archive’s data ingest rate is expected to increase dramatically as well. By 2030, the archive is expected to hold more than 320 PB of data.
For the Earthdata Cloud to meet users’ needs, NASA’s Earth Science Data and Information System (ESDIS) Project is working to ensure it provides services in several key areas, including:
Data acquisition from data providers (such as NASA science teams)
Data ingest: The system must support multi-mission and multi-discipline data ingest
Data validation and processing
Data archive: The system must preserve and protect NASA Earth observation data
Data distribution, including disaster recovery: The system must support distribution of data, subsetting, and visualization, and must be adaptable to future technologies
Metadata: The harvest, creation, and publication of dataset metadata to the CMR
Data management: The system must support the development and execution of the information lifecycle for NASA mission-based Earth science datasets
Metrics: Publication of metrics to the ESDIS Metrics System (EMS), which collects and organizes various metrics from the DAACs and other data providers
NASA’s agreement with AWS has resulted in collaborations to improve the discovery, access, and use of NASA science datasets; the creation of data storage and staging areas to facilitate the community evaluation of data products; and workshops to expand the use of cloud-computing resources
NASA’s collaboration with Google has led to investigations into the transfer, storage, and value of making large volumes of NASA science datasets available on the Google Cloud and Google Earth Engine; making NASA Earth Science data accessible to users via the Google Cloud Public Dataset search engine and Earth Engine Catalog; and growing NASA’s artificial intelligence (AI) capabilities through joint efforts with NASA’s Frontier Development Lab (FDL) Challenges and SpaceML projects
NASA’s partnership with Microsoft has launched investigations into the value of making high-value NASA science datasets available on Azure; cost and performance evaluations of data storage methods and technologies to support analytics; the exploration of strategies to enable cloud-based analytics to promote science in the cloud; and analysis of approaches to building and sharing training datasets for AI at scale
For example, NASA implemented Cumulus, which provides a range of functionality in the cloud, including data acquisition from providers (such as NASA science teams); data ingest, including validation and processing; the harvest, creation, and publication of dataset metadata to the CMR; the storage and distribution of data, including disaster recovery; and publication of metrics to the EMS, which collects and organizes various metrics from the DAACs and other data providers.
Further, Cumulus is integrated with the NASA-Compliant General Application Platform (NGAP), a custom-built, cloud-optimized platform that provides highly flexible cloud-native infrastructure, NASA-compliant IT security controls, networking services, and business cost control in Amazon Web Services (AWS).
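The Cumulus functions described above can be read as an ordered pipeline that each data file passes through. The following is a minimal, purely illustrative sketch of that idea. The `Granule` type, the stage functions, and `run_pipeline` are hypothetical names chosen for this example and are not part of the actual Cumulus API.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Granule:
    """Hypothetical stand-in for one data file moving through the pipeline."""
    granule_id: str
    size_bytes: int
    history: List[str] = field(default_factory=list)


def acquire(g: Granule) -> Granule:
    # Data acquisition from a provider (such as a NASA science team).
    g.history.append("acquired")
    return g


def ingest(g: Granule) -> Granule:
    # Data ingest, including a toy validation step (assumed rule: non-empty file).
    if g.size_bytes <= 0:
        raise ValueError(f"granule {g.granule_id} failed validation: empty file")
    g.history.append("ingested")
    return g


def publish_metadata(g: Granule) -> Granule:
    # Harvest, creation, and publication of metadata (to the CMR in the real system).
    g.history.append("metadata_published")
    return g


def archive_and_distribute(g: Granule) -> Granule:
    # Storage and distribution of data, including disaster recovery.
    g.history.append("archived")
    return g


def report_metrics(g: Granule) -> Granule:
    # Publication of metrics (to the EMS in the real system).
    g.history.append("metrics_reported")
    return g


# Stages in the order the text lists them.
PIPELINE: List[Callable[[Granule], Granule]] = [
    acquire,
    ingest,
    publish_metadata,
    archive_and_distribute,
    report_metrics,
]


def run_pipeline(g: Granule) -> Granule:
    """Pass a granule through every stage in order; stop at the first failure."""
    for stage in PIPELINE:
        g = stage(g)
    return g
```

Calling `run_pipeline(Granule("example-001", 1024))` returns the granule with one `history` entry per stage, while a granule that fails the toy validation rule raises `ValueError` before any metadata is published.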
Moving the collective data archive from the DAACs into the cloud puts NASA Earth observation data “close to compute,” giving users improved access to data, the ability to use large datasets more efficiently, and the ability to conduct a broader range of research. This move will not change existing methods of user interaction with EOSDIS data, but it does require new methods of accessing NASA data that differ from those used with on-premises platforms. Further, as more datasets migrate to the cloud, the DAACs will continue to serve as the gateways to EOSDIS data holdings and provide a wide range of support services for users.