Informing the Prediction of Compression Method and Level for Climate Model Data Using Variable Features
Increased computing power makes it possible to simulate larger Earth system model ensembles with higher output frequency, finer spatial resolution, and longer simulation length. These improvements produce massive datasets that strain institutional storage resources, and several compression methodologies have been studied to address this issue. Lossless compression methods preserve the original data perfectly. However, lossy compression methods, where part of the original data may not be preserved, are a more promising option because of the higher compression rates they can achieve. Previous work has demonstrated that combining different lossy compression methods and levels produces better results overall, because the choice of method and level can be tailored to the characteristics of each variable. Currently, determining the optimal compression method and level for each variable is computationally expensive, because each variable must be compressed and reconstructed exhaustively for every possible compression method and level. The optimal combination is then the method and level that produces the highest compression while still satisfying the quality criteria. The goal of this project is to streamline this process by characterizing the variables through features that will be used in a regression model to predict the optimal compression level automatically. We analyze a large ensemble of annual averages of 198 variables from the Community Earth System Model (CESM), with the final goal of informing a multinomial regression model that predicts the compression level for the fpzip compression method. Here we describe and summarize the features, which range from simple statistics to smoothness and clustering indicators, analyze their variability across ensemble members, and give a preliminary evaluation of their correlation with the different fpzip compression levels.
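The exhaustive search described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: mantissa truncation followed by zlib stands in for fpzip, the candidate levels (16, 24, 32 retained bits) and the function names are hypothetical, and a single maximum-error tolerance replaces the quality criteria, which the abstract does not specify.

```python
import math
import struct
import zlib


def truncate_float32(values, keep_bits):
    """Zero the low (32 - keep_bits) bits of each float32 value.

    Assumption: this is only a stand-in for a lossy compressor's precision level.
    """
    mask = (0xFFFFFFFF << (32 - keep_bits)) & 0xFFFFFFFF
    n = len(values)
    ints = struct.unpack(f"<{n}I", struct.pack(f"<{n}f", *values))
    return struct.pack(f"<{n}I", *(i & mask for i in ints))


def compress(values, keep_bits):
    """Lossy step (bit truncation) followed by a lossless step (zlib)."""
    return zlib.compress(truncate_float32(values, keep_bits), 9)


def reconstruct(blob, n):
    """Recover n float32 values from a compressed blob."""
    return list(struct.unpack(f"<{n}f", zlib.decompress(blob)))


def max_relative_error(original, restored):
    """Largest absolute error, scaled by the largest magnitude in the data."""
    scale = max(abs(v) for v in original) or 1.0
    return max(abs(a - b) for a, b in zip(original, restored)) / scale


def best_level(values, levels=(16, 24, 32), tolerance=1e-3):
    """Try every level; keep the highest compression ratio that passes the check.

    The tolerance is a hypothetical placeholder for the real quality criteria.
    Returns (level, compression_ratio), or None if no level passes.
    """
    original_size = 4 * len(values)  # bytes, stored as float32
    best = None
    for bits in levels:
        blob = compress(values, bits)
        restored = reconstruct(blob, len(values))
        if max_relative_error(values, restored) <= tolerance:
            ratio = original_size / len(blob)
            if best is None or ratio > best[1]:
                best = (bits, ratio)
    return best


# Synthetic stand-in for one variable's field (not CESM output).
field = [300.0 * math.sin(0.01 * i) for i in range(1000)]
print(best_level(field))
```

Every variable costs one compress-and-reconstruct cycle per candidate level, which is the expense that a feature-based prediction of the level aims to avoid.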
document
http://n2t.net/ark:/85065/d7c82csx
eng
geoscientificInformation
Text
publication
2016-01-01T00:00:00Z
EARTH SCIENCE SERVICES > DATA MANAGEMENT/DATA HANDLING > DATA COMPRESSION
EARTH SCIENCE SERVICES > MODELS > COUPLED CLIMATE MODELS
revision
2021-09-17
publication
2017-09-01T00:00:00Z
Copyright Author(s). This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
OpenSky Support
UCAR/NCAR - Library
PO Box 3000
Boulder
80307-3000
name: homepage
pointOfContact
2023-08-18T18:06:47.944050