Climate Prediction "Resolution Independent" Trickle Data Interface for Distributed Computing



Abstract
Current climate prediction programs deployed in distributed computing have very high attrition rates (~10% participant loss per ~15% of simulation completed). In addition, these programs in their current form return almost no data. Although trickle 'work units' are now used to grant participant credit, little has been done to advance the technology.


The Importance of Trickle Messages in Tightly and Loosely Bound Climate Models

Trickle messages can provide an ongoing subsampling of a climate model's activity.

At a high enough rate, and with enough different types of uplink messages, this running subsample can provide a near-real-time view of the state of the climate model running on a personal computer or cluster server.

Knowing the state of the climate model is as important to end users as it is to those wanting the climate models run.



Figure: CESM climate simulation model diagram. CESM is a loosely bound climate model.


API Scope

Trickle messages are, as of this time, specified in the BOINC trickle message API documentation.
Trickle messages let applications communicate with the server during the execution of a workunit.
These trickle messages are intended for applications that have long workunits (running over multiple days or weeks).



Trickle-up messages
Trickle-up messages go from application to server. They are handled by trickle handler daemons running on the server. Each message is tagged with a 'variety' (a character string). Each daemon handles messages of a particular variety. (This is used, typically, to distinguish different applications.)

Example uses:

Client-side API

To send a trickle-up message, call:

int boinc_send_trickle_up(char* variety, char* text);
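
As a rough illustration of the client-side call, the sketch below builds a small XML payload and passes it to a function with the same shape as the signature quoted above. It is not linked against BOINC: send_trickle_up_stub is a local stand-in, and the variety string "climate_status", the <cpu_time> element, and the helper report_cpu_time are all hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Local stand-in with the same parameter list as the boinc_send_trickle_up()
// signature quoted above, so this sketch compiles without the BOINC libraries.
// It only records the message; a real application would call BOINC instead.
static std::string last_variety;
static std::string last_text;

int send_trickle_up_stub(char* variety, char* text) {
    assert(variety != nullptr && text != nullptr);
    last_variety = variety;
    last_text = text;
    return 0;  // assumption for this sketch: 0 signals success
}

// Build a small XML payload and send it as one trickle-up message.
// The element name and the variety string are hypothetical.
int report_cpu_time(double cpu_seconds) {
    char text[128];
    std::snprintf(text, sizeof text, "<cpu_time>%.1f</cpu_time>", cpu_seconds);
    char variety[] = "climate_status";
    return send_trickle_up_stub(variety, text);
}
```

In a real application the stub would be replaced by the BOINC call, and the variety string would have to match whatever the server-side handler daemon is configured to accept.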
 
Server-side API

To handle trickle-up messages, use a 'trickle_handler' daemon.
This is a program based on sched/trickle_handler.cpp and linked with the function:

int handle_trickle(MSG_FROM_HOST&);
struct MSG_FROM_HOST { ... };




Overall design goals and principles

Model Data
Each entry below gives the item, why it is needed, and notes.
Model date range for trickle (start time, stop time)
Format START: YYYYYY-DDD; STOP: YYYYYY-DDD, forced leading zeros
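
A minimal sketch of producing the date-range string in the format given above. It assumes that YYYYYY is a six-digit model year and DDD a three-digit day of the year, both zero-padded; the function names and the 366-day upper bound are assumptions of this example (a model with a 360-day calendar would use a tighter bound).

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Format one model date as YYYYYY-DDD with forced leading zeros.
// Assumption: DDD is the day of the model year, counted from 1.
std::string format_model_date(int year, int day_of_year) {
    assert(year >= 0 && year <= 999999);
    assert(day_of_year >= 1 && day_of_year <= 366);
    char buf[32];
    std::snprintf(buf, sizeof buf, "%06d-%03d", year, day_of_year);
    return buf;
}

// Combine start and stop dates into the "START: ...; STOP: ..." form.
std::string format_trickle_range(int start_year, int start_day,
                                 int stop_year, int stop_day) {
    return "START: " + format_model_date(start_year, start_day) +
           "; STOP: " + format_model_date(stop_year, stop_day);
}
```

For example, a trickle covering the first fortnight of model year 1920 would be formatted with format_trickle_range(1920, 1, 1920, 15).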

Overall model stability
Should only be reported fortnightly

Ocean model stability
Not required for slab ocean models, but should be mandatory for non-slab models.
Land model stability
Should only be reported fortnightly

Flux correction stability
Should be reported every 7 days

Convection stability
Optional. Not needed in models that have no support for convection.

Clausius-Clapeyron trends
The C-C equation governs the water-holding capacity of the atmosphere. Ideally, there should be numbers for the Arctic, Mid-Latitudes (N & S hemispheres), Tropics and Antarctic.

CFL Stability
Three mathematicians, Courant, Friedrichs, and Lewy, formulated a criterion that, if violated, leads to the "blowing up" of a finite-difference weather prediction model. The CFL criterion is: the speed of the fastest winds in the model must be less than or equal to the grid spacing divided by the time step. Update daily.
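
The criterion as stated above can be checked with a few lines of code. This is a sketch only: it treats the model as having a single grid spacing and a list of sampled wind speeds, and the function names and units (metres, seconds) are assumptions of this example rather than part of any model's interface.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Courant number for the fastest wind: u_max * dt / dx.
// A value of 1 or less satisfies the criterion as stated in the text.
double courant_number(const std::vector<double>& wind_speeds_m_s,
                      double grid_spacing_m, double time_step_s) {
    assert(grid_spacing_m > 0.0 && time_step_s > 0.0);
    double fastest = 0.0;
    for (double w : wind_speeds_m_s) {
        fastest = std::max(fastest, std::fabs(w));  // direction does not matter
    }
    return fastest * time_step_s / grid_spacing_m;
}

// True when the fastest wind is no greater than grid spacing / time step.
bool cfl_satisfied(const std::vector<double>& wind_speeds_m_s,
                   double grid_spacing_m, double time_step_s) {
    return courant_number(wind_speeds_m_s, grid_spacing_m, time_step_s) <= 1.0;
}
```

With a 100 km grid and a fastest wind of 80 m/s, a 600 s time step gives a Courant number of 0.48, while an 1800 s time step gives 1.44 and fails the check. Reporting the Courant number itself, rather than only a pass/fail flag, would let the server see how close a run is to the limit.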

Gaussian Second Correction Stability
Each model year, around 25 April, all climate models should advance the atmosphere by 1 second to correct for the effects of the Gaussian Second. This group should only be transmitted after the correction, and should take into account the 15 days around the correction to check the stability of the correction.

Model data version numbers
For version tracking, as well as core science code tracking. Must be printable, not binary.
There must absolutely be version numbers.



Global Data


Albedo (Global, 14 days)
The global albedo for the entire planet. Calculate every other day and send the data series with model timestamps.

Albedo (Tuple)
{Global, Mid Latitudes, Arctic; Hemisphere (Northern, Southern); Days_averaged_over, Model_timestamp_at_average_start_day}

Average temperature
Arbitrary
Can include monthly Allan Variances
Average daily rainfall
Arbitrary
Can include monthly Allan Variances
Average daily snowfall
Arbitrary
Can include monthly Allan Variances
Number of High Pressure Cells (+ average size)
Shape and vector categorization is possible for up to 10 cells, with signed 16-bit LAT & LON.

Number of Low Pressure Cells (+ average size)
Shape and vector categorization is possible for up to 10 cells, with signed 16-bit LAT & LON.
Pressure cell deviation
Highest vs. lowest pressure cells: the median, and how many standard deviation units they are away from each other.

Accumulated snowfall : Sea
Sampling should be done with an overlapping 7-day sample time window, with 2 days of overlap.
Can include monthly Allan Variances
Accumulated snowfall : Land
Sampling should be done with an overlapping 7-day sample time window, with 2 days of overlap.
Can include monthly Allan Variances
Accumulated rainfall : Sea
Sampling should be done with an overlapping 7-day sample time window, with 2 days of overlap.
Can include monthly Allan Variances
Accumulated rainfall : Land
Sampling should be done with an overlapping 7-day sample time window, with 2 days of overlap.
Can include monthly Allan Variances
Global data version numbers
For version tracking as well as core science code tracking, each return data type should have a local version number, typically a signed integer of the year. The sign should designate in which part of the year the code was frozen.
There must absolutely be version numbers.
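
The overlapping sample windows described for the accumulated snowfall and rainfall entries above can be sketched as follows. The sketch assumes that "2 days of overlap" means consecutive 7-day windows share 2 days, so a new window starts every 5 days; the struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One accumulated total, tagged with the index of the window's first day.
struct WindowSum {
    std::size_t start_day;
    double total;
};

// Sum daily values over overlapping windows. With the defaults, windows
// cover days [0,7), [5,12), [10,17), ... so neighbours share 2 days.
// Only complete windows are reported.
std::vector<WindowSum> overlapping_window_sums(const std::vector<double>& daily,
                                               std::size_t window_days = 7,
                                               std::size_t overlap_days = 2) {
    assert(window_days > overlap_days);
    const std::size_t stride = window_days - overlap_days;
    std::vector<WindowSum> out;
    for (std::size_t start = 0; start + window_days <= daily.size(); start += stride) {
        double total = 0.0;
        for (std::size_t d = start; d < start + window_days; ++d) {
            total += daily[d];
        }
        out.push_back({start, total});
    }
    return out;
}
```

Because days in the overlap are counted in two windows, the window totals should not simply be added together to obtain a long-run accumulation.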



Regional Data


Arctic Albedo
Needed because the Arctic and Antarctic are different physical structures.
Hi, Mid regions
Number of High Pressure Cells (+ average size)
Tropics, Mid Latitudes, Arctic AND Americas, Europe-Africa, Asia, Australasian differential; vector data possible for up to 5 cells in this dataset

Number of Low Pressure Cells (+ average size)
Tropics, Mid Latitudes, Arctic AND Americas, Europe-Africa, Asia, Australasian differential; vector data possible for up to 5 cells in this dataset

Tropical average temperature
Area inside the Tropic of Capricorn & Tropic of Cancer
Can include monthly Allan Variances
Tropical average temperature : LAND
Area inside the Tropic of Capricorn & Tropic of Cancer
Can include monthly Allan Variances
Tropical average temperature : SEA
Area inside the Tropic of Capricorn & Tropic of Cancer
Can include monthly Allan Variances
Mid Latitudes average temperature

Can include monthly Allan Variances
Mid Latitudes average temperature : LAND

Can include monthly Allan Variances
Mid Latitudes average temperature : SEA

Can include monthly Allan Variances
Arctic & Antarctic average temperature
Using Arctic & Antarctic Geodesic Circle
Can include monthly Allan Variances
Antarctica differential average temperature
Antarctica is mostly land, not sea as with the Arctic
Can include monthly Allan Variances
Clausius-Clapeyron trends (Arctic, Tropics, Mid Latitudes)
The C-C equation governs the water-holding capacity of the atmosphere -- but it may play out differently in different parts of the world due to seasonal variation and model input variation.

Ocean sector level average temperature
North Atlantic (North & South), Caribbean Sea, South Atlantic (North & South), Indian Ocean (North, Central & South), Pacific (North, Tropics & South), Tasman Sea, Mediterranean Sea, Southern Ocean (surrounds Antarctica) -- FOR NON-SLAB OCEAN MODELS

"Cold Equator" trend line
Mainly for slab ocean models, although no model is totally immune from this effect; for monitoring only. If too frigid, abort the model!

Regional data version numbers
For version tracking as well as core science code tracking, each return data type should have a local version number, typically a signed integer of the year. The sign should designate in which part of the year the code was frozen.
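
Several entries above note that they can include monthly Allan variances. As a hedged sketch, the function below computes the standard two-sample (non-overlapping) Allan variance of a series of consecutive, equally spaced averages: half the mean of the squared differences between neighbouring values. How the series is formed is an assumption of this example; one reading is that each model month contributes its own series of daily means and reports one variance.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Two-sample Allan variance of consecutive, equally spaced averages y[i]:
//   sum over i of (y[i+1] - y[i])^2, divided by 2 * (M - 1),
// where M is the number of values. Requires at least two values.
double allan_variance(const std::vector<double>& y) {
    assert(y.size() >= 2);
    double sum_sq = 0.0;
    for (std::size_t i = 1; i < y.size(); ++i) {
        const double diff = y[i] - y[i - 1];
        sum_sq += diff * diff;
    }
    return sum_sq / (2.0 * static_cast<double>(y.size() - 1));
}
```

Unlike the ordinary variance, this measure depends on the order of the samples, which makes it sensitive to drift between neighbouring averages rather than to the overall spread.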



National Level Data


SEE APPENDIX A
National level data is vital for the national weather agencies, and making national weather data available may greatly expand funding for this branch of science.

National Level data version numbers
For version tracking as well as core science code tracking, each return data type should have a local version number, typically a signed integer of the year. The sign should designate in which part of the year the code was frozen.
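
One possible encoding of such a version number is sketched below. The text does not say which sign maps to which part of the year, so this example assumes, purely for illustration, that a positive value means the code was frozen in the first half of the year and a negative value means the second half. All names here are hypothetical.

```cpp
#include <cassert>
#include <cstdlib>

// Assumption for this sketch only: the year is split into two halves.
enum class FreezeHalf { First, Second };

// Encode year and freeze period as one signed integer.
// Assumed convention: +year = first half, -year = second half.
int encode_version(int year, FreezeHalf half) {
    assert(year > 0);
    return half == FreezeHalf::First ? year : -year;
}

// Recover the year regardless of sign.
int version_year(int version) {
    assert(version != 0);
    return std::abs(version);
}

// Recover the freeze period from the sign.
FreezeHalf version_half(int version) {
    assert(version != 0);
    return version > 0 ? FreezeHalf::First : FreezeHalf::Second;
}
```

Whatever convention is finally chosen, it would need to be fixed in the specification so that every data group is decoded the same way.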



Optional Data


Pollution dispersion (National or Regional Level)
Useful for sulphur models; volcanoes and smokestacks are treated equally here.

Sea Ice granulation or solidification events
Trend data that applies to the Antarctic and Arctic regions only. May require some math and climate database reads to find.

Sea Ice extent
Trend data that applies to the Antarctic and Arctic regions only. A snapshot number.



Runtime Diagnostic Data


Trickle version number (software + science code)
The trickle software should be upgradeable during a simulation. The climate simulation programs should not even know it is there; except for some geoid database locking policies, no changes should be needed.

CPU Time (seconds)
For granting credit to users

Average seconds per timestep
For the user to track real CPU speed, and for credit calibration
Can include monthly Allan Variances
Median seconds per timestep
For the user to track real CPU speed, and for credit calibration
Can include monthly Allan Variances
Minor exceptions handled (+ type)
For debugging and program optimization

Start & Stop record
For tracking starts & restarts of the climate model.

Program loop diagnostic data
For beta testing, but all deployed running code should provide some loopback diagnostics.

Hashsums for data groups
Data integrity within the transmitted data. For compactness, MD5 is adequate for detecting accidental corruption, even though it is cryptographically weak.



Data formatting issues
The data should be in XML or XML-DB form. It should be human-readable as well as unambiguously structured.

Trickle up issues
It would be advisable to keep each trickle to no more than 4 kB in size.
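
One way to respect the size guideline is to pack already-formatted XML fragments into as few messages as possible without exceeding the limit. The sketch below does this greedily. It assumes the limit means 4096 bytes and that every fragment fits in a message on its own; pack_trickles is a hypothetical helper, not part of any existing API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Greedily group XML fragments into messages of at most max_bytes each.
// Fragments are kept whole and in their original order.
std::vector<std::string> pack_trickles(const std::vector<std::string>& fragments,
                                       std::size_t max_bytes = 4096) {
    std::vector<std::string> messages;
    std::string current;
    for (const std::string& fragment : fragments) {
        assert(fragment.size() <= max_bytes);  // a fragment must fit by itself
        if (!current.empty() && current.size() + fragment.size() > max_bytes) {
            messages.push_back(current);  // close the message that is full
            current.clear();
        }
        current += fragment;
    }
    if (!current.empty()) {
        messages.push_back(current);
    }
    return messages;
}
```

Keeping fragments whole means each message remains well-formed on its own, at the cost of occasionally leaving some of the 4 kB budget unused.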

Further reading etc ...


APPENDIX A: National Level, Regional Data



North America
  • 1
  • 2

Greater Europe
  • 1
  • 2

South Asia
  • 1
  • 2


Central America
  • 1
  • 2

Lesser Europe etc ...
  • European exclaves and overseas territories
  • US exclaves and overseas territories

East Asia
  • 1
  • 2


South America
  • 1
  • 2

Africa
  • 1
  • 2

Australasia
  • 1
  • 2


At this time these are suggestions.



APPENDIX B: Prehistoric Oceans & Land masses

A long-term working group needs to be set up to determine how best to upload data for ocean arrangements that date back further than 2 million years. Until the trickle work unit technology is working adequately, this avenue of research should be put on hold. Ideally, geographic reporting regions should be set out in chunks of 50 million years.


APPENDIX C: Outer Planets Climate Simulations

There needs to be a working group set up to define a common message set for this application. Both rocky and gas planets need to have a common answerback API.







Author: Max Power
Initial idea: 20 April 2006
Document created: 15 May 2007
Last modified: 22 May 2014
Last change: Add Albedo tracking, remove broken URLs
Version: 0.56a
Revision State: Developmental