Seamless integration of data and data checks

We are excited to show you how to use data checks via our API services!

Several data checks are now available for archived and real-time data. Additional data checks are being developed and will be implemented after further testing.

But first, a big caveat. All data accessible via the API services must be considered provisional. Erroneous data can have many causes, including sensor failure, transmission errors, or incorrect conversion of the transmitted data arising from invalid metadata. The data checks are intended to help you identify (1) physically implausible values and (2) values that may not be representative of the conditions prevailing at that time.

Second, here’s another big caveat. We use the term “data checks” instead of “quality control” because values that fail a data check may still be realistic in some situations. For example, a persistence check may be triggered for relative humidity when precipitation or fog continues over extended periods at a high-elevation mountain site.
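To see why a realistic value can still fail, here is a minimal sketch of a persistence-style check. The window length, tolerance, and function name are assumptions made up for illustration; they are not the thresholds or logic the API services actually use.

```python
def persistence_flags(values, window=4, tolerance=0.0):
    """Flag each value that ends a run of `window` near-identical readings.

    Hypothetical logic: the window and tolerance of the real check
    are not documented in this post.
    """
    flags = []
    for i in range(len(values)):
        recent = values[max(0, i - window + 1): i + 1]
        stuck = (
            len(recent) == window
            and None not in recent
            and max(recent) - min(recent) <= tolerance
        )
        flags.append(stuck)
    return flags


# Relative humidity (%) pinned at saturation during prolonged fog.
foggy_rh = [97.0, 99.0, 100.0, 100.0, 100.0, 100.0, 100.0]
flags = persistence_flags(foggy_rh)
```

In this sketch the last two saturated readings are flagged even though 100% humidity in fog is entirely plausible, which is why a failed check is a prompt to look closer rather than a verdict.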

Leveraging this to your advantage

The /timeseries, /latest and /nearesttime API services provide the data checks as additional attributes delivered along with the data.

Currently we are providing the following checks to our historical and real-time data.

By default we no longer push data that fails our range check. A value that fails the range check is considered physically implausible and is replaced by a ‘null’ value. Existing code that uses the /timeseries, /latest and /nearesttime API services does not need to be modified to take advantage of this major improvement. If you would like to continue using all data without checks, you can simply add &qc=off to your API requests.
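The default behavior can be pictured with a small client-side sketch. The bounds below are hypothetical placeholders chosen for illustration; the actual limits used by sl_range_check are not given in this post.

```python
# Hypothetical plausible limits for air temperature in degrees Celsius.
AIR_TEMP_BOUNDS = (-60.0, 60.0)


def apply_range_check(values, bounds):
    """Return a copy of `values` with out-of-range entries replaced by None."""
    low, high = bounds
    return [
        v if v is not None and low <= v <= high else None
        for v in values
    ]


raw = [23.89, 23.33, -67.78, -67.78]
checked = apply_range_check(raw, AIR_TEMP_BOUNDS)
```

Python's None corresponds to the JSON ‘null’ that appears in the API response, so code that already skips missing values handles range-check failures without any changes.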

A quick example

Using the other data checks is pretty seamless. All that is required is to add the &qc=on URL argument. Three other arguments can expand or restrict the data attributes returned.

Consider the following URL:

https://api.mesowest.net/v2/stations/timeseries?token=demotoken&stid=HOL&start=201601060000&end=201601070000&qc=on

You can also see this data on our tabular page for reference.
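The request URL above can also be assembled programmatically. This sketch only builds the URL and makes no network call; demotoken is the placeholder token from the example, and the variable names are our own.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.mesowest.net/v2/stations/timeseries"

# Same arguments as the example URL, in the same order.
params = {
    "token": "demotoken",
    "stid": "HOL",
    "start": "201601060000",
    "end": "201601070000",
    "qc": "on",
}

url = f"{BASE_URL}?{urlencode(params)}"
print(url)
```

Keeping the arguments in a dict makes it easy to add or drop the qc options without editing the URL string by hand.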

You will notice a few new data elements in the API response labeled QC:

QC_SUMMARY: {
    QC_SHORTNAMES: {
        1: "sl_range_check",
        80: "sl_uu2dvar_rejection"
    },
    PERCENT_OF_TOTAL_OBSERVATIONS_FLAGGED: 13.33,
    QC_TESTS_APPLIED: [
        "synopticlabs"
    ],
    QC_SOURCENAMES: {
        1: "SynopticLabs",
        80: "SynopticLabs"
    },
    TOTAL_OBSERVATIONS_FLAGGED: 8,
    QC_NAMES: {
        1: "SynopticLabs Range Check",
        80: "SynopticLabs UU2DVAR Rejection"
    }
}
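Because the per-observation QC arrays carry only numeric flag IDs, this summary block is what turns them into names. A minimal sketch, assuming the response has already been parsed from JSON into a dict (so the flag IDs arrive as string keys); the helper name is our own illustration.

```python
qc_summary = {
    "QC_SHORTNAMES": {"1": "sl_range_check", "80": "sl_uu2dvar_rejection"},
    "QC_NAMES": {
        "1": "SynopticLabs Range Check",
        "80": "SynopticLabs UU2DVAR Rejection",
    },
    "QC_SOURCENAMES": {"1": "SynopticLabs", "80": "SynopticLabs"},
}


def describe_flag(flag_id, summary):
    """Look up the short name, full name, and source for one flag ID."""
    key = str(flag_id)  # JSON object keys are always strings
    return {
        "shortname": summary["QC_SHORTNAMES"].get(key),
        "name": summary["QC_NAMES"].get(key),
        "source": summary["QC_SOURCENAMES"].get(key),
    }


info = describe_flag(80, qc_summary)
```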

The QC block only has keys for sensors that contain additional attributes, and the structure of those arrays matches the sensor arrays in the OBSERVATIONS block, making it a one-to-one matching process. Also note that each flagged entry is its own array, which allows multiple attributes to be returned for a single observation. The metadata describing the data checks sits at the root level of the response.

{
    ...
    "STATION": [
        ...
        QC: {
            air_temp_set_1: [
                null,
                null,
                null,
                null,
                [80, 1],
                [80, 1],
                [80, 1],
                [80, 1]
            ]
        },
        "OBSERVATIONS": {
            air_temp_set_1: [
                23.89,
                23.89,
                23.33,
                23.33,
                -67.78,
                -67.78,
                -67.78,
                -67.78
            ]
        }
    ]
    ...
}
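The one-to-one matching described above can be sketched as follows, assuming a station entry parses to a dict in which OBSERVATIONS and QC each map sensor names to parallel arrays. The function name is hypothetical.

```python
station = {
    "OBSERVATIONS": {
        "air_temp_set_1": [23.89, 23.89, 23.33, 23.33,
                           -67.78, -67.78, -67.78, -67.78],
    },
    "QC": {
        "air_temp_set_1": [None, None, None, None,
                           [80, 1], [80, 1], [80, 1], [80, 1]],
    },
}


def flagged_observations(station, sensor):
    """Yield (index, value, flags) for each observation that has QC flags."""
    values = station["OBSERVATIONS"][sensor]
    # Sensors without any flags have no key in the QC block.
    flags = station.get("QC", {}).get(sensor, [None] * len(values))
    for i, (value, flag_list) in enumerate(zip(values, flags)):
        if flag_list:
            yield i, value, flag_list


flagged = list(flagged_observations(station, "air_temp_set_1"))
```

Here the last four values (-67.78) come back paired with flags 80 and 1, while the first four pass through unflagged.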

Diving deeper

We offer additional methods to control the types of attributes returned. Beyond &qc itself, three URL arguments control which information is returned. These arguments build on each other, giving you flexibility in the data attributes you receive.

When choosing to use data checks to augment the data returned, there are a few questions to ask yourself.

  • Do you want to apply data checks to your data? By default we apply ‘sl_range_check’, but you can add checks or turn the data check features off (we do not recommend this).
  • Which data checks do you want to apply to the data?
  • How do you want the data returned to you?
  • Do you want to see the data, or just the checks for context?

The following URL arguments allow you to control the data check features.

  • &qc, “on|off”, turns data checks on or off. If set to “off”, all data is returned without data checks (this is not recommended). Setting this to “on” gives you control over how data checks are applied to the observed data values.
  • &qc_remove_data, “on|off|mark”, sets how observed data values are returned when they fail the data checks you specified. “off” returns the data values even if they failed a data check. “on” replaces failed data values with “null”. “mark” replaces failed data values with “false”.
  • &qc_flags, “on|off”, toggles whether the data checks are returned alongside any data that failed a requested check. If “on”, the data checks are returned in the API response.
  • &qc_checks, “all|[flag name]|[flag source]”, lists the data checks applied to the data values. The settings of the other qc parameters determine how the data and data checks are returned. “all” returns any data check in our system. “flag name” targets a single flag by name, or several flags as a comma-separated list. “flag source” targets a data check provider, e.g. “synopticlabs” for SynopticLabs.
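The three &qc_remove_data settings are easiest to compare side by side. The following is a client-side illustration of the behavior described in the list above, not the server's implementation; the function and argument names are hypothetical.

```python
def apply_remove_mode(values, failed_checks, mode):
    """Mimic how qc_remove_data treats values that failed a check.

    values        -- observed values for one sensor
    failed_checks -- parallel list; None where no check failed
    mode          -- "off", "on", or "mark"
    """
    if mode not in ("off", "on", "mark"):
        raise ValueError(f"unknown qc_remove_data mode: {mode!r}")
    if mode == "off":
        return list(values)  # failed values are left in place
    replacement = None if mode == "on" else False
    return [
        replacement if checks else value
        for value, checks in zip(values, failed_checks)
    ]


values = [23.33, -67.78]
failed_checks = [None, [80, 1]]

kept = apply_remove_mode(values, failed_checks, "off")
removed = apply_remove_mode(values, failed_checks, "on")
marked = apply_remove_mode(values, failed_checks, "mark")
```

With “mark”, a failed value stays distinguishable from an observation that was simply missing, which a plain “null” cannot tell you.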

If no qc parameters are used with an API request, the default is to withhold any data that has failed our range check (sl_range_check). The default values are set to: &qc=on&qc_remove_data=on&qc_checks=sl_range_check.

If you choose to turn on qc with just &qc=on, the default values are: &qc_remove_data=off&qc_flags=on&qc_checks=synopticlabs. This will effectively return all flags and data for all SynopticLabs data checks.
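The two sets of defaults can be summarized in a small helper. This is a sketch under stated assumptions: it treats the range-check name as a qc_checks value (since qc_flags is described as an on|off toggle), and it assumes that other qc arguments sent without &qc=on behave like &qc=on with overrides, which this post does not confirm.

```python
def effective_qc_settings(params):
    """Return the qc settings implied by a request's query parameters."""
    qc_keys = ("qc", "qc_remove_data", "qc_flags", "qc_checks")
    supplied = {k: v for k, v in params.items() if k in qc_keys}

    if not supplied:
        # No qc arguments: values failing the range check are removed.
        return {"qc": "on", "qc_remove_data": "on",
                "qc_checks": "sl_range_check"}

    if supplied.get("qc") == "off":
        return {"qc": "off"}

    # Defaults described for a bare &qc=on; explicit arguments override
    # them (assumed behavior, see the note above).
    settings = {"qc": "on", "qc_remove_data": "off",
                "qc_flags": "on", "qc_checks": "synopticlabs"}
    settings.update(supplied)
    return settings


no_qc = effective_qc_settings({"token": "demotoken", "stid": "HOL"})
bare_on = effective_qc_settings({"token": "demotoken", "qc": "on"})
```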

References