SEPEM reference proton dataset

The SEPEM reference proton dataset v1.0

This page explains how the SEPEM H reference dataset 1.0 was constructed, using IMP-8/GME and GOES/SEM/EPS data.

The SEPEM reference proton dataset is intended for analysing proton-induced radiation effects. The dataset covers the energy range 5–200 MeV in ten logarithmically spaced channels and can be interpreted as a virtual instrument dataset (the energy channels are listed below).

Data selection

The long duration datasets used to construct the SEPEM RDS v1.0 H dataset have been collected by instruments on IMP8 and on the GOES spacecraft series.

For IMP8, data from the GME and CPME instruments have been analysed for suitability in the reference dataset construction. It turned out that the CPME instrument suffered from severe saturation (and possibly other) effects, and that in addition there were numerous gaps in the dataset. Therefore, it was decided not to use these data. The GME data also show saturation effects and data gaps, but to a less severe extent than the CPME data. Nevertheless, especially during the second half of the mission, a number of very large SEP events are insufficiently covered by the GME data. A detailed analysis of the data caveats was compiled for the CPME and GME datasets.

The GOES/SEM proton data do not suffer from saturation and have only relatively small data gaps, which can be filled by interpolation or by using data from the secondary GOES spacecraft. A detailed description of the GOES/SEM proton datasets and caveats was also compiled.

In order to make maximum use of the strong points of the respective datasets, the following procedure was used to combine the data:

IMP8/CPME data were not used.
For the period prior to the GOES05 mission (01-01-1984), IMP8/GME data were used. Older GOES data (dating back to 1974) are available and have been processed and ingested into the SEPEM database. After analysing these additional data, it was concluded that they are too noisy to be used for the reference proton dataset (note that the GOES05 data for years 1984 and 1985 were retrieved from this old FITS file series).
For the period 01-01-1984 to 28-02-2011, SEM proton data were used from successive GOES spacecraft as specified in the table below. The G data series (uncorrected five minute averaged data) was used.
From 01-03-2011 onward, GOES13/EPS data were used (again, uncorrected five minute averaged data).

**Data selection for the reference proton dataset**
Original dataset	Original time span	Time span of the selected data	Comments
IMP8/GME	01-11-1973–26-10-2001	01-11-1973–26-10-2001	Primary dataset prior to 01-01-1984. Data were also used to cross-calibrate the GOES/SEM data. Channel 92.5–107.0 MeV was not used.
GOES05/SEM	01-01-1984–24-03-1987	01-01-1984–05-03-1987
GOES07/SEM	06-03-1987–12-08-1996	06-03-1987–28-02-1995
GOES08/SEM-2	01-03-1995–31-05-2003	01-03-1995–31-05-2003
GOES11/SEM-2	01-07-2000–28-02-2011	21-06-2003–28-02-2011	Due to large data gaps, data prior to 21-06-2003 are only used for cross-calibration.
GOES12/SEM-2	01-01-2003–28-02-2010	01-06-2003–20-06-2003	Used to bridge the three-week gap between GOES08 and GOES11. Channels P6 and P7 are missing; data values for this period were set to 0.0026 and 0.0012, respectively (the surrounding background level; GOES10 data confirm that these channels remain at background level during this period).
GOES13/EPS	01-05-2010–30-06-2013	01-03-2011–31-03-2013	The cross-calibration factors for GOES11 were applied as there is insufficient overlap.

As the energy range of the SEPEM proton reference dataset is limited to 5–200 MeV, not all energy channels of the GME, SEM and EPS instruments are required. The tables below list the channels which were retained for the construction of the reference dataset.

Energy channels used for the proton reference dataset. Energies are given in MeV.

Reference dataset
Channel name	Energy range
F1	5.00–7.23
F2	7.23–10.46
F3	10.46–15.12
F4	15.12–21.87
F5	21.87–31.62
F6	31.62–45.73
F7	45.73–66.13
F8	66.13–95.64
F9	95.64–138.3
F10	138.3–200.0
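
The channel boundaries listed above are consistent with ten bins of equal width in log energy between 5 and 200 MeV. The short Python sketch below reproduces them under that assumption; the function name log_spaced_edges is illustrative and is not part of any SEPEM tool.

```python
def log_spaced_edges(e_min, e_max, n_channels):
    """Return n_channels + 1 bin edges equally spaced in log(energy)."""
    # Constant ratio between successive edges.
    ratio = (e_max / e_min) ** (1.0 / n_channels)
    return [e_min * ratio ** k for k in range(n_channels + 1)]


edges = log_spaced_edges(5.0, 200.0, 10)  # MeV
for i in range(10):
    print(f"F{i + 1}: {edges[i]:.2f}-{edges[i + 1]:.2f} MeV")
```

Each edge is the previous one multiplied by 40^(1/10), approximately 1.446, which gives the values 7.23, 10.46, 15.12 and so on in the table.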

GOES05-07/SEM
Channel name	Energy range
P2	4.2–8.7
P3	8.7–14.5
P4	15.0–44.0
P5	39.0–82.0
P6	84.0–200.0
P7	110.0–500.0

GOES08-12/SEM-2
Channel name	Energy range
P2	4.0–9.0
P3	9.0–15.0
P4	15.0–40.0
P5	40.0–80.0
P6	80.0–165.0
P7	165.0–500.0

GOES13/EPS
Channel name	Energy range
P2	4.2–8.7
P3	8.7–14.5
P4	15.0–40.0
P5	38.0–82.0
P6	84.0–200.0
P7	110.0–900.0

IMP8/GME
Channel name	Energy range
DIntn_8	4.94–5.96
DIntn_9	5.96–7.25
DIntn_10	7.25–8.65
DIntn_11	8.65–11.10
DIntn_12	11.10–13.60
DIntn_13	13.60–16.10
DIntn_14	16.10–18.70
DIntn_15	18.70–22.50
DIntn_16	19.80–24.20
DIntn_17	24.20–28.70
DIntn_18	28.70–35.20
DIntn_19	35.20–42.90
DIntn_20	42.90–51.00
DIntn_21	51.00–63.20
DIntn_22	63.20–81.00
DIntn_23	87.00–92.50
DIntn_25	107.0–121.0
DIntn_26	121.0–154.0
DIntn_27	154.0–178.0
DIntn_28	178.0–230.0

Data cleaning and gap filling

The datasets used for constructing the proton reference dataset contain numerous spikes and other corrupted data records. In addition, the GME data show saturation effects during the largest SEP events. The corrupted data records have to be removed or corrected, as they would otherwise contaminate any statistical analysis.

As an illustration, the figure below shows the GOES07/SEM proton data during the Oct 89 event.

GOES07/SEM data for the Oct 89 event

The next figure shows the same event as seen in two channels of the GME data. The following defects are immediately obvious:

data gaps all through the event, including around the peak for the high energy channel;
near the end of the event (31 Oct–1 Nov), a number of scattered points in the low energy channel;
in the rise phase (19–20 Oct), sudden decreases in the fluxes.

Two channels of the raw IMP8/GME data for the Oct 89 event

Upon closer examination, and comparison with the GOES/SEM fluxes, it turns out that the complete rise phase and the flux peak suffer from contamination. After removing the affected records, what remains is shown in the figure below. It is clear that most of the event is missed using the GME data. The same applies to several more large events, and in general, even during smaller events, the peak phase is often missed if saturated points are removed.

Two channels of the raw IMP8/GME data for the Oct 89 event, after removing the saturated and spurious fluxes

The situation for the SEM data is better, in the sense that saturation does not occur. However, the SEM data suffer from the appearance of data spikes, as illustrated below for a month of data of GOES05/SEM.

GOES05/SEM data sample illustrating data spikes

These spikes need to be removed before further data processing can take place. During the SEPEM project, a number of algorithms were tested for automatically recognizing and removing data spikes. One automated method, median filtering, is implemented on the SEPEM server. Although this method recognizes and removes many of the spikes, it tends to lower the event peak flux (by the nature of the method), and it does not recognize all data spikes.
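
The behaviour of median filtering described above can be illustrated with a minimal Python sketch. It is not the SEPEM server implementation: the window half-width, the threshold factor and the flux values are all assumptions chosen for illustration.

```python
from statistics import median


def running_median(values, half_width):
    """Median of each point's neighbourhood (window shrinks at the edges)."""
    n = len(values)
    return [
        median(values[max(0, i - half_width):min(n, i + half_width + 1)])
        for i in range(n)
    ]


def flag_spikes(values, half_width=2, factor=5.0):
    """Flag points exceeding `factor` times the local running median."""
    smooth = running_median(values, half_width)
    return [v > factor * m for v, m in zip(values, smooth)]


# Synthetic fluxes (arbitrary units, not measured data): a flat background
# with one isolated spike, followed by a short genuine flux enhancement.
flux = [1.0, 1.1, 0.9, 50.0, 1.0, 1.2, 1.0, 2.0, 8.0, 20.0, 8.0, 2.0, 1.0]
flags = flag_spikes(flux)           # True only at the isolated spike
smoothed = running_median(flux, 2)  # series replaced by its running median
```

In this example only the isolated value of 50 is flagged. Replacing the whole series by its running median would also remove that spike, but it reduces the genuine peak from 20 to 8, which is the drawback noted above.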

Other methods were tried, but in the end it was decided to remove the spikes by hand. To this end, an application was developed for the SEPEM server in which suspicious data points can be marked and removed. This application was used to manually remove all data spikes in the GOES and IMP data that were used to construct the proton reference dataset. The figure below shows the same GOES05/SEM data sample as above after removing the spikes.

GOES05/SEM data sample of the previous figure after removing the data spikes

After removing the data spikes, the resulting gaps, and any other gaps in the original data, need to be filled in order to arrive at a continuous dataset. The figure below shows a two day sample of GOES07/SEM data where data gaps are clearly visible.

GOES07/SEM data sample showing data gaps

Using the data cleaning tool, these gaps were filled using a linear interpolation. The result is shown in the figure below.

GOES07/SEM data sample after filling the data gaps

All data gaps in the GOES/SEM data were filled in the same way, resulting in a new set of GOES/SEM data in which all spikes have been removed and all gaps filled. The data gaps in the IMP8/GME data were not filled, as they are too large. The cleaned datasets are available on the system as tables standard_0001 to standard_0007.

In the SEPEM database, a separate table was created to store all data removal and gap filling actions, i.e. every data point, for the individual channels, that was removed or replaced during gap filling, has been logged.
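
The gap filling and logging steps described above can be sketched as follows. This is a minimal Python illustration, not the SEPEM data cleaning tool: the function name, the use of None to mark missing points and the example values are assumptions.

```python
def fill_gaps(times, values):
    """Linearly interpolate over None entries bounded by valid data.

    Returns the filled series and a log of (time, filled_value) entries.
    Leading or trailing gaps are left untouched.
    """
    filled = list(values)
    log = []
    valid = [i for i, v in enumerate(values) if v is not None]
    # Walk over consecutive pairs of valid points and fill what lies between.
    for left, right in zip(valid, valid[1:]):
        for i in range(left + 1, right):
            weight = (times[i] - times[left]) / (times[right] - times[left])
            filled[i] = values[left] + weight * (values[right] - values[left])
            log.append((times[i], filled[i]))
    return filled, log


# Five-minute cadence (times in minutes) with hypothetical flux values.
times = [0, 5, 10, 15, 20, 25]
flux = [2.0, None, None, 8.0, None, 6.0]
filled, actions = fill_gaps(times, flux)
```

Keeping the log separate from the filled series mirrors the approach of recording every replaced point, so that the cleaning can be audited or undone per channel.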

Energy re-binning

Now that the respective datasets have been cleaned of data spikes and gaps have been filled (where possible), the data can be re-binned into the 10 energy channels of the proton reference dataset. During the SEPEM project, a number of re-binning schemes were tried and compared.

Firstly, analytical fits to the energy spectra for each data record were computed. Using the analytical fits, new datasets were produced for the energy channels of the proton reference dataset. Three fit functions were tested: power law in energy, exponential in energy, and exponential in rigidity. After comparing the fitted data to the original data, it was concluded that using analytical fits over the total energy range of the proton reference dataset did not produce acceptable results, for the following reasons:

The energy spectra vary substantially from event to event, and even during events, so that applying a single analytical function to all the data is not valid.
The high energy channel of the GOES/SEM instrument is contaminated by background effects, resulting in analytical fit spectra that are much too hard. This is illustrated in the figure below, where the fluence spectrum integrated over the 13–23 Jul 2000 event is shown as black squares. The coloured squares represent the analytical fits for the proton reference energy channels. It is clear that the fluence in the original energy channel centred around 300 MeV is too high, resulting in poor analytical fits, which are also too hard.

Fluence spectra for the 13–23 Jul 2000 event using cleaned GOES11/SEM data: black squares represent the integrated data, coloured squares represent analytical fits for the energy channels of the proton reference dataset.

For reference, the same plot is shown using the proton reference dataset (which was obtained by cross-calibrating the GOES data against the IMP8/GME data, as described in the next section).

Fluence spectra for the 13–23 Jul 2000 event using the proton reference dataset: black squares represent the integrated data, coloured squares represent analytical fits for the energy channels of the proton reference dataset.

Using the cross-calibrated data, the background signal in the high energy channel has been substantially reduced, resulting in much better spectrum fits. Similar behaviour is shown over the entire dataset; the plots shown here were generated using the event spectra tool on the SEPEM server.
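
The hardening of a fitted spectrum by a contaminated high energy channel, as discussed above, can be reproduced with a small Python sketch. The spectrum is synthetic, the channel energies are hypothetical, and the tenfold inflation of the highest channel is only a crude stand-in for background contamination; none of these values come from the SEPEM data.

```python
import math


def fit_power_law(energies, fluxes):
    """Least-squares fit of flux = norm * energy**index in log-log space."""
    x = [math.log(e) for e in energies]
    y = [math.log(f) for f in fluxes]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    index = sxy / sxx
    norm = math.exp(mean_y - index * mean_x)
    return norm, index


# Synthetic spectrum following E**-3 exactly (not measured data).
energies = [6.0, 12.0, 25.0, 55.0, 115.0, 300.0]  # MeV, hypothetical
clean = [1.0e6 * e ** -3.0 for e in energies]

# Same spectrum with the highest channel inflated tenfold.
contaminated = clean[:-1] + [10.0 * clean[-1]]

_, index_clean = fit_power_law(energies, clean)
_, index_contaminated = fit_power_law(energies, contaminated)
```

The fit to the clean spectrum recovers the index of -3, whereas the fit to the contaminated spectrum yields a less negative index, that is, a harder spectrum over the whole energy range.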

As applying analytical fit functions over the whole spectrum energy range results in unreliable spectra, it was decided to apply power law fits over each separate energy channel: for each energy in the proton reference dataset, the flux values at the boundaries of the original data channel enclosing the reference energy were used to interpolate the original flux to the reference energy. This procedure was repeated for all reference energy channels, for all data records in the GME and SEM datasets used to construct the proton reference dataset. This procedure was performed using the energy re-binning tool on the SEPEM server, for each of the five datasets used. The fitted data were stored in separate tables, for cross-calibration as described below.
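
A local power-law interpolation between two bracketing points can be sketched in Python as follows. The source does not specify which energies represent the boundaries of an original channel or how zero fluxes are treated, so the function below is an assumption-laden illustration rather than the re-binning tool itself; it requires strictly positive fluxes.

```python
import math


def power_law_interpolate(e_ref, e_lo, flux_lo, e_hi, flux_hi):
    """Interpolate a flux to e_ref along the power law through two points.

    Assumes e_lo < e_ref < e_hi and strictly positive fluxes.
    """
    # Local spectral index of the power law joining the two points.
    index = math.log(flux_hi / flux_lo) / math.log(e_hi / e_lo)
    return flux_lo * (e_ref / e_lo) ** index


# Hypothetical example: fluxes at 10 and 40 MeV lying on an E**-2 spectrum,
# interpolated to a reference energy of 20 MeV.
flux_20 = power_law_interpolate(20.0, 10.0, 1.0, 40.0, 0.0625)
```

Because the spectral index is recomputed for every pair of bracketing points and every data record, the interpolation follows changes in spectral shape from event to event, which a single fit over the full energy range cannot do.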

Cross-calibration

At this stage, five new datasets, re-binned into the proton reference energy channels, are available: the GME dataset, and four GOES/SEM datasets.

Before merging these datasets into a single contiguous set, one more step needs to be taken. The SEM(-2) instruments on the various GOES spacecraft are monitor instruments that are not rigorously calibrated and that exhibit significant differences in response, which makes a simple concatenation of the re-binned datasets impossible. The procedure adopted to establish a common baseline for the SEM(-2) instruments on the four GOES spacecraft used for the proton reference dataset consists of using the GME data as a reference. The GME instrument is a science quality instrument, which has been properly calibrated.

For each of the four GOES datasets, and for each of the ten reference energy channels, linear regression fits were calculated to scatter plots of the GOES and GME data (after re-binning in energy). The figures below show the scatter plot for the 21.87–31.62 MeV channel, using GOES08 data, on a linear and logarithmic scale, respectively. The regression fit was calculated using the original values, not the logarithms.

Scatter plot of the overlapping IMP8/GME and GOES08/SEM-2 data for proton reference channel 21.87–31.62 MeV. The solid green line represents equality, the red line is the linear regression fit (the regression relation is shown at the top of the plot).

Scatter plot of the overlapping IMP8/GME and GOES08/SEM-2 data shown in the previous figure, now on a log-log scale. The regression line is the same as before (i.e. it was not re-calculated using a logarithmic scale).

For each data channel, the inverse regression fit was then applied to the GOES data, to align them with the GME data. The figures below show the data of the two above figures after applying the inverse fit. After applying the cross-calibration, the data are now scattered around the line of equality.

Scatter plot of the overlapping IMP8/GME and GOES08/SEM-2 data for proton reference channel 21.87–31.62 MeV, after applying the inverse regression fit. The solid green line represents equality, the red line is the original regression fit. The data are plotted on the same scale as the original plots.

As a further test, the regression fits were recalculated for the cross-calibrated data, as shown in the figures below. The regression line thus obtained is identical to the line of equality.

Scatter plot of the overlapping IMP8/GME and cross-calibrated GOES08/SEM-2 data for proton reference channel 21.87–31.62 MeV.

Scatter plot of the overlapping IMP8/GME and cross-calibrated GOES08/SEM-2 data shown in the previous figure, now on a log-log scale.
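
The cross-calibration and the subsequent consistency check can be sketched in Python as follows. The sketch assumes that the GME flux is taken as the independent variable and the GOES flux as the dependent variable, which the text does not state explicitly; the flux values are hypothetical and the helper names are illustrative.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def apply_inverse_fit(values, intercept, slope):
    """Map values back onto the reference scale by inverting the fit."""
    return [(v - intercept) / slope for v in values]


# Hypothetical overlapping fluxes in one reference channel.
# The fit uses the original values, not their logarithms.
gme = [0.5, 1.0, 4.0, 10.0, 40.0, 100.0]
goes = [1.0, 1.4, 5.5, 12.1, 49.0, 118.0]

intercept, slope = linear_fit(gme, goes)
goes_calibrated = apply_inverse_fit(goes, intercept, slope)

# Refitting the calibrated data should return the line of equality.
check_intercept, check_slope = linear_fit(gme, goes_calibrated)
```

With an ordinary least-squares fit, the refit of the calibrated data has a slope of one and an intercept of zero by construction (up to rounding), which matches the test described above.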

Similar plots were produced for all proton reference energy channels, for the four GOES datasets used for the proton reference dataset. The plots are available as a zip archive. All regression fits were performed with the Cross-calibration tool on the SEPEM server.

Merging the datasets

The final step in the production of the proton reference dataset consists of merging the re-binned and cross-calibrated datasets, as specified in the data selection table at the top of the page. The full time range of the respective datasets was used for the cross-calibrations in order to ensure maximum overlap with the GME data. The cross-calibrated datasets were then trimmed to the time ranges specified in the third column of the table. The successive data tables were then merged into a new table, standard_0008, which contains the final proton reference dataset.
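
The trimming and merging step can be sketched in Python using the selected time spans from the data selection table. The data structures and function names are illustrative and do not reflect the SEPEM database layout; the end date of the IMP8/GME window is taken as 31-12-1983 on the assumption that GOES05 data are used from 01-01-1984 onward.

```python
from datetime import date

# Selected time spans from the data selection table (inclusive dates).
SELECTION = [
    ("IMP8/GME", date(1973, 11, 1), date(1983, 12, 31)),  # assumed end date
    ("GOES05/SEM", date(1984, 1, 1), date(1987, 3, 5)),
    ("GOES07/SEM", date(1987, 3, 6), date(1995, 2, 28)),
    ("GOES08/SEM-2", date(1995, 3, 1), date(2003, 5, 31)),
    ("GOES12/SEM-2", date(2003, 6, 1), date(2003, 6, 20)),
    ("GOES11/SEM-2", date(2003, 6, 21), date(2011, 2, 28)),
    ("GOES13/EPS", date(2011, 3, 1), date(2013, 3, 31)),
]


def source_for(day):
    """Return the dataset that supplies the merged record for a given day."""
    for name, start, end in SELECTION:
        if start <= day <= end:
            return name
    return None


def merge(datasets):
    """Trim each dataset to its window and concatenate in time order.

    `datasets` maps a dataset name to a list of (day, record) tuples.
    """
    merged = []
    for name, start, end in SELECTION:
        rows = sorted(datasets.get(name, []), key=lambda item: item[0])
        merged.extend((day, rec) for day, rec in rows if start <= day <= end)
    return merged
```

Because the windows do not overlap, each time stamp in the merged table comes from exactly one source dataset, even where the original datasets overlap in time.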

Last modified on: 14 December 2018.