|
To extract a data set from the TSDS database, only the time series name is required. To list all of the available time series (this may take a while), type
>> ts_list
To list only time series names that contain the string "FMI", type
>> ts_list('FMI')
(The data directory contains subdirectories with names that correspond to the data source, e.g., "FMI" and "CDAWeb". Use these strings with ts_list to restrict the list to a given data source.)
To get all of the data associated with the time series named "OUJ X (FMI 1-min)", type
>> A = ts_slice('OUJ X (FMI 1-min)');
To get all of the metadata associated with the time series named "OUJ X (FMI 1-min)", type
>> A_info = ts_get('OUJ X (FMI 1-min)')
(The absence of a ";" at the end of the above command tells the interpreter to list the output to the screen.)
To view elements 20-100 of array A, type
>> A(20:100)
To save the array A to a text file, type
>> save -text A.txt A
To determine the location and size of this file, type
>> pwd ; ls -lh
To get data in a time interval (say, during the Halloween Storms), type
>> A = ts_slice('OUJ X (FMI 1-min)',[2003 10 29],[2003 11 1]);
The demos ts_get_demo.m and ts_slice_demo.m in the directory time_series_fns can be modified with any text editor and executed from the Octave or Matlab command line.
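A sketch of plotting such an interval (the time-axis construction assumes A is returned as a numeric vector at 1-minute cadence; the axis labels are illustrative assumptions, and the units depend on the series metadata):

```matlab
% Extract the Halloween Storm interval and plot it against elapsed time.
% Assumes A is a numeric vector sampled once per minute (1440 samples/day).
A = ts_slice('OUJ X (FMI 1-min)',[2003 10 29],[2003 11 1]);
t = (0:length(A)-1)/1440;     % elapsed time in days since 2003-10-29
plot(t,A)
xlabel('Days since 2003-10-29')
ylabel('OUJ X')               % units depend on the series metadata
```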
Also see the demo program [ts_insert_demo.m] for a more advanced example. Type which ts_insert_demo to see the location of this file.
How TSDS data files are created
All of the data available through TSDS have the data and code available that are required to make the final data file. For examples of how to do this, see the subdirectories of the directory make_fns/. The programs in these subdirectories (1) auto-download the data from the data provider (using wget), (2) parse the data sets and put them into the TSDS format, (3) add metadata, and (4) create the final set of files that are read with calls to ts_list, ts_get, and ts_slice. To run some of these programs you will need Perl. They should work without problem on Unix systems where perl is located in /bin. On Windows you will need to find the location of a perl binary (usually one is distributed with Matlab) and edit the TSDS_PERL variable in TSDS_GLOBALS.m
Method 0
The current method is to have a user put the files named TSDS_username_SourceTag.EXT (created by ts_insert) in the recipient's data directory. The recipient then needs to execute ts_db('user',1) to update their database.
Method 1
This functionality does not exist yet, but this is how I would prefer to exchange data. This method involves simply creating a data file as in the Creating a Data Set section. To exchange this new (preliminary) data with someone who has TSDS, first execute
>> ts_upload(NAME,FTP_SERVER_NAME) % (This function is not yet available)
which uploads the files TSDS_username_SourceTag.EXT and TSDS_username_tag-metadataonly.EXT to an FTP server. The string username is the last name of the contributor (you must edit the corresponding variable in TSDS_GLOBALS.m). The string SourceTag is the short string used in the ts_insert command. All users will have access to this file after executing
>> ts_update(FTP_SERVER_NAME) % (This function is not yet available)
Q: What data are available through TSDS?
A: Over ~4 GB (compressed) of time series data drawn from Augsburg, CDAWeb, COHOWeb, OMNIWeb, DMI WDC, FMI, ISGI, Kyoto, and WSO. Also, time series of space physics coordinate transform matrices, planetary ephemerides (JPL DE), and stock market indices (from Yahoo). A (short) list with names only [.txt] or the full (long) list of time series with metadata [.txt] is available.
Q: Why is a 20 MB file downloaded if only 1 day of data is requested?
A: The data are stored in 20 MB file chunks. If you are running out of disk
space, delete any large file in ~/.tsds/data (Unix) or tsds/data (Windows) that
does not have the string "metadataonly" in its name. It will be re-downloaded
as needed.
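A sketch of that cleanup, demonstrated here on a throwaway directory rather than the real data directory (the file names are illustrative; point DATA at ~/.tsds/data on Unix or tsds/data on Windows to do it for real):

```shell
# Delete every .mat data file whose name does not contain "metadataonly".
# Demonstrated on a temporary directory for safety.
DATA=$(mktemp -d)
touch "$DATA/TSDS_example_chunk.mat" "$DATA/TSDS_example_chunk-metadataonly.mat"
find "$DATA" -name '*.mat' ! -name '*metadataonly*' -delete
ls "$DATA"
```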
Q: What is the data file format?
A: Data are stored in Matlab binary format (V6 .mat), although this could be changed. The entire data set could be exported to HDF, H5, CDF, netCDF, or text; see the function ts_export and the discussion in the Data section.
Originally I planned to store the data set in either HDF (.hdf), HDF 5 (.h5), CDF, or netCDF format. These data formats have (1) essentially the same capabilities (for purposes relevant to TSDS), (2) varying levels of usability in different programs (for example, although Matlab has CDF support, the reader is very inefficient), and (3) varying levels of support on different platforms. I settled on Matlab V6 binary because it can be read by Octave, which is free (as in price) and is easily installed on the major operating systems.
Q: What language is TSDS written in?
A: The programs for reading and exporting the data are Matlab/Octave scripts. Knowledge of these programs is not required for using TSDS. The wget utility is used for downloading original data and Perl is used for parsing original data files in text format.
Q: What about updating the data?
A: In principle a user should be able to update the time series by executing the programs in the make_fns directory. However, I have not tested these programs for cross-platform and Matlab/Octave compatibility as much as the main TSDS programs, so their use may require some effort.
TSDS requires Matlab or Octave (Free/Libre). Extensive experience with Matlab or Octave is not required. After installation, instructions and tutorial information are displayed to the screen. All path and configuration information is stored in the text file TSDS_GLOBALS.m; you should not need to change the parameters in this file. You can install TSDS into any directory; the subdirectory tsds-0.9.35/data (or ~/.tsds under Linux systems) will be used to store all of the downloaded data. If you are running out of disk space, delete any .mat file that does not have the string "metadataonly" in its name. It will be re-downloaded as needed.
Windows XP:
Linux:
Source code: [~1 MB zip]
Multi-user installation (not recommended): If you are a system administrator for a multi-user Linux system, you can install tsds into a system directory (e.g., /usr/local), but downloaded data will be stored in each user's home directory. You may want to have each user's ~/.tsds/data directory linked symbolically to a common directory.
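One way to set up the shared-directory arrangement (a sketch; the paths are illustrative, and temporary directories stand in for the system directory and a user's home):

```shell
# Link a user's ~/.tsds/data to a common directory so downloaded TSDS
# files are stored only once. mktemp stands in for real system paths.
SHARED=$(mktemp -d)/tsds-data
USERHOME=$(mktemp -d)
mkdir -p "$SHARED" "$USERHOME/.tsds"
ln -sfn "$SHARED" "$USERHOME/.tsds/data"
ls -ld "$USERHOME/.tsds/data"
```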
Alternatively, the program tsds_test_files will try to download all of the TSDS data files and load each time series into memory.
TSDS was created to address several issues we have encountered when attempting to do statistical analyses that require time series from data providers:
The TSDS package contains two parts: a set of programs that interface with the TSDS data sets and a set of "make" programs that download data from the data providers and transform it into the TSDS format. The TSDS data format is not intended to be a new data format. Instead it is a temporary format; a number of examples are given for transforming all TSDS data into more common scientific data formats such as CDF or H5. In dealing with issues 1 and 2, we have found that writing a program that automatically downloads and extracts data from all available files in a given format is only slightly more complex than writing one to extract a day or two of data for a given research task. The TSDS programs are simply a generalized and organized collection of such parsing files that we have used at one time or another.
The following set of commands will plot every time series in the data set to the screen. (You probably don't want to do this; it is noted here only to make the point that this is how I would prefer to access data). The for loop will take a very long time to execute because it will attempt to download all of the TSDS data files (> ~ 4 GB compressed) if they have not been downloaded previously. (The TSDS install package contains only metadata and no data files, so an internet connection is needed to do anything more than browse the metadata using the ts_list function.)
>> NAMES = ts_list;
>> for i = 1:length(NAMES)
>>   D = ts_slice(NAMES{i});
>>   I = ts_get(NAMES{i});
>>   plot(D)
>>   I
>> end
Data are downloaded automatically, as needed. However, if you want access to all of the data when you are offline (> 4 GB; you probably do not want to do this), you can download all of the TSDS data files by executing the following shell commands from the tsds-0.9.35 directory:
% cd ./data/final-s0
% wget -nH -p "http://www.scs.gmu.edu/~rweigel/tsds_data/final-s0"
An important feature of TSDS and its associated data set is its extensibility, which was developed to address issue 3. To add a time series, one only needs to execute a few commands after creating two text files, one with the time series data and the other with metadata. Exchanging a time series with someone running TSDS requires only a few TSDS commands.
To address issue 4, TSDS data sets are versioned in two ways. There is a snapshot number that corresponds to the date the data files were downloaded. Each snapshot has version 0 time series, which are exact copies of what was extracted from the original data files. If we find a problem with a time series, either a time series with a new version number is created or a metadata tag is added that informs the user of the problem. If a data source changes the data in a file used in a previous snapshot, a time series with a new snapshot number is created; the data from a previous snapshot will always be available. This is important to ensure that someone doing a validation and metric analysis five years from now is able to reproduce the results of today, even when today's version of the data is no longer available from the original data source.
The directory contains subdirectories with names that correspond to the data source. The online data directory contains all of the raw data used to create the versioned time series that are available through TSDS. Most of its subdirectories were created using a "wget" call to the URL of a data provider. The web location of the data can be inferred from the directory structure created by wget. The *_get.m programs in the subdirectories of make_fns/ usually contain the "wget" command used to download the data. The subdirectory manual_dl/ contains data files that could not be downloaded using "wget".
All of the data in the original-sX/ (where s means snapshot) directories represent a snapshot in time of a data source's directory. Many data sources will change file contents without changing filenames. For this reason, TSDS data may differ from what currently exists, which may or may not matter depending on your application.
The X in the directory name "original-sX/" indicates the snapshot number. When a new snapshot is taken, the original-sX directory is copied to original-s(X+1). Then the "wget" command is initiated starting in the directory original-s(X+1). A feature of "wget" (its timestamping mode) is that it will not re-download a file from the server unless the server copy is newer than the same-named file on the local disk. If any files containing data before Year_f in a sX time series were changed, the relevant make_fns/ command is run using data in the s(X+1) directory and a new time series is generated with label s(X+1).
The .m files in the make_fns directory take data from the subdirectory original-sX of the data directory and transform it to a uniform time grid. (A few of the .m files are not yet Octave-compatible and will require Matlab.) The original data files are usually parsed and transformed by .pl and .m files that are part of the TSDS source code and create files that are located in the final-sX/ subdirectory of data. Currently EXT=mat (V6), but this is easily changed by modifying the program ts_mat.m. For example, see ts_hdf.m, which deals with EXT=HDF. For a discussion of data file formats see the FAQ.
All of these files will have an associated *-metadataonly.* file in the same directory. Some of the larger data files are not distributed with the standard TSDS package. When a user requests data that is in a file that is not part of the standard data distribution, a message will appear stating that the data are being downloaded with the program "wget", along with the estimated download time. The data files are usually 20 MB or less in size.
Versioning notes
Transcription notes
Caution: The data in these files have been tested by (1) visually inspecting a plot of the sorted time series (e.g., with the Octave command plot(sort(X))) and (2) comparing select time intervals with plots available from the original data sources on the web and with the original data files. (You can see what tests were done by looking at the *_test.m functions in the directory tsds/make_fns.) It is still possible that transcription errors exist. Before presenting these data, do quality checks of your own and report any errors you find to rweigel@gmu.edu. This data set was created to make analysis of data from many sources easier. However, always do your own checking and make sure you read the notes and information given by the original data providers (a web link to the original data source is provided in the metadata and is viewable using ts_list(TS_NAME)).
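The first check described above can be sketched as follows (using a series name that appears earlier in this document):

```matlab
% Visual sanity check: the sorted series should form a smooth monotone
% curve; sharp jumps suggest fill values or transcription errors.
X = ts_slice('OUJ X (FMI 1-min)');
plot(sort(X))
```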
The data files contain extensive metadata including information about where the data were obtained, the location of relevant README files, a string indicating the first year of data, and the geographic locations of the measurement instruments (for ground magnetometers).