Using the Pipeline
"It won't work." --Jenkinson's Law
Overview
Data recorded by the instrument will be sent through the spacecraft telemetry system to the Master Control Facility in Hassan and will be processed into Level 1 data. The UVS data pipeline, a collection of independent modules (programs), starts its operations from Level 1 data. Each module performs a separate logical step in generating scientifically useful data from the detector readings. There are four major stages in the pipeline flow, each of which is expanded on below.
- Ingest: Level 1 data consists of scientific event data and spacecraft telemetry arranged in time order in a single file. The first step in the pipeline is to separate them and write the scientific data as FITS binary tables. Relevant information from the telemetry goes into the FITS primary and extension headers.
- Validation: A number of checks will be done by the software to ensure that there is nothing grossly wrong with the data. These checks may include tracking the voltages, temperatures, total count etc. If a potentially serious condition arises, an operator will be notified and processing will continue.
- Correction for Instrumental Effects: Corrections at the instrumental level include distortion correction, where the image is mapped to a different point depending on its location in the field, and flat fielding, which accounts for the dependence of the instrument sensitivity on location in the focal plane.
- Registration: We get a time-tagged series of photons from the instrument which we then convert into a map of the sky. In order to ease the pointing requirements on the spacecraft, we will correct each photon's spatial position by shifting and registering to other photons from the same star.
Init Parameters
The operation of the pipeline is governed by a set of parameter files in the subdirectory params of the UVS data pipeline root directory. Before you begin with the pipeline modules, please inspect these files and make sure you understand what they mean. The formats and contents of the parameter files are detailed in pipeline parameters. Parameter files are not discussed further here, except for the parameters that individual modules require, which are covered in the following sections.
Ingest and Validation
Ingest is the first step in the pipeline software and converts the spacecraft data into scientifically useful data products. Data File Formats discusses the various levels of TAUVEX data. The raw data from the spacecraft, as received at the ground station, is termed level 0. Level 0 data without fillers is called level 1 data, and this is what the UVS data pipeline ingest reads. The first module reads this binary data and separates the science data from the satellite telemetry data. The second module generates FITS binary tables and populates the FITS headers (level 1a). The format of the Level 1a data is binary FITS tables with multiple extensions. The first extension contains only a header with all observation-specific information. Subsequent extensions contain information about the individual photon hits and are separated by time stamp; i.e., one extension contains all the photon detections within a 1/8 s time interval. The headers of these extensions contain only information specific to that particular frame. Subsequent modules work on the FITS files. The output of this stage is the validated Level 1b data written by UVS_calc_xy.
The purpose of each of the modules and their calling sequence are detailed in Jayant's Article.
UVS_ingest_data
UVS_ingest_data reads the level 1 data (science and telemetry) and writes them into two separate disk files. A typical calling sequence is:
java -jar UVS_ingest_data.jar <Level1_data_file> <events_data_file> <telemetry_data_file>
It requires three command-line parameters: the input data file, and names for the output events data and telemetry files. For example, if your input file is called level0data.dat and you would like the science data saved in events.dat and the satellite telemetry saved in telemetry.dat, the calling sequence for UVS_ingest_data would be:
[reks@marvin:~/pipeline/jar]$ java -jar UVS_ingest_data.jar level0data.dat events.dat telemetry.dat
Tasks:
- read level 1 data
- remove fillers, if any
- identify science data
- identify telemetry data
- write out science data
- write out telemetry data
UVS_create_level1a
UVS_create_level1a reads UVS_Pipeline_initparams.txt for the location of a temp directory in which to write intermediate files. The parameter is UVS_TMP. If your data area is low on disk space, you may modify this parameter. This module also requires the parameter file UVS_Btableheader_keywords.txt, which contains the list of keywords to be added to the FITS binary table header. Under normal circumstances, this file should not be modified. A typical calling sequence is:
java -jar UVS_create_level1a.jar <events_data_file> <telemetry_data_file> <Level1afile_root_name>
It requires three command-line parameters: the events data file, the telemetry file and a string that is used as the root name of the output files. For example, if your events data is saved in events.dat and the satellite telemetry in telemetry.dat, the calling sequence for UVS_create_level1a would be:
[reks@marvin:~/pipeline/jar]$ java -jar UVS_create_level1a.jar events.dat telemetry.dat outputdata
Note the last argument, outputdata (the level 1a root name). With this root name, the output files will be named:
outputdata_T1_1_1a.fits
outputdata_T2_1_1a.fits
outputdata_T3_1_1a.fits
Tasks:
- read events data and log
- read telemetry
- write fits header from available information (log/telemetry)
- write level1a data as binary tables, one extension per frame
UVS_validate_data
java -jar UVS_validate_data.jar <Level1a_data_file> <error_log_file>
It requires two command-line parameters: the FITS file to be validated (usually the level 1a data created by UVS_create_level1a) and the name of a log file to which any errors are written. The cutoff values that decide the validity of the data are read from the parameter file UVS_Pipeline_initparams.txt. The parameters are listed below.
| Parameter | Purpose | Comments |
| Detector readings | See TAUVEX Detectors for more information | |
| LOWER_W | Lower cutoff for Wedge | |
| UPPER_W | Upper cutoff for Wedge | |
| LOWER_S | Lower cutoff for Strip | |
| UPPER_S | Upper cutoff for Strip | |
| LOWER_Z | Lower cutoff for Zigzag | |
| UPPER_Z | Upper cutoff for Zigzag | |
| Pulse height distribution | | |
| LOWER_WSZ | Lower cutoff limit | If W+S+Z for any event is not within the range LOWER_WSZ to UPPER_WSZ, the event is invalid |
| UPPER_WSZ | Upper cutoff limit | |
| Number of Frames | | |
| LOWER_NFRAMES | Lower cutoff | If the total number of frames in an observation is not within the range LOWER_NFRAMES to UPPER_NFRAMES, it is flagged as an invalid observation |
| UPPER_NFRAMES | Upper cutoff | |
| Total Event counts | | |
| LOWER_NCOUNTS | Lower limit | If the total number of events is not within the range LOWER_NCOUNTS to UPPER_NCOUNTS, the observation is flagged as invalid |
| UPPER_NCOUNTS | Upper limit | |
| Event rate (events/frame) | | |
| MINCOUNT | Lower limit | A frame is invalid if its events per frame is outside the limits MINCOUNT to MAXCOUNT |
| MAXCOUNT | Upper limit | |
UVS_validate_data will also generate plots, such as the total number of counts per frame versus frame number. These plots are meant to help assess the data quality.
Tasks:
- read level1a data
- read validation parameters from parameter file
- create plot
- validate data
- In case of critical errors, write error log
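The range checks listed in the parameter table can be pictured with a small shell sketch. This is only an illustration: the helper names in_range and event_is_valid are hypothetical, the cutoff values are made up, and the real module is a Java program that reads its limits from UVS_Pipeline_initparams.txt.

```shell
# Illustrative sketch only: hypothetical helpers, not the UVS_validate_data code.

# in_range VALUE LOWER UPPER: succeed if LOWER <= VALUE <= UPPER (integers).
in_range() {
    local value="$1" lower="$2" upper="$3"
    [ "$value" -ge "$lower" ] && [ "$value" -le "$upper" ]
}

# event_is_valid W S Z LOWER_WSZ UPPER_WSZ: pulse height check on the sum W+S+Z.
event_is_valid() {
    local sum=$(( $1 + $2 + $3 ))
    in_range "$sum" "$4" "$5"
}

# Made-up cutoffs, for illustration only.
LOWER_WSZ=100
UPPER_WSZ=3000

if event_is_valid 400 500 600 "$LOWER_WSZ" "$UPPER_WSZ"; then
    echo "event valid"
else
    echo "event invalid"
fi
```

The same in_range helper would serve for the per-frame and per-observation limits, since each of them is a lower and upper cutoff on a single number.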
UVS_calc_xy
java -jar UVS_calc_xy.jar <Level1a_data_file> <Level1b_data_file>
UVS_calc_xy reads the level 1a data file and calculates the (x, y) position of each event from the W, S and Z values (detector readings). The output is a level 1b file, which is similar to level 1a but with two extra columns (x and y) in each FITS extension table. This module reads UVS_Pipeline_initparams.txt for the total number of pixels in the image (keyword: PIXELS). The default is a 1024 x 1024 image. Do not modify this value unless you understand the requirement to do so.
Tasks:
- read level1a data
- convert wsz values to xy
- update level1a header
- insert x and y columns to data
- write level1b data
Data Correction
Several instrumental effects, including geometric distortions and flat fielding, have to be corrected for. The input data is the validated Level 1b data written by UVS_calc_xy. The output is similar to the input FITS binary table files, except that more information is added as we move through the modules. Each of the modules in this section is discussed below.
UVS_get_calfiles
java -jar UVS_get_calfiles.jar <Level1b_data_file>
UVS_get_calfiles reads the available calibration file names (both flux calibration and flat field) listed in the parameter file UVS_Pipeline_calparams.txt, selects the files whose calibration information is closest in epoch to the observation date (the date in the level 1b file header) and adds the calibration file names to the level 1b file header. Before you run this module, make sure that you have the appropriate flux calibration and flat field files and that the paths to these files are specified in the parameter file.
Tasks:
- read level1b header
- read parameter file
- get calibration file
- update level1b header
UVS_apply_cal
java -jar UVS_apply_cal.jar <Level1b_data_file> <Level1c_data_file>
Reads level 1b data, opens the flux calibration file listed in the level 1b header and adds a calibration column to the existing data. The calibrated data is written out as a level 1c file.
Tasks:
- read level1b header
- get calibration information
- apply calibration
- write level1c file
UVS_flat_field
java -jar UVS_flat_field.jar <Level1c_data_file> <Level1d_data_file>
The response of the detector may not be uniform over its area. Hence, flat fielding is required to equalize the gain of the detector. The relevant calibration file name is entered into the header by UVS_get_calfiles. UVS_flat_field reads the flat field calibration file and updates the calibration column. The output is written as a level 1d file.
Tasks:
- read level1c data
- get flat field information
- apply flat field
- write level1d data
UVS_geom_corr
java -jar UVS_geom_corr.jar <Level1d_data_file> <Level1e_data_file>
UVS_geom_corr corrects for the geometric distortion in the field of view. The distortion co-efficients are read from the parameter file UVS_Pipeline_distcoeff.txt. The correction factors for each pixel are added as two separate columns to the binary FITS tables.
Tasks:
- read level1d data
- read geometric distortion coefficients
- apply geometric correction
- write level1e data
Data Registration
Since TAUVEX is a scanning instrument, sources drift within the field of view and each photon will be shifted in the detector plane. This step corrects the positions for all the shifts, including spacecraft jitter. The input data is the geometrically and photometrically corrected photon list. Output data will be an image of the sky (Level 2 data).
UVS_calc_radec
java -jar UVS_calc_radec.jar <Level1e_data_file> <Level1f_data_file>
Once the data is corrected for distortions and calibration is done, we can convert the x,y positions of each event on the detector plane to sky co-ordinates. The x,y positions of events from one source will differ from frame to frame, since the source moves across the field of view. The sky co-ordinates of all events from the same source, however, remain the same. The sky co-ordinates of each event are added as separate columns to the FITS tables. To do this conversion, we need the co-ordinates of one reference point. This information comes from the telemetry stream. UVS_calc_radec does the initial pass, which is refined further by UVS_register_data.
Tasks:
- read level1e data
- convert xy to equatorial co-ordinates
- convert xy to galactic co-ordinates
- write level1f data
UVS_register_data
java -jar UVS_register_data.jar <Level1f_data_file> <Level1g_data_file>
Sky co-ordinate information obtained using UVS_calc_radec will have an accuracy of 0.1 deg (the accuracy of the co-ordinate information in the telemetry file), which is not good enough for an instrument with a spatial resolution of 6 to 10 arcseconds. This module will identify point sources in the field of view, determine their positions, compare them with an existing catalog and update the position information. The output is written as a level 1g file.
Tasks:
- read level1f data
- create image
- find point sources
- find corrections
- write level1g data
UVS_create_image
java -jar UVS_create_image.jar <Level1g_data_file> <Level2_data_file>
Reads the level 1g data file, applies calibration and distortion corrections, selects valid data points and writes out level 2 data (a FITS image). Image header keywords are read from the parameter file UVS_Imgheader_keywords.txt. Avoid modifying this parameter file unless you understand the requirements to do so.
Tasks:
- read level1g data
- apply calibration
- apply distortion correction
- ignore invalid data
- create image
- write level2 header
- write level2 data
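To see how the modules described so far chain together, the following shell sketch prints the calls in order for one telescope. It is a dry run that only echoes commands and never starts java. The function name plan_pipeline is hypothetical, and so are all file names other than the level 1a name pattern shown earlier; the intermediate file names are chosen here purely for illustration.

```shell
# Illustrative sketch only: prints the module calls in order, it does not run them.
# Intermediate file names (1b to 1g, level 2, the log) are hypothetical.

# plan_pipeline LEVEL1_FILE ROOT: echo one java command per pipeline module.
plan_pipeline() {
    local level1="$1" root="$2"
    local base="${root}_T1_1"      # telescope 1 file written by UVS_create_level1a
    echo "java -jar UVS_ingest_data.jar $level1 events.dat telemetry.dat"
    echo "java -jar UVS_create_level1a.jar events.dat telemetry.dat $root"
    echo "java -jar UVS_validate_data.jar ${base}_1a.fits validate_errors.log"
    echo "java -jar UVS_calc_xy.jar ${base}_1a.fits ${base}_1b.fits"
    echo "java -jar UVS_get_calfiles.jar ${base}_1b.fits"
    # The remaining modules each read one level and write the next.
    local step prev=b next
    for step in apply_cal:c flat_field:d geom_corr:e calc_radec:f register_data:g; do
        next=${step#*:}            # level letter written by this module
        echo "java -jar UVS_${step%%:*}.jar ${base}_1${prev}.fits ${base}_1${next}.fits"
        prev=$next
    done
    echo "java -jar UVS_create_image.jar ${base}_1g.fits ${base}_2.fits"
}

plan_pipeline level1data.dat outputdata
```

Printing the plan instead of executing it keeps the sketch safe to run anywhere; the control script described below is what actually runs the modules.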
Things to do before you start
Before you start with the software, make sure that:
- You have checked all the parameter files in the subdirectory params and are happy with all the init parameter values.
- You have copied the parameter files, or the whole params directory, into your working directory. This is a prime requirement for running the UVS pipeline modules.
- You have the latest flux calibration and flat field files from the TAUVEX webpage download area. The location of these files should be specified in the file UVS_Pipeline_calparams.txt.
- You have downloaded the latest distortion co-efficients from the TAUVEX webpage and saved those values in UVS_Pipeline_distcoeff.txt.
The UVS pipeline control script
The UVS data pipeline is designed to be fully automatic. This means that once you decide on your input files and (optionally) some parameters, all you have to do is call the UVS pipeline control script and wait for it to do plenty of work and present you with images. Unfortunately, this automatic option is currently available only on Linux/Unix. One of us wrote a bash script to automate the pipeline, and that choice was favoured by the data center people as well. The script should work on any flavour of Unix with bash v3.0 or later, but we have tested it only on Linux. If you do not have bash, please skip this section.
Before anybody starts with the UVS data pipeline, certain minimum checks need to be done to ensure smooth operation. From a check on the availability of java (TM) to an estimate of available memory and disk space, the script checks the available resources and tunes the pipeline performance. If you have downloaded a binary or source distribution of the pipeline, the control script is available in the scripts directory.
Script design philosophy
The purpose of the script is to enable a smooth automated operation of the pipeline. The following points were considered in its design:
- It should cater to the needs of a new user and also provide enough customizability for an advanced user.
- Before the UVS_pipeline modules (jar) are fired, we should check for available memory and allocate a reasonable amount for the java virtual machine.
- The output data files should never be over-written at any cost!
- A detailed log, both error and informational, should be maintained and this too should not be overwritten by, say another instance of the script.
Calling the script
UVS_run_pipeline accepts a number of command-line arguments. These inputs may also be set as environment variables. If both are present, the value on the command line overrides the value defined in the environment variable. Two of the input parameters are mandatory to run the script: the level 0 (or level 1) data file name and the telemetry data file name. The data definitions are given in the previous chapter. If the script is called without any arguments, or with -h or --help as an argument, a brief help message is printed on the terminal. Life gets a lot simpler if an environment variable is defined to point at the top of the UVS_data_pipeline distribution. Environment variables available across shells are declared using the export command:
export UVS_pipeline_home=/home/reks/pipeline
Once this is done, the script will expect the jar files of each module to be in $UVS_pipeline_home/jar and the parameter files in $UVS_pipeline_home/params. If this is not done, the script will start making guesses, which may not always work. You can of course opt for advanced usage and pass in the location of the jar files as well (see the section on advanced usage).
Mandatory Inputs
The bare minimum required to run the pipeline is the name of the input file, which is usually the level 1 data file. There are two ways to pass inputs to the script: either pass the file name as a command-line argument or export the path to the file as an environment variable.
[reks@marvin:~/pipeline/scripts]$ UVS_run_pipeline -i level0.dat
[reks@marvin:~/pipeline/scripts]$ export UVS_input_file=level0.dat
[reks@marvin:~/pipeline/scripts]$ UVS_run_pipeline
NOTE: There are no defaults for mandatory inputs.
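The precedence between the two ways of passing an input (the command line overrides the environment, and mandatory inputs have no default) can be sketched in a few lines of shell. The helper name resolve_option is hypothetical and is not taken from UVS_run_pipeline; it only illustrates the rule.

```shell
# Illustrative sketch only: hypothetical helper, not taken from UVS_run_pipeline.

# resolve_option CLI_VALUE ENV_VALUE [DEFAULT]
# Prints the first non-empty value: command line, then environment, then default.
# Fails if nothing is set, which is the case for a mandatory input with no default.
resolve_option() {
    local cli="$1" env="$2" default="${3:-}"
    local value="${cli:-${env:-$default}}"
    if [ -z "$value" ]; then
        return 1
    fi
    printf '%s\n' "$value"
}

# Example: a value given with -i wins over $UVS_input_file.
UVS_input_file=from_env.dat
resolve_option "from_cli.dat" "$UVS_input_file"     # prints from_cli.dat
resolve_option "" "$UVS_input_file"                 # prints from_env.dat
```

An optional input would pass its default as the third argument, so the same helper covers both kinds of option.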
Log messages
As the script calls the UVS_pipeline modules, the messages generated are saved in two files. They are:
- UVS_data_pipeline_run.log: For informational messages.
- UVS_data_pipeline_error.log: For error messages. If this file size goes above zero, the script will halt.
If you re-run the script, any existing log files, science (events) data files and telemetry files will be renamed to <file_name>_yyyyMMddhhmmss, where the string yyyyMMddhhmmss denotes the last modified time of the file: yyyy stands for the year (e.g. 2006), MM for the month number (01-12), dd for the day of the month (01-28/29/30/31), hh for the hour (01-24), mm for minutes (00-59) and ss for seconds (00-59).
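A minimal sketch of such a rename is given here, assuming GNU date (as found on Linux), where date -r FILE reports the file's last modification time. The helper name backup_with_mtime is hypothetical and the control script may do this differently; in particular, the %H format used here counts hours from 00 to 23.

```shell
# Illustrative sketch only: hypothetical helper, not the UVS_run_pipeline code.
# Assumes GNU date, where "date -r FILE" reports the last modification time.

# backup_with_mtime FILE: rename FILE to FILE_<timestamp> and print the new name.
backup_with_mtime() {
    local file="$1" stamp
    [ -e "$file" ] || return 0                  # nothing to preserve
    stamp=$(date -r "$file" +%Y%m%d%H%M%S) || return 1
    mv -- "$file" "${file}_${stamp}" && printf '%s\n' "${file}_${stamp}"
}

# Example: preserve an old run log before a new run starts.
backup_with_mtime UVS_data_pipeline_run.log
```

Using the modification time rather than the current time means the suffix records when the old file was last written, not when it was moved aside.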
Optional Inputs
By default, the output data, including all intermediate data products, are written to the directory ./UVS_data_pipeline_YYYYMMDD, where YYYYMMDD is the system date; for example, 5 June 2006 is written as 20060605. This value may be changed using the command-line flag -o <dirname>. If the directory does not exist, it will be created. A full list of available options is given in the next section (Advanced Usage).
Advanced Usage
Any sufficiently advanced user would dislike software that makes assumptions and guesses. For example, a geek will never like the idea that our script decides how much memory to allocate to the java virtual machine, and some users may want to add an extra CLASSPATH when calling the UVS data pipeline modules. You can safely skip this section if you are happy with normal usage.
JAR Directory
Each UVS_pipeline module is packed as a java archive (jar) file. If you have not declared UVS_pipeline_home (section n.nn), the script attempts to guess the location of your UVS_pipeline distribution. It may not always work. If the guess goes wrong, or you have the jar files of the UVS_pipeline modules in some hidden directory, use this option to guide the script to the jars. As seen in the previous sections, you have a number of options here.
export UVS_jar_dir=/path/to/jar_files
And then proceed with a usual script call, OR
UVS_run_pipeline --jar-dir=/path/to/jar_files <other_inputs>
JAVA Location
By default, the script expects java to be in your $PATH. Nothing will work if java is not found. A very common issue on a Linux machine is that /usr/bin/java comes from the GNU Compiler Collection (hereafter gcc-java). UVS_data_pipeline runs only on Sun java (hereafter java), which may be placed in obscure locations like /opt/java/jdk-1.4.8_06/bin/. In such situations, you need to specify the location of java. As seen in the previous sections, you have a number of options here:
export UVS_java_loc=/path/to/java
And then proceed with a usual script call, OR
UVS_run_pipeline --java-loc=/path/to/java <other_inputs>
Here /path/to/java points to the directory containing the java executable. For example, if java is placed at /opt/java/jdk-1.4.8_06/bin/java, /path/to/java would be /opt/java/jdk-1.4.8_06/bin.
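The lookup described in this section (use the given directory if one is supplied, otherwise fall back to $PATH) might be sketched as follows. The helper name find_java is hypothetical and the sketch is not taken from the control script; it only shows that the option names a directory, not the executable itself.

```shell
# Illustrative sketch only: hypothetical helper, not the UVS_run_pipeline code.

# find_java [JAVA_DIR]: print the path of the java executable to use.
# With JAVA_DIR, look only in that directory; otherwise search $PATH.
find_java() {
    local dir="${1:-}"
    if [ -n "$dir" ]; then
        [ -x "$dir/java" ] && printf '%s\n' "$dir/java"
    else
        command -v java
    fi
}

# Example: the directory from the text above; reports a miss if it does not exist.
find_java /opt/java/jdk-1.4.8_06/bin || echo "no java executable in that directory"
```

Note that this sketch only checks that an executable is present; telling Sun java apart from gcc-java would need a further check that is not attempted here.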
JAVA CLASSPATH
If you need to specify a CLASSPATH to run java, pass it through this option.
export UVS_java_classpath=/path/to/class1:/path/to/class2
And then proceed with a usual script call, OR
UVS_run_pipeline --java-class-path=/path/to/class1 <other_inputs>
JVM Memory usage
By default, the script probes the available memory and allocates 50% (0.5) of it to the JVM for running the UVS_pipeline modules. If you are not happy with that number, it can be changed. For example, suppose you are happy to give 75% of your memory to UVS_pipeline. In that case:
export UVS_memory_limit=0.75
And then proceed with a usual script call, OR
UVS_run_pipeline --memory-limit=0.75 <other_inputs>
NOTE: The value passed on is the fraction of total memory available.
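As a rough illustration of how a fraction could be turned into a JVM setting, the sketch below converts a memory size in megabytes and a fraction into a standard -Xmx heap option. The helper name jvm_heap_flag is hypothetical; how the control script actually probes memory and passes the limit to java is not described here, so treat this as an assumption.

```shell
# Illustrative sketch only: hypothetical helper, not the UVS_run_pipeline code.

# jvm_heap_flag TOTAL_MB FRACTION: print a -Xmx option for FRACTION of TOTAL_MB.
# awk does the arithmetic because the shell itself has no floating point.
jvm_heap_flag() {
    local total_mb="$1" fraction="$2"
    awk -v total="$total_mb" -v frac="$fraction" 'BEGIN {
        if (frac <= 0 || frac > 1) exit 1      # the limit must be a fraction in (0, 1]
        printf "-Xmx%dm\n", total * frac       # truncated to whole megabytes
    }'
}

# Example: 75% of a machine with 2048 MB of memory.
jvm_heap_flag 2048 0.75      # prints -Xmx1536m
```

Rejecting values outside the range 0 to 1 guards against passing a percentage such as 75 where a fraction such as 0.75 is expected.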
Extra JAVA Arguments
Under normal circumstances you will not need this, but there may be times when extra options are required to run java (apart from the memory limit, CLASSPATH, etc.). If you are in such a situation, this option may be used.
export UVS_java_args="arg1 arg2"
And then proceed with a usual script call, OR
UVS_run_pipeline --extra-java-args="arg1 arg2" <other_inputs>
Output datafile prefix
The default value for the output file prefix is HHMMSS, the hours-minutes-seconds of the time at which the script was called. This is a convenient way to avoid overwriting files if you decide to call the script several times. In case you do not like that idea, you can force a change by:
export UVS_outfile_prefix="mydatafile"
And then proceed with a usual script call, OR
UVS_run_pipeline --output-file-prefix="mydatafile" <other_inputs>
NOTE: The string passed is attached at the beginning of each file name. As in the above example, if you pass mydatafile, then the level 1a data file from telescope 1 will be named mydatafile_T1_1.fits and so on.
Summary of Options
Given below is a summary of all the available options. You can either pass on a command line flag as seen in the left column or define an environment variable as seen on the right.
| Command | Description | Environment |
| -h --help | Print a brief help message | |
| -i <filename> --input-file=<filename> | input file name (Level 1 data) Default: None | UVS_input_file |
| -o <output_dir> --output-dir=<output_dir> | Name of directory to save intermediate and final output data Default: UVS_data_pipeline_yyyyMMdd Example: UVS_data_pipeline_20070707 | UVS_output_dir |
| -j <path_to_jardir> --jar-dir=<path_to_jardir> | Location of directory containing the jar files of UVS_pipeline modules Default: $UVS_pipeline_home/jar | UVS_jar_dir |
| --java-loc=<path_to_java> | Location of sun java (TM) executable Default: from $PATH | UVS_java_loc |
| --java-class-path=<extra_path> | Extra CLASSPATH for java Default: None | UVS_java_classpath |
| --memory-limit=<0.nn> | Fraction of total memory to be used by UVS_pipeline Default: 0.5 (50%) of available memory | UVS_memory_limit |
| --extra-java-args=<additional_args> | Additional arguments for java Default: None | UVS_java_args |
| --output-file-prefix=<file_prefix> | The string prefix for each output file name Default: HHMMSS, the time in hours-minutes-seconds at which the script was called | UVS_outfile_prefix |
A typical Session
We recommend that you begin by exporting the UVS_pipeline_home variable, supply the mandatory input (the level 1 file) and leave everything else in its default state. In doing so, we assume that java is available in $PATH. In the following example, the UVS_data_pipeline folders are in the directory pipeline, and the TAUVEX level 1 data is in a directory called tauvex_data. The terminal window should look like:
[me@marvin:~]$ export UVS_pipeline_home=/home/me/pipeline
[me@marvin:~]$ echo $UVS_pipeline_home
/home/me/pipeline
[me@marvin:~]$ cd tauvex_data
[me@marvin:~/tauvex_data]$
[me@marvin:~/tauvex_data]$ ls
level0data.dat
[me@marvin:~/tauvex_data]$
Now, you need to copy the parameter files into the data area and modify them if required.
[me@marvin:~/tauvex_data]$ cp -r $UVS_pipeline_home/params .
[me@marvin:~/tauvex_data]$ ls -F
params/ level0data.dat
[me@marvin:~/tauvex_data]$
Start UVS_run_pipeline.sh from the same directory where the data files are saved.
[me@marvin:~/tauvex_data]$ $UVS_pipeline_home/scripts/UVS_run_pipeline.sh
You should start seeing messages like:
-----------------------------------------------------------
..::UVS_run_pipeline.sh v0.9.5::..
The UVS data pipeline governing script
-----------------------------------------------------------
INPUT : data = level0data.dat
INFO : Found Input file, level0data.dat [11MB]
INFO : Output will be written to the directory
: "./UVS_pipeline_20070218"
INFO : Found java version "1.5.0_09"
INFO : Found jar files in
: /home/me/UVS_data_pipeline/jar
INFO : Found init parameter files in ./params
INFO : I may use upto 247M MB of your machine's memory
INFO : Processing level 1 data from Telescope 1
--And so on--
While pipeline runs..
UVS_data_pipeline is expected to handle large data files. For example, the TAUVEX mission will produce about a couple of GB of scientific data per day, and the data pipeline software takes about an hour to process it (on a normal desktop). One hour is a long time to stare at the terminal window. To make it worse, the script will have taken up most of your memory and processor time as well. The author of the control script recommends a tea break.
When things go wrong
UVS_run_pipeline.sh was written to make scientists comfortable with running the data pipeline modules. However, smooth operation on our machines does not necessarily guarantee the same on machines elsewhere. You may face problems, and the script might crash or end with unexpected results. We can classify the problems into two distinct classes:
- Script gave error and crashed.
- No error message, but unexpected output.
The first case is easy to handle. In most cases, the error messages on screen or in the error log file will be sufficient to identify the problem. If you can figure out what went wrong, go ahead and fix it yourself (do let us know about the problem and the fix). Otherwise, we can help you fix the problem. The second case, no error but faulty output, is more enlightening. If you come across such a situation, congratulations! You have just found something that we had never thought of. Report it immediately. For information on how to report bugs, please see Reporting_Bugs.
