

Simple Plotting and Analysis
This page describes some simple scripts and tools for analyzing
and plotting data, e.g., data produced by Swarm programs
that are run using the
drone experiment management tool.
Note well:
These scripts may or may not be in the PATH set for your account
on the CSCS machines. If they are not, you will get an error like:
command not found
If you have trouble running them as described below,
you might need to prepend the following directory location
to the commands:
/users/rlr/Scripts/
Suppose you have run drone to produce a number of report files
(for different RNG seeds) for each of a number of cases
(particular parameter settings) in an experiment.
For example, suppose you have report files in the directory:
expdata/experiment-1/NB=128-RMP=0.00
Each of the report.* files in that case directory is the output
from one run of the
UM-HeatbugsPlus Swarm program (with a unique seed).
They each have a format like:
# Program heatbugsPlus, Version 0.9, run on 02/07/98 at 16:04:12
@begin
reportFileName = report
reportFileNameSuffix = 00
...
@end
##############################################
# unhappiness
# T Min Avg Max
0 0.522 0.723 0.947
1 0.456 0.696 0.942
You can produce a file ("outfile" in this case) that has the average across the runs
of the min/avg/max unhappiness for each time step by issuing the command:
getAvgColsOverFiles.pl -c1,2,3 -e@end -v expdata/experiment-1/NB=128-RMP=0.00/report.* \
> outfile
The -v parameter tells it to put a few lines of information at the start
of outfile to tell you where the data came from.
This data can then be read into your favorite plotter, for example.
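To make the averaging concrete, here is a minimal Python sketch of the same idea. It is not how the Perl script is actually written. It assumes that data starts after the @end line, that lines starting with # are comments, that columns are counted from 0 (so T is column 0 and -c1,2,3 picks Min, Avg and Max), and that every run has the same number of time steps. The function names are made up for this illustration, and the second "report file" is invented data.

```python
import io

def read_columns(lines, cols, end_marker="@end"):
    """Return one list of floats per data row, keeping only `cols`.

    Assumes data rows come after the line equal to `end_marker`
    and that lines starting with '#' are comments.
    """
    rows = []
    in_data = False
    for line in lines:
        line = line.strip()
        if not in_data:
            in_data = (line == end_marker)
            continue
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        rows.append([float(fields[c]) for c in cols])
    return rows

def average_over_runs(runs):
    """Average the selected columns across runs, time step by time step."""
    n_runs = len(runs)
    averaged = []
    for step_rows in zip(*runs):  # one row from each run, same time step
        averaged.append([sum(vals) / n_runs for vals in zip(*step_rows)])
    return averaged

# Two tiny in-memory "report files". The first uses the numbers shown above;
# the second is made up for illustration.
run_a = io.StringIO("@begin\nreportFileName = report\n@end\n# T Min Avg Max\n"
                    "0 0.522 0.723 0.947\n1 0.456 0.696 0.942\n")
run_b = io.StringIO("@begin\nreportFileName = report\n@end\n# T Min Avg Max\n"
                    "0 0.500 0.700 0.900\n1 0.400 0.600 0.900\n")

runs = [read_columns(f, cols=[1, 2, 3]) for f in (run_a, run_b)]
for t, row in enumerate(average_over_runs(runs)):
    print(t, " ".join(f"{v:.3f}" for v in row))
```

Each printed line is a time step followed by the across-run average of Min, Avg and Max, which is the general shape of data a plotter can read.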
To get more information about the getAvgColsOverFiles.pl perl script:
getAvgColsOverFiles.pl -h
You can also directly produce a plot for this data using xmgrace:
getAvgColsOverFiles.pl -c1,2,3 -e@end expdata/experiment-1/NB=128-RMP=0.00/report.* | \
/common/scripts/xmgrace -pipe >& /dev/null &
See the
local xmgrace User Guide www pages for more information about using it to enhance your
graph, to specify how the graph should look (e.g., labels)
via command and parameter files, etc.
In some cases you will want to produce a graph that has lines for
a single column of output (i.e., a single measurement variable)
from each of the several different runs, e.g., so you can see if
there are large differences in the dynamics when the system
is started with different RNG seeds.
You can do this as follows:
getColsFromFiles.pl -c2 -e@end expdata/experiment-1/NB=128-RMP=0.00/report.* | \
/common/scripts/xmgrace -pipe >& /dev/null &
You can also use the getColsFromFiles.pl script to write the data to a file, e.g.,
getColsFromFiles.pl -c2 -e@end expdata/experiment-1/NB=128-RMP=0.00/report.* > outfile
writes the data to the file named "outfile".
Note that by default getColsFromFiles.pl extracts data from the first
3 files that match the pattern (report.* above) in the specified directory.
You can have it extract data from different files using the optional -n parameter, e.g.,
getColsFromFiles.pl -c2 -n0-1 -e@end expdata/experiment-1/NB=128-RMP=0.00/report.* | \
/common/scripts/xmgrace -pipe
will extract data from the first 2 files (note they are counted from 0!), and
getColsFromFiles.pl -c2 -n4-6 -e@end expdata/experiment-1/NB=128-RMP=0.00/report.* | \
/common/scripts/xmgrace -pipe
will extract data from the 5th through 7th files that match that pattern.
Of course there must be that many files in the data directories for this to work!
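The 0-based, inclusive counting of the -n ranges is easy to get wrong, so here is a small Python sketch of the selection rule as described above. It assumes the matching files are taken in sorted order, which this page does not state. select_files is a hypothetical helper, not part of the scripts, and the file names are only examples.

```python
def select_files(files, n_range):
    """Pick files by a 0-based, inclusive 'first-last' range such as '4-6'."""
    first, last = (int(part) for part in n_range.split("-"))
    if first < 0 or last < first:
        raise ValueError(f"bad range: {n_range!r}")
    if last >= len(files):
        raise ValueError(f"range {n_range!r} needs at least {last + 1} files, "
                         f"found {len(files)}")
    return files[first:last + 1]

# Hypothetical file names report.00 .. report.07, in sorted order.
files = [f"report.{i:02d}" for i in range(8)]

print(select_files(files, "0-1"))  # the first 2 files
print(select_files(files, "4-6"))  # the 5th through 7th files
```

The explicit length check mirrors the warning above: a range like 4-6 only makes sense if at least 7 files match the pattern.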
NOTE WELL:
For more information about any of the scripts described on
this page, you can run them with just the "-h" parameter.
For example, enter: getColsFromFiles.pl -h
It's important to do this, as new features may be added
and older features deprecated
(and so some details on this page may become out-of-date).
Another common analysis is to get the average of some value over the
last part of a run, e.g., the "equilibrium" value of average unhappiness
from a heatbugs run.
Note that if one does several runs with different RNG seeds,
some simple-to-calculate values of possible interest are:
- Average within one run. This might be interpreted as the
"equilibrium" value of the measure (assuming it really has settled down).
- Standard deviation (SD) associated with the average in one run.
This gives an indication of how much the measure is varying within a
run over the time period measured. A large SD might indicate
the value has not settled down to an equilibrium, or it might indicate
the value is varying a lot, perhaps around a steady mean.
- Average of the averages from different runs.
This gives an estimate of the mean over runs with different RNG seeds
(i.e., different initial conditions and/or orders of chance events).
- The SD of the average of the averages. This gives some indication as to
whether the mean results are sensitive to different RNG seeds.
- Average of the SDs from the different runs. This gives some indication
as to whether the in-run variance is sensitive to different RNG seeds.
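The five measures in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the sample standard deviation (n-1 divisor), which this page does not specify, the way the tail length is rounded is a guess, and equilibrium_summary and the data are made up for the example.

```python
import statistics

def equilibrium_summary(runs, tail_fraction=0.2):
    """Compute the five measures for one column of data.

    `runs` is a list of runs (at least two); each run is a list of values
    over time. Only the last `tail_fraction` of each run is used.
    """
    run_avgs, run_sds = [], []
    for values in runs:
        # Assumed rounding rule; at least 2 points so an SD can be computed.
        n_tail = max(2, round(len(values) * tail_fraction))
        tail = values[-n_tail:]
        run_avgs.append(statistics.mean(tail))   # average within one run
        run_sds.append(statistics.stdev(tail))   # SD within one run
    return {
        "run_avgs": run_avgs,
        "run_sds": run_sds,
        "avg_of_avgs": statistics.mean(run_avgs),   # average of the averages
        "sd_of_avgs": statistics.stdev(run_avgs),   # SD of the averages
        "avg_of_sds": statistics.mean(run_sds),     # average of the SDs
    }

# Made-up data: three runs of 10 time steps each (one per RNG seed).
runs = [
    [0.9, 0.8, 0.7, 0.6, 0.5, 0.5, 0.5, 0.5, 0.4, 0.6],
    [0.9, 0.7, 0.6, 0.5, 0.4, 0.4, 0.4, 0.4, 0.3, 0.5],
    [0.9, 0.8, 0.8, 0.7, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6],
]
for name, value in equilibrium_summary(runs, tail_fraction=0.2).items():
    print(name, value)
```

Keeping the per-run averages and SDs in the result makes it easy to spot a single run that has not settled down, which the across-run summaries alone can hide.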
These measures for a report file of the type UM-HeatbugsPlus produces
are calculated by the getEquilAvgs.pl script.
As with getColsFromFiles.pl and getAvgColsOverFiles.pl, the user
must specify the columns to examine, the files to examine, and whether there
is a special mark indicating where data starts.
The user may also specify the fraction at the end of the run over
which the calculations are made.
For example
getEquilAvgs.pl -c1,2,3 -e@end -v -s.2 expdata/experiment-1/NB=16-RMP=0.00/report.*
will calculate the above measures for columns 1, 2 and 3 in the named report files,
over the last 20% of the runs.
Output will look like:
# Equilibrium data Sun Feb 15 10:20:25 EST 1998
# From last 0.20 of the data (40 pts).
# From directory: expdata/experiment-1/NB=16-RMP=0.00
# FirstFile report.00, numfiles 3
# Column numbers in files: '1 2 3'
# For each file, average (sd) over points in column
# Then print average (sd) over averages across files,
# and average (and sd) of StdDev's across files.
FileName        Column 1             Column 2             Column 3
--------------  -------------------  -------------------  -------------------
report.00       0.2364 ( 0.0276)     0.4859 ( 0.0056)     0.7794 ( 0.0102)
report.01       0.0956 ( 0.0520)     0.4965 ( 0.0086)     0.8185 ( 0.0092)
report.02       0.1063 ( 0.0593)     0.5069 ( 0.0096)     0.7614 ( 0.0108)
--------------  -------------------  -------------------  -------------------
Avg over files  0.1461 ( 0.0463)     0.4964 ( 0.0079)     0.7864 ( 0.0101)
SDs over Avgs   0.0784 ( 0.0166)     0.0105 ( 0.0021)     0.0291 ( 0.0008)
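The two summary rows in this sample output can be checked from the per-file rows. The short Python sketch below does that for Column 1, using only the numbers printed above. It uses the sample standard deviation (statistics.stdev, n-1 divisor). That choice is inferred from the printed numbers, not from the script itself.

```python
import statistics

# Per-file (average, SD) pairs for Column 1, copied from the sample output.
per_file = {
    "report.00": (0.2364, 0.0276),
    "report.01": (0.0956, 0.0520),
    "report.02": (0.1063, 0.0593),
}
avgs = [avg for avg, _ in per_file.values()]
sds = [sd for _, sd in per_file.values()]

# "Avg over files" row: mean of the averages, then (mean of the SDs).
print(f"Avg over files  {statistics.mean(avgs):.4f} ({statistics.mean(sds):.4f})")
# "SDs over Avgs" row: SD of the averages, then (SD of the SDs).
print(f"SDs over Avgs   {statistics.stdev(avgs):.4f} ({statistics.stdev(sds):.4f})")
```

This prints 0.1461 (0.0463) and 0.0784 (0.0166), matching the Column 1 entries in the "Avg over files" and "SDs over Avgs" rows.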
One can also get similar numbers for all the sub-directories (cases) in an
experiment directory. For example:
~rlr/Scripts/getEquilAvgsFromDirs.pl -c2 -s0.99 -e@end -f"report.*" -dexpdata/experiment-3
Equilibrium data Wed Feb 18 08:17:24 EST 1998
From last 0.99 of the data.
From directory: expdata/experiment-3
FilePattern: report.*
Column numbers in files: 2
For each file, average (sd) over points in column
Then print average (sd) over averages across files,
and average (and sd) of StdDev's across files.
Overall:                       avg of     (sd over  avg of    (sd over
                               in-RunAvgs those)    inRunSDs  those)
Directory                      Column 2
-----------------------------  -----------------------------------
e=0.99                         0.211      ( 0.016)  0.138     ( 0.030)
e=0.994                        0.198      ( 0.015)  0.148     ( 0.130)
e=0.996                        0.181      ( 0.008)  0.150     ( 0.023)
e=0.997                        0.179      ( 0.007)  0.157     ( 0.071)
e=0.998                        0.163      ( 0.010)  0.154     ( 0.035)
e=0.999                        0.154      ( 0.008)  0.156     ( 0.050)
Note that the "avg of inRunAvgs" is calculated by taking the mean
within each run (each report.* file), and then calculating the average
over all those values, along with the standard deviation of those per-run means.
The "avg of inRunSDs" is calculated by first calculating the standard deviation
within the part of each run used to calculate the in-RunAvg, and then
calculating the average of those (as well as a standard deviation over those values).
Thus the inRunSDs measure the "wiggliness" within the runs,
and we get an average of those across the runs to see how variable
the measure is within runs.
For more information on these scripts, run them with just the -h option.
These Perl scripts can all be found with the
UM-ExpTools-4 tools for use with Swarm programs.
You can also read the scripts themselves:
getAvgColsOverFiles.pl
getAvgColsOverDirs.pl
getEquilAvgs.pl
getEquilAvgsFromDirs.pl
Plans for the future of these scripts include changing them so
that one can specify the plot title, the x and y axis labels,
and labels and symbols for the individual lines,
as well as adding a direct-to-printer parameter.
What I'd really like to see is a standard for the input report
files, in the #-comment lines before the data, that
defines names for the columns of data, including short names
and long names for display on graphs, so the user doesn't have
to specify them.

Updated September 1, 2005