IOSTATS(1)
NAME
IOstats - Produce a report of I/O operations, summarized
by request type, from I/O detail trace records
SYNOPSIS
IOstats [-processor number] [-fileID id] [-paramFile params]
        [-browserFile browserOut] tracefile
DESCRIPTION
IOstats generates a report of application I/O activity
summarized by I/O request type from detailed I/O trace
records in the input SDDF trace file. The necessary
trace event records are produced by the I/O extension to
the Pablo trace library by default whenever the I/O exten-
sion has been initialized and individual I/O calls have
been instrumented.
As IOstats is running, it periodically displays the number
of input trace packets (records) processed to standard
error. The report output is directed to standard out.
Several paragraphs of text describing the report contents
are included after the actual I/O activity information.
An expanded version of the descriptive text is included in
the section "THE REPORT" below.
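For example, assuming a merged trace file named app.trace
(a hypothetical name), a basic run that saves the report
might look like

    IOstats app.trace > app.IOreport

The progress messages still appear on the terminal because
they are written to standard error.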
If the input file does not include any I/O detail trace
records, the report will be generated without error but
will contain all zeros. For the report to work as
intended, there must be I/O detail trace records in the
input SDDF file. If the programmer elects not to collect
I/O detail trace records for the entire application execu-
tion, the report must be interpreted with that in mind.
OPTIONS
-processor number
Only I/O activity occurring on the specified pro-
cessor is included in the report. A comment con-
taining the processor number is written to standard
error when IOstats begins execution. The processor
number is also included in the report itself after
the heading "Reported Processor:".
The default behavior is to include I/O activity
from all processors.
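For example, to restrict the report to processor 3 of a
hypothetical merged trace file app.trace:

    IOstats -processor 3 app.trace > app.proc3.IOreport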
-fileID id
Only I/O activity associated with the specified
file, as identified by id, is included in the
report. A comment containing the file ID is written to
standard error when IOstats begins execution. The file ID
and file name are also included in the report itself after
the corresponding "Reported ..." headings. If the
application opened the same file more than once, a single
file ID may not map to a unique file name; running the
SyncIOfileIDs(1) program on the trace file first (see
"BEFORE RUNNING THE PROGRAM" below) helps to
avoid this file ID to file name mismatch problem.
The default behavior is to include I/O activity for
all files in the report.
-paramFile params
A parameter file, params, may be used to override
the default configurations for the minimum, maxi-
mum, and bin sizes found in the report histograms.
A comment containing the name of the parameter file
is written to standard error when IOstats begins
execution.
This option allows the user to tune the histogram
parameters to suit the characteristics of their
input trace data. See the section "USING A PARAME-
TER FILE" below for more details on the format and
contents of the parameter file.
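For example, assuming the histogram overrides are stored
in a file named histo.params (a hypothetical name):

    IOstats -paramFile histo.params app.trace > app.IOreport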
-browserFile browserOut
When specified, this option causes IOstats to pro-
duce an SDDF file, browserOut, containing a subset
of the reported information in a format that can be
loaded and viewed with the Pablo Browser program.
See "THE BROWSER DATA" section below for details on
the information written to the SDDF file.
Note that the Pablo Browser program is currently not
distributed outside the Pablo research group due to
limited portability across platforms.
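For example, assuming hypothetical file names, the browser
data can be written alongside the normal report with:

    IOstats -browserFile app.IObrowse app.trace > app.IOreport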
BEFORE RUNNING THE PROGRAM
The application to be studied must be instrumented with
the I/O extension to the Pablo trace library and should
NOT call the function disableIOdetail(). The instrumented
application is run and one or more trace files are gener-
ated - the number of trace files depends on the number of
processors used to run the application.
If there are multiple trace files, they should be merged
with the MergePabloTraces(1) program to produce a single
trace file for the execution including information from
all processors. It is possible to run IOstats on unmerged
trace files, but the output will only contain information
for the single processor reported in the input trace file.
If you intend to use the -fileID option to report only
accesses to individual files, the program SyncIOfileIDs(1)
should be run on the input trace file to synchronize IDs
for files that were opened more than once by the applica-
tion. The file generated by SyncIOfileIDs ending in
".syncFiles" can be used as input to IOstats. The file
ending in ".syncFiles.map" can be used to identify the
file ID of the particular file whose I/O you are inter-
ested in viewing.
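For example, assuming SyncIOfileIDs has produced the files
app.trace.syncFiles and app.trace.syncFiles.map (names
hypothetical; see SyncIOfileIDs(1) for the actual
invocation), and the .map file shows that the file of
interest has ID 12, a per-file report could be generated
with

    IOstats -fileID 12 app.trace.syncFiles > app.file12.IOreport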
If I/O detail trace records were not produced for all I/O
operations in the application, either because some calls
were not instrumented or because detail tracing was dis-
abled for part of the execution, the report will include
only those operations for which detail records were pro-
duced and should be interpreted with that in mind.
THE REPORT
The report summarizes information from detailed I/O event
trace records. Every I/O operation for the processor(s)
and file(s) in the input tracefile that is not filtered
out by a command line -fileID or -processor switch is
treated separately - there is no attempt to correlate
operations that take place concurrently on individual
nodes of a multiprocessor system. Said another way, the
processor and file ID information in the trace event
records is used for filtering purposes, but is not consid-
ered beyond that as the statistics are generated for this
report.
The First and Last I/O Operation timestamps are based on
I/O requests from every processor included in the report.
These values give a general feel for the wall-clock time
over which the reported I/O activity took place.
The report is divided into sections by operation type:
Open, Close, Read, Seek, Write, Flush, and Miscellaneous
I/O. In addition, for traces gathered on Intel Paragon
systems, there are sections for the operations Global
Open, Asynchronous Read, Asynchronous Write, IO Wait and
IO Done, IO Mode, and Lsize. The last section of the
report gives totals for All I/O Operations reported indi-
vidually in the earlier sections.
The information reported for the individual operation
types in their respective sections is very similar. An
operation count and a total time are given in each
section, along with the mean, standard deviation,
variance, minimum, maximum, and a histogram for the
operation durations (in seconds) and for the time between
operations of the same type (also in seconds). For Read
and Write operations, additional information is provided:
the total bytes transferred, the bytes transferred divided
by the total time, and statistical summaries of Bytes
Transferred and Bytes/Second based on individual requests.
The Seek operation section contains a statistical summary
of bytes traversed. When asynchronous operations are
used, the bytes transferred are reported in the
Asynchronous Read and Asynchronous Write sections of the
output. The following paragraphs explain the meaning of
the reported values in more detail.
The operation count reports the number of times an opera-
tion occurred on any processor included in the report.
For example, say the source program issued a single read
request that was executed on 16 nodes of a parallel
machine. The report will show a count of 16 read opera-
tions when run on a file with trace data collected from
all 16 nodes. That is, each I/O request on each processor
is treated separately even though a single call in the
program source caused the "multiple" requests.
Following this line of reasoning, sections for operations
that involve byte transfer (reads, writes, asynchronous
reads, asynchronous writes) report the total number of
bytes involved on any processor included in the report.
Say our previous read operation requested 4 bytes of data;
it executed on 16 nodes, so ( 4 * 16 ) = 64 bytes were
transferred by that read request.
When operation durations are reported, each operation on
each processor is again treated separately, even though
they may overlap in wall-clock time. Again using our read
example, let's say all processors start reading at time n
seconds and finish at time n+2 seconds. The total time
for the read operations would be ( 16 * (n+2 - n) ) = 32
seconds. The mean read duration would be 2, and there
would be no deviation from the mean. If our program did
very little except this read, it's quite possible that in
wall-clock time the total program execution may take less
than 32 seconds.
The "Bytes Xferred/Total Time" for this example is ( 64 /
32 ) = 2 bytes/second. Note that this is not the machine
throughput which would be ( 64 / 2 ) = 32 bytes/second.
(This throughput value is not included in the report.)
The report also gives the bytes/second based on individual
read requests. For our read example, every request was
for 4 bytes and took 2 seconds so the mean is 2 bytes/sec-
ond and there is no deviation from that.
Let's change our example a bit and say each read still
requested 4 bytes of data, but 8 nodes finished in 2
seconds and 8 finished in 4 seconds. The total time would
be ( 8*2 + 8*4 ) = 48 seconds and the "Bytes Xferred /
Total Time" is ( 64 / 48 ) = 1.333 bytes/second. Here
machine throughput might be calculated as ( 64 / 4 ) = 16
bytes/second. For the modified example, the bytes/second
based on individual read requests would be ( 4 / 2 ) = 2
bytes/second for 8 nodes and ( 4 / 4 ) = 1 bytes/second
for 8 nodes yielding a mean of 1.5, a standard deviation
of .516, and a variance of .267.
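These values are consistent with the sample standard
deviation and variance formulas, which divide by one less
than the number of values:

    variance = ( 8*(2 - 1.5)^2 + 8*(1 - 1.5)^2 ) / (16 - 1)
             = 4 / 15 = .267
    std dev  = sqrt( .267 ) = .516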
Now let's change the example further and say that the
reads do not begin at the same time on each node but
instead there is a 3 second delay between the start of
each read. The cryptic ASCII diagram below attempts to
show this scenario... nodes are labeled 0 through f and
appear under the timeline where they would be performing
their read operations.
                1         2         3         4
Time  0123456789012345678901234567890123456789012345678
Nodes 00 11 22 33 44 55 66 77 8888  aaaa  cccc  eeee
                                 9999  bbbb  dddd  ffff
Here we see that 48 seconds elapse on the wall clock
while the reads take place. But no reads are taking
place during eight of those seconds (2, 5, 8, ...), and
the machine throughput is open to interpretation. Do we
say the throughput is ( 64 / 48 ) = 1.333 bytes/second, or
should the idle seconds be subtracted out yielding ( 64 /
(48 - 8 ) ) = 1.6 bytes/second, or should we somehow par-
tition the requests where multiple nodes are doing reads
separately? We really don't know why the read requests
are offset by 3 seconds - should that offset be "charged"
to the I/O system? Difficulties such as these, combined
with the inability to reconstruct from a trace file the
program source that generated read operations across
multiple nodes, highlight why no machine throughput
numbers appear in the report.
Another statistic given for each reported I/O operation
type is the time between XXX operations, in seconds, where
XXX stands for the operation type. When multiple
processors are included in the report, this value
represents the time between successive XXX operations on
any node included in the report. In the first and second
examples above, where all reads began at the same time on
all nodes, the mean would be 0. In the third example,
where each read began 3 seconds after the read on another
node, the mean would be 3 seconds. In none of the
examples is there any deviation from the mean.
Finally, for systems which include Asynch Read and Write
information, the durations reported are for the asyn-
chronous read or asynchronous write calls and do not
include the actual time required to transfer the requested
bytes.
USING A PARAMETER FILE
The IOstats program has default configuration parameters
controlling the minimums, maximums, and bin sizes used in
the histograms included in the report. The user may over-
ride these defaults for some or all of the histograms by
using the -paramFile command line option.
The parameter file is an ASCII SDDF file whose records
provide histogram configuration information which will
override the minimum, maximum, and bin size defaults com-
piled into the IOstats executable. All configuration
records have the name "initialize IO histogram" and the
value of the first field "statistic" controls which his-
togram to configure. A sample parameter file follows:
SDDFA
#1:
// "IOstats" "Histogram initialization"
"initialize IO histogram" {
  // "Recognized Statistics"          "Reported as"
  // "time between IO"                "Time between I/O Operations"
  // "open durations"                 "Open Operation Durations"
  // "time between opens"             "Time between Open Operations"
  // "close durations"                "Close Operation Durations"
  // "time between closes"            "Time between Close Operations"
  // "read durations"                 "Read Operation Durations"
  // "time between reads"             "Time between Read Operations"
  // "read sizes"                     "Read Operation Bytes Transferred"
  // "read throughput"                "Bytes/Second based on individual Reads"
  // "seek durations"                 "Seek Operation Durations"
  // "time between seeks"             "Time between Seek Operations"
  // "seek sizes"                     "Seek Operation Bytes Traversed"
  // "write durations"                "Write Operation Durations"
  // "time between writes"            "Time between Write Operations"
  // "write sizes"                    "Write Operation Bytes Transferred"
  // "write throughput"               "Bytes/Second based on individual Writes"
  // "flush durations"                "Flush Operation Durations"
  // "time between flushes"           "Time between Flush Operations"
  // "misc i/o durations"             "Misc I/O Operation Durations"
  // "time between misc i/o"          "Time between Misc I/O Operations"
  // "global open durations"          "Global Open Operation Durations"
  // "time between global opens"      "Time between Global Open Operations"
  // "asynch read durations"          "Asynch Read Operation Durations"
  // "time between asynch reads"      "Time between Asynch Read Operations"
  // "asynch read sizes"              "Asynch Read Operation Bytes Transferred"
  // "asynch write durations"         "Asynch Write Operation Durations"
  // "time between asynch writes"     "Time between Asynch Write Operations"
  // "asynch write sizes"             "Asynch Write Operation Bytes Transferred"
  // "iowait durations"               "IO Wait and IO Done Operation Durations"
  // "time between iowaits"           "Time between IO Wait and IO Done Operations"
  // "iomode durations"               "IO Mode Operation Durations"
  // "time between iomodes"           "Time between IO Mode Operations"
  // "lsize durations"                "Lsize Operation Durations"
  // "time between lsizes"            "Time between Lsize Operations"
  char "statistic"[];
  // "Minimum"  "Minimum bin value"
  double "minimum";
  // "Maximum"  "Maximum bin value"
  double "maximum";
  // "Bin Size" "Size of bin"
  double "bin size";
};;
"initialize IO histogram" {
  [30] { "read sizes" },
  0, 1024, 256
};;
"initialize IO histogram" {
  [30] { "write sizes" },
  1024, 10240, 1024
};;
This parameter file contains two data records which over-
ride the default values for two of the histograms produced
by the IOstats program. First, a minimum value of 0, a
maximum value of 1024, and a bin size of 256 will be used
in the histogram "Read Operation Bytes Transferred:".
Second, a minimum value of 1024, a maximum value of 10240,
and a bin size of 1024 will be used in the histogram
"Write Operation Bytes Transferred:".
The bin size should divide evenly into (maximum - mini-
mum), and ( (maximum - minimum) / bin size ) yields the
number of "bounded" bins appearing in the histogram. In
addition to the "bounded" bins, there are underflow and
overflow bins to report values that fall outside the mini-
mum/maximum range selected.
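For the "read sizes" record in the sample file above, for
example, ( (1024 - 0) / 256 ) = 4 bounded bins would appear
in the "Read Operation Bytes Transferred:" histogram, plus
the underflow and overflow bins.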
The comments for the field "statistic" in the descriptor
for the "initialize IO histogram" record contain the
strings that should be used in the data records to over-
ride all possible histogram default configurations.
THE BROWSER DATA
The browser SDDF file contains histogram data for each
type of I/O operation ( IO_OP_NAME = Open, Close, Read,
Seek, Write, Flush, Misc I/O, Global Open, Asynch Read,
Asynch Write, IO Wait and IO Done, IO Mode, Lsize) and
summary information across all operation types:
o IO_OP_NAME Operation Durations (seconds)
o Time between IO_OP_NAME Operations (seconds)
o Time between I/O Operations (seconds)
o Read Operation Bytes Transferred:
o Seek Operation Bytes Traversed:
o Write Operation Bytes Transferred:
o Bytes/Second based on individual Read Requests
o Bytes/Second based on individual Write Requests
o Count of I/O Operations
o Total Time of I/O Operations
o Mean Duration of I/O Operations
KNOWN PROBLEMS
For files accessed in a global mode, the "Seek Operation
Bytes Traversed:" values will often be incorrect. In
particular,
on Intel Paragon systems, the reported seek bytes should
not be trusted for files with an iomode of M_LOG, M_SYNC,
M_RECORD, or M_GLOBAL. The I/O extension to the Pablo
instrumentation library attempts to minimize the overhead
incurred in gathering file pointer information, and does
not track file pointer positioning correctly when the
activity on one processor affects the file pointer posi-
tion on another processor. An attempt will be made to
address this problem in the next release.
FILES
/Templates/IOstats.parameters
A sample parameter file to use with the -paramFile
option.
SEE ALSO
FileRegionIOstats(1), IOstatsTable(1), IOtotalsByPE(1),
MergePabloTraces(1), LifetimeIOstats(1), SyncIOfileIDs(1),
TimeWindowIOstats(1)
Ruth A. Aydt, A User's Guide to Pablo I/O Instrumentation
Ruth A. Aydt, The Pablo Self-Defining Data Format
COPYRIGHT
Copyright 1994-1996, The University of Illinois Board of
Trustees.
AUTHOR
Ruth A. Aydt, University of Illinois
Pablo Environment Oct 16, 1996