CSV (Comma-Separated Values) File Format provider

Overview

CSV is a common, non-standardized file format that most spreadsheet programs can read. Samples stored in a CSV file are written as plain text, one row per sample, with the values of the channels separated by commas.

This stream provider writes files in one fixed layout, but it can read a wider range of CSV layouts.

This file format has severe limitations; see the Limitations section below.

Details

Settings

This stream interface offers settings that specify how the interface should work. These settings are available in the properties dialog as well as in variable parameters.

TimeFormat
Default value: h:m:s.3
Defines how times are formatted in the Time column. For more information about number format codes, see "Number format codes".

NumberFormat
Default value: >.4
Defines the format in which numerical values are stored. Note that if you use a comma as the separator, you cannot store numbers with a decimal comma. For more information about number format codes, see "Number format codes".

IncludeEvents
Default value: Yes
Set to Yes if event markers in the signal should be stored. They will be written to the Events column. Set to No if event markers should be skipped.

Separator
Default value: ,
Defines the character that is used to separate columns in the CSV file. Values allowed are: ',', ';', 'tab' or 'space'.

InvalidValue
Default value:
Specifies a value that is known to be invalid in the signal. If an input value is equal to this value, the InvalidValueText is written instead. If InvalidValueText is empty, InvalidValue is not evaluated.

InvalidValueText
Default value:
Specifies the text that is written when an invalid value is detected at the input. See also the documentation of InvalidValue.
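
The interplay between InvalidValue and InvalidValueText can be sketched in a few lines of Python. This is only an illustration of the rule described above, not Polybench code: the function name format_value is hypothetical, and the "g" number formatting is a stand-in for the real NumberFormat setting.

```python
def format_value(value, invalid_value=None, invalid_text=""):
    """Return the text written to the CSV file for one sample value.

    Hypothetical helper: the ":g" format is a stand-in for NumberFormat.
    """
    # With an empty InvalidValueText, InvalidValue is not evaluated at all.
    if invalid_text and invalid_value is not None and value == invalid_value:
        return invalid_text
    return f"{value:g}"


print(format_value(-9999.0, invalid_value=-9999.0, invalid_text="NaN"))
print(format_value(-9999.0, invalid_value=-9999.0, invalid_text=""))
```

In the second call the marker value is written as an ordinary number, because no replacement text has been configured.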

Description

Writing signals to a CSV file

If Polybench writes values to a CSV file, the resulting file starts with one row that specifies the columns in the file. The row looks like this (example):

"Time","Events","Channel 1","Channel 2","Channel 3","Channel 4"

The first column is the time stamp of the sample. It looks like this:

0:00:41.946
0:00:41.948
0:00:41.950
0:00:41.952
0:00:41.954

The second column "Events" stores the event markers that are read from the signal. If multiple events exist for one sample, the events are appended to one string and separated by + symbols, like this:

"Marker 1+Marker 2+Marker 3"

The Events column is optional. You may leave events out by specifying No for IncludeEvents in the File Settings (see properties of the Storage operator).

The third and following columns contain the sampled values per channel.

Here is another example of a CSV file that stores a signal from a measurement:

"Time","Events","Signal [unit]"
0:00:12.018,,0.113
0:00:12.020,,0.125
0:00:12.022,,0.138
0:00:12.024,,0.150
0:00:12.026,,0.163
0:00:12.028,,0.175
0:00:12.030,,0.187
0:00:12.032,,0.200
0:00:12.034,,0.212
0:00:12.036,,0.224
0:00:12.038,,0.236

If a channel has a unit, then the unit is written in square brackets after the channel name.

The CSV file is written using the UTF-8 character encoding. The file does not have a BOM (byte order mark).
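
For readers who want to produce files in the same layout from their own tools, the following Python sketch builds the rows described above. It is an approximation under stated assumptions, not the Polybench writer: the function names are hypothetical, values are written with three decimals as in the example, and a line feed is assumed as the line ending.

```python
def format_time(seconds, digits=3):
    """Format seconds like the h:m:s.3 examples above, e.g. 0:00:12.018."""
    scale = 10 ** digits
    ticks = round(seconds * scale)  # integer ticks avoid float drift
    whole, frac = divmod(ticks, scale)
    minutes, secs = divmod(whole, 60)
    hours, minutes = divmod(minutes, 60)
    text = f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{text}.{frac:0{digits}d}" if digits > 0 else text


def build_lines(channels, sample_rate, start_time, samples,
                events=None, include_events=True):
    """channels: list of (name, unit); samples: one list of values per row;
    events: optional dict mapping a row index to a list of marker names."""
    events = events or {}
    names = [f"{name} [{unit}]" if unit else name for name, unit in channels]
    columns = ["Time"] + (["Events"] if include_events else []) + names
    lines = [",".join(f'"{column}"' for column in columns)]
    for i, row in enumerate(samples):
        fields = [format_time(start_time + i / sample_rate)]
        if include_events:
            markers = events.get(i, [])
            # Multiple markers on one sample are joined with '+'.
            fields.append('"' + "+".join(markers) + '"' if markers else "")
        fields += [f"{value:.3f}" for value in row]  # assumed number format
        lines.append(",".join(fields))
    return lines


def write_csv(path, lines):
    # Python's "utf-8" codec writes no byte order mark.
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write("\n".join(lines) + "\n")


for line in build_lines([("Signal", "unit")], 500, 12.018,
                        [[0.113], [0.125]], events={1: ["Marker 1", "Marker 2"]}):
    print(line)
```

Passing include_events=False leaves the Events column out, which corresponds to setting IncludeEvents to No.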

Reading signals from a CSV file

Polybench is able to read several CSV file formats. For Polybench to be able to interpret a CSV file, the following rules must hold:

The first line may be a header line, but does not have to be. The header line contains the name of each data column.
If a header line exists, then the first column must be named 'Time' (case insensitive, so it may also be 'time' or 'TIME'). If the name of the second column is 'Events', then that column is interpreted as an event marker column as described above. Otherwise, no event markers are assumed.

If no header is available, then the first column is interpreted as the Time column. The other columns are called C1, C2, C3, etc.

If a channel name is followed by a pair of square brackets, then the content between the brackets is used as the channel unit. For example:

Time,EMG [uV]
0.018,10.94
0.020,11.0322
0.022,10.912

is shown in viewers as a file with one channel called EMG, with unit uV, at 500 Hz (the time step between lines is 0.002 s).

(Note: In Polybench 1.30.0 and earlier, the units of channels were not stored.)
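
The header rules above can be illustrated with a short Python sketch. It is a guess at equivalent logic, not the Polybench implementation: the helper names are hypothetical, and deriving the sample rate from the first and last time stamps is an assumption made for this example.

```python
import re

# Channel name, optional white space, then a unit in square brackets.
_NAME_UNIT = re.compile(r"^(.*?)\s*\[([^\]]*)\]$")


def split_name_unit(column):
    """Split a header cell such as 'EMG [uV]' into (name, unit)."""
    text = column.strip().strip('"').strip()
    match = _NAME_UNIT.match(text)
    if match:
        return match.group(1), match.group(2)
    return text, ""


def estimate_sample_rate(times):
    """Estimate the rate in Hz, assuming evenly spaced samples (in seconds)."""
    if len(times) < 2 or times[-1] <= times[0]:
        raise ValueError("need at least two increasing time stamps")
    return (len(times) - 1) / (times[-1] - times[0])


print(split_name_unit("EMG [uV]"))
print(round(estimate_sample_rate([0.018, 0.020, 0.022])))
```

Applied to the example file above, this yields the channel EMG with unit uV and a rate that rounds to 500 Hz.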

Separating character

The name 'comma separated values' suggests that the columns in the data file are separated by a comma ',' character. However, they may also be separated by other characters:
- commas with or without white space, for example:

0:10:25.14  ,  11.23  ,  -3.25

-or-

0:10:25.14,11.23, -3.25

- semicolons (with or without white space), for example:

0:10:25.14; 11.23; -3.25

- tabs, for example:

0:10:25.14    11.23    -3.25

- spaces, for example:

0:10:25.14 11.23 -3.25

but not multiple spaces. The following is wrong:

0:10:25.14  11.23  -3.25

Text and values in quotation marks

Text and values in the columns may be enclosed in quotation marks, for example: '0:10:25.14,"11.23","-3.25"'
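
The separator and quotation rules can be summarized in a small Python sketch. How Polybench actually chooses the separator is not documented here, so the order of the checks (comma, then semicolon, then tab, then single space) is an assumption, and split_row is a hypothetical name. The sketch does not handle a separator character that appears inside a quoted text.

```python
def split_row(line):
    """Split one CSV row into fields, following the rules described above."""
    line = line.strip()
    for separator in (",", ";", "\t"):
        if separator in line:
            # White space around these separators is ignored.
            return [field.strip().strip('"') for field in line.split(separator)]
    # Space-separated: exactly one space between columns.
    fields = line.split(" ")
    if "" in fields:
        raise ValueError("multiple spaces between columns are not supported")
    return [field.strip('"') for field in fields]


print(split_row('0:10:25.14  ,  11.23  ,  -3.25'))
print(split_row('0:10:25.14,"11.23","-3.25"'))
```

Both calls return the same three fields; a row separated by two spaces raises a ValueError instead.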

Time column interpretation

The times in the first column must be formatted according to any of the following formats, where h=hour, m=minute, s=second and f=fraction of a second:

- h:m:s.f -or- h:m:s
- m:s.f -or- m:s
- s.f -or- s

The time must not contain a comma before the fraction (as is customary in some European countries): '10:25,14' is wrong, '10:25.14' is correct.
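
As an illustration of the accepted time formats, the following Python sketch converts a time cell to seconds. The function name parse_time is hypothetical, and the sketch only mirrors the rules listed above; it is not the parser used by Polybench.

```python
def parse_time(text):
    """Convert 'h:m:s.f', 'm:s.f' or 's.f' (fraction optional) to seconds."""
    text = text.strip()
    if "," in text:
        raise ValueError("a decimal comma is not allowed in the time column")
    parts = text.split(":")
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unsupported time format: {text!r}")
    seconds = 0.0
    for part in parts:
        # Each further part is 60 times smaller than the one before it.
        seconds = seconds * 60 + float(part)
    return seconds


print(parse_time("0:10:25.14"))
print(parse_time("10:25"))
```

Both '0:10:25.14' and '10:25.14' describe the same moment, 10 minutes and 25.14 seconds after the start.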

Evenly distributed time

Every line in the file is interpreted as the next sample. The time of each line is assumed to be the time of the previous line plus the sample interval. The file is not interpreted correctly if samples are missing.

If the file has been recorded with a sample frequency greater than 1000 Hz, or with a sample interval that is not a whole number of milliseconds, then the time format for a newly recorded file should be set to have enough digits to describe the time. Otherwise the written times are rounded, so two consecutive lines may show the same time, or the time difference between two lines may differ from the difference between two other lines. This is allowed.
So, for example, if you are storing a signal of 2000 Hz, then the time format should be set to at least h:m:s.4 (four digits are needed to represent the 0.0005 s sample interval).
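
The number of fraction digits needed for a given sample frequency can be worked out with a short calculation, sketched here in Python. The helper min_fraction_digits is hypothetical and assumes a whole-number sample frequency in Hz; it looks for the smallest number of digits in which the sample interval 1/frequency is an exact decimal.

```python
def min_fraction_digits(rate_hz, max_digits=9):
    """Smallest number of fraction digits that represents 1/rate_hz exactly.

    Returns None if no exact representation exists within max_digits,
    for example for 300 Hz, where the interval is 0.00333... s.
    """
    for digits in range(max_digits + 1):
        # The interval is exact when 10**digits / rate_hz is a whole number.
        if (10 ** digits) % rate_hz == 0:
            return digits
    return None


for rate in (500, 1000, 2000):
    print(rate, "Hz needs", min_fraction_digits(rate), "fraction digits")
```

For 2000 Hz the result is 4, which matches the h:m:s.4 format mentioned above.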

Technical file properties

For advanced users - the following file encodings can be interpreted:
- UTF-7
- UTF-8 (with or without BOM)
- UTF-16 little endian and big endian
- UTF-32 little endian and big endian

ANSI encoding is interpreted as UTF-8; if characters outside the ASCII range (codes above 127) are used, they may be displayed as small squares. This may affect channel names, units and event marker codes.

Please note that Polybench 1.30.0 and earlier interpret only UTF-8 without BOM!
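
When preparing files in other tools, it can help to check which encoding a file will be recognized as. The Python sketch below guesses the encoding from the byte order mark and falls back to UTF-8. It is an assumption about comparable logic, not a description of how Polybench detects encodings; the function names are hypothetical, and UTF-7 is left out of the sketch.

```python
import codecs

# UTF-32 must be tested before UTF-16: the UTF-32 little endian BOM
# starts with the same two bytes as the UTF-16 little endian BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def detect_encoding(raw):
    """Return a Python codec name based on the BOM, defaulting to UTF-8."""
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    return "utf-8"


def decode_text(raw):
    # Bytes that are not valid in the detected encoding become U+FFFD.
    return raw.decode(detect_encoding(raw), errors="replace")


print(decode_text("Time,EMG [uV]".encode("utf-16")))
```

A file saved in an ANSI code page has no BOM, so it is decoded as UTF-8 and its non-ASCII bytes turn into replacement characters, comparable to the small squares mentioned above.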

Limitations

CSV is a very simple and old table file format that is not bound to a standard. Because of its simplicity it is well suited for exporting data to other software programs, but it also has severe limitations. The limitations in Polybench are described below.

No event marker updates

If a CSV file is being reviewed, then it is not possible to change, add or remove event markers in the file, as is possible in many other file formats. If markers are changed in review mode, then the changes are not stored in the file. Also, marker viewers etc. will not show any markers, because the markers are not separated from the signal data.

Not efficient

CSV does not contain any meta information about the contents of the file. Therefore, Polybench has to scan the file before each use in order to know its contents (sample rate, channels, etc.). This takes time, so reading a CSV file may be slow. Also, CSV files use human-readable characters to write out the sample values, which is very inefficient and may lead to very large files.

General

Please be aware of the limitations of CSV! Use the CSV file format to export signal data to other software applications, but preferably do not use it for production measurements that have to be post-processed in Polybench afterwards.