ARFF (Attribute-Relation File Format) is the most common format for data used in Weka. Each ARFF file must begin with a header that names the relation and declares the name and type of every attribute of a data instance. The following attribute types can be used:
NUMERIC: Real or integer numbers.
Nominal: The declaration must list the set of possible values in braces. For example:
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}
STRING: Arbitrary string values. These are usually processed later with the StringToWordVector filter.
DATE: Date values. An optional format string, written as a pattern for Java's SimpleDateFormat, may follow the keyword; without one, the ISO-8601 combined date and time format yyyy-MM-dd'T'HH:mm:ss is assumed.
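When ARFF data is read outside of Weka, date values have to be parsed by hand. The sketch below assumes the default ISO-8601 pattern yyyy-MM-dd'T'HH:mm:ss and translates it into the equivalent Python strptime format; the function name parse_arff_date is hypothetical and not part of Weka.

```python
from datetime import datetime

# Assumed strptime equivalent of the SimpleDateFormat pattern
# yyyy-MM-dd'T'HH:mm:ss (the ARFF default when no format is declared).
ARFF_DEFAULT_DATE_FORMAT = "%Y-%m-%dT%H:%M:%S"


def parse_arff_date(value, fmt=ARFF_DEFAULT_DATE_FORMAT):
    """Parse one date value, tolerating surrounding whitespace and double quotes."""
    return datetime.strptime(value.strip().strip('"'), fmt)


stamp = parse_arff_date('"2001-04-03T12:12:12"')
```

An attribute declared with a custom pattern would need its own strptime format, since SimpleDateFormat letters do not map one to one onto Python's format codes.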
An example header looks like this:
@RELATION iris
@ATTRIBUTE sepallength NUMERIC
@ATTRIBUTE sepalwidth NUMERIC
@ATTRIBUTE petallength NUMERIC
@ATTRIBUTE petalwidth NUMERIC
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}
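To make the structure of the header concrete, here is a minimal Python sketch that reads the header above into a relation name and a list of attributes. It is not Weka's own parser: parse_header is a hypothetical helper, and it assumes unquoted names without spaces and treats lines starting with % as comments.

```python
HEADER = """\
@RELATION iris
@ATTRIBUTE sepallength NUMERIC
@ATTRIBUTE sepalwidth NUMERIC
@ATTRIBUTE petallength NUMERIC
@ATTRIBUTE petalwidth NUMERIC
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}
"""


def parse_header(text):
    """Return (relation, attributes); each attribute is a (name, type) pair.

    For a nominal attribute the type is the list of allowed values,
    otherwise it is the declared type string (e.g. "NUMERIC").
    """
    relation = None
    attributes = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        keyword = parts[0].upper()  # keywords are matched case-insensitively
        rest = parts[1].strip() if len(parts) > 1 else ""
        if keyword == "@RELATION":
            relation = rest
        elif keyword == "@ATTRIBUTE":
            name_and_spec = rest.split(None, 1)
            name = name_and_spec[0]
            spec = name_and_spec[1].strip() if len(name_and_spec) > 1 else ""
            if spec.startswith("{") and spec.endswith("}"):
                # Nominal attribute: keep the declared set of values.
                spec = [value.strip() for value in spec[1:-1].split(",")]
            attributes.append((name, spec))
        elif keyword == "@DATA":
            break  # the header ends where the data section starts
    return relation, attributes


relation, attributes = parse_header(HEADER)
```

Keeping the nominal values as a list makes it easy to check later that each instance uses only declared values.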
Following the header, each instance must be listed with the correct number of attribute values, in the order in which the attributes were declared; if an attribute's value for an instance is not known, a ? can be used instead. The following shows an example of the set of instances in an ARFF file:
@DATA
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa
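The two rules for the data section (one value per declared attribute, and ? for an unknown value) can be sketched in Python as follows. This is an illustration under simplifying assumptions, not Weka's reader: parse_data is a hypothetical helper, values are assumed to be unquoted and comma-separated, and the second sample row has had its third value replaced by ? purely to show the missing-value case.

```python
DATA = """\
@DATA
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,?,0.2,Iris-setosa
"""


def parse_data(text, n_attributes):
    """Return the instances after @DATA as lists of strings, with None for '?'."""
    rows = []
    in_data = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        if not in_data:
            # Ignore everything until the @DATA marker is reached.
            in_data = line.upper() == "@DATA"
            continue
        values = [value.strip() for value in line.split(",")]
        if len(values) != n_attributes:
            raise ValueError(
                f"line {number}: expected {n_attributes} values, got {len(values)}"
            )
        # A lone question mark marks an unknown value.
        rows.append([None if value == "?" else value for value in values])
    return rows


rows = parse_data(DATA, 5)
```

Values are left as strings here; converting them to numbers or dates would depend on the attribute types declared in the header.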