Weka: Loading Instances from ARFF Files



ARFF (Attribute-Relation File Format) files are the most common format for data used in Weka. Each ARFF file must start with a header that declares the relation name and the attributes that describe every data instance. The following attribute types can be used:

  • Numeric

Real or integer numbers, declared with the NUMERIC keyword (REAL and INTEGER are also accepted and are treated as numeric).

  • Nominal

Nominal attributes must list their possible values inside curly braces. For example:

@ATTRIBUTE class        {Iris-setosa,Iris-versicolor,Iris-virginica}

  • String

Allows arbitrary string values, declared with the STRING keyword. These are usually processed later using the StringToWordVector filter.

  • Date

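Allows dates to be specified, declared with the DATE keyword. An optional date format, written with the pattern syntax of Java's SimpleDateFormat, can follow the keyword; if none is given, the ISO-8601 format yyyy-MM-dd'T'HH:mm:ss is used.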
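To make the DATE rules concrete, here is a small Java sketch (not Weka's own parsing code) that reads date values with SimpleDateFormat, once with the default ISO-8601 pattern and once with a custom pattern. The class and method names are hypothetical, and fixing the time zone to UTC is an assumption made only so the result is reproducible.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

/**
 * Hypothetical sketch (not Weka's own parser): reads ARFF DATE values
 * using java.text.SimpleDateFormat patterns.
 */
public class ArffDateExample {
    // ISO-8601 pattern used when a DATE attribute declares no format of its own.
    static final String DEFAULT_PATTERN = "yyyy-MM-dd'T'HH:mm:ss";

    /** Parses a value with the declared pattern, or DEFAULT_PATTERN if none was given. */
    static Date parse(String value, String declaredPattern) {
        String pattern = (declaredPattern == null) ? DEFAULT_PATTERN : declaredPattern;
        SimpleDateFormat fmt = new SimpleDateFormat(pattern, Locale.ROOT);
        fmt.setLenient(false);                        // reject values that do not match the pattern
        fmt.setTimeZone(TimeZone.getTimeZone("UTC")); // assumption: UTC, so results are reproducible
        try {
            return fmt.parse(value);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Bad date '" + value + "' for pattern " + pattern, e);
        }
    }

    public static void main(String[] args) {
        // Default format, as for:  @ATTRIBUTE timestamp DATE
        Date a = parse("2001-04-03T12:12:12", null);
        // Custom format, as for:   @ATTRIBUTE timestamp DATE "yyyy-MM-dd HH:mm:ss"
        Date b = parse("2001-04-03 12:12:12", "yyyy-MM-dd HH:mm:ss");
        System.out.println(a.equals(b)); // true: the same instant written two ways
    }
}
```

Both values describe the same instant; only the declared pattern differs, which is why a custom format must match the way dates are written in the @DATA section.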

An example header for the Iris dataset looks like this:

@RELATION iris

@ATTRIBUTE sepallength  NUMERIC
@ATTRIBUTE sepalwidth   NUMERIC
@ATTRIBUTE petallength  NUMERIC
@ATTRIBUTE petalwidth   NUMERIC
@ATTRIBUTE class        {Iris-setosa,Iris-versicolor,Iris-virginica}
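As a rough illustration of how each @ATTRIBUTE line in a header like this maps to one of the types listed earlier, here is a minimal Java sketch. ArffAttributeParser is a hypothetical helper, not part of the Weka API; it assumes unquoted attribute names without spaces and handles only the four types described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper (not the Weka API): classifies a single @ATTRIBUTE line. */
public class ArffAttributeParser {
    enum Type { NUMERIC, NOMINAL, STRING, DATE }

    static final class Attribute {
        final String name;
        final Type type;
        final List<String> nominalValues; // empty unless type == NOMINAL

        Attribute(String name, Type type, List<String> nominalValues) {
            this.name = name;
            this.type = type;
            this.nominalValues = nominalValues;
        }
    }

    static Attribute parse(String line) {
        String text = line.trim();
        String keyword = "@ATTRIBUTE";
        // ARFF keywords are case-insensitive, so compare ignoring case.
        if (!text.regionMatches(true, 0, keyword, 0, keyword.length())) {
            throw new IllegalArgumentException("Not an @ATTRIBUTE line: " + line);
        }
        // Simplification: assumes the attribute name is unquoted and has no spaces.
        String[] parts = text.substring(keyword.length()).trim().split("\\s+", 2);
        if (parts.length < 2) {
            throw new IllegalArgumentException("Missing type: " + line);
        }
        String name = parts[0];
        String spec = parts[1].trim();

        List<String> values = new ArrayList<>();
        if (spec.startsWith("{") && spec.endsWith("}")) {
            // Nominal: comma-separated list of allowed values inside braces.
            for (String v : spec.substring(1, spec.length() - 1).split(",")) {
                values.add(v.trim());
            }
            return new Attribute(name, Type.NOMINAL, values);
        }
        // Only the first word matters; DATE may be followed by a format string.
        String typeWord = spec.split("\\s+", 2)[0].toUpperCase(Locale.ROOT);
        switch (typeWord) {
            case "NUMERIC":
            case "REAL":
            case "INTEGER":
                return new Attribute(name, Type.NUMERIC, values);
            case "STRING":
                return new Attribute(name, Type.STRING, values);
            case "DATE":
                return new Attribute(name, Type.DATE, values);
            default:
                throw new IllegalArgumentException("Unsupported type: " + spec);
        }
    }

    public static void main(String[] args) {
        Attribute a = parse("@ATTRIBUTE class        {Iris-setosa,Iris-versicolor,Iris-virginica}");
        System.out.println(a.name + " " + a.type + " " + a.nominalValues);
        // prints: class NOMINAL [Iris-setosa, Iris-versicolor, Iris-virginica]
    }
}
```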

After the header, the @DATA section lists one instance per line, giving one comma-separated value for each attribute in the order the attributes were declared. If an attribute's value for an instance is unknown, a ? can be used instead. The following shows an example of the instances in an ARFF file:

@DATA
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa
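The @DATA rules (one value per declared attribute, ? for an unknown value) can be sketched in Java as follows. ArffDataRowParser is hypothetical and simplified: it splits on commas, so it does not handle quoted values that themselves contain commas. The second sample row is the second Iris row above with its sepalwidth replaced by ? to show a missing value.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch (not the Weka API): splits @DATA rows into values. */
public class ArffDataRowParser {
    /**
     * Returns one value per declared attribute; "?" (missing) becomes null.
     * Simplification: quoted values containing commas are not handled.
     */
    static List<String> parseRow(String line, int numAttributes) {
        String[] tokens = line.split(",", -1); // -1 keeps trailing empty fields
        if (tokens.length != numAttributes) {
            throw new IllegalArgumentException(
                "Expected " + numAttributes + " values but found " + tokens.length + ": " + line);
        }
        List<String> values = new ArrayList<>();
        for (String token : tokens) {
            String v = token.trim();
            values.add(v.equals("?") ? null : v);
        }
        return values;
    }

    public static void main(String[] args) {
        String[] data = {
            "5.1,3.5,1.4,0.2,Iris-setosa",
            "4.9,?,1.4,0.2,Iris-setosa" // sepalwidth unknown
        };
        for (String row : data) {
            System.out.println(parseRow(row, 5)); // the Iris header declares 5 attributes
        }
        // prints:
        // [5.1, 3.5, 1.4, 0.2, Iris-setosa]
        // [4.9, null, 1.4, 0.2, Iris-setosa]
    }
}
```

Rejecting rows whose value count differs from the number of declared attributes mirrors the requirement that every instance supply exactly one value per attribute.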