bioinformatics Common File Formats GCT


GCT is a tab-delimited text format for processed gene expression or RNAi data, typically derived from microarray analysis. Each data line holds a single annotated gene or probe, and each column after the two annotation columns holds a single sample. For example:

#1.2
22215    2
Name    Description    Tumor_One    Normal_One
1007_s_at    DDR1    -0.214548    -0.18069
1053_at    RFC2    0.868853    -1.330921
117_at    HSPA6    1.124814    0.933021
121_at    PAX8    -0.825381    0.102078
1255_g_at    GUCA1A    -0.734896    -0.184104
1294_at    UBE1L    -0.366741    -1.209838

In this example, the first line specifies the version of the GCT file specification, which in this case is 1.2. The second line specifies the number of rows of data (22215) and the number of samples (2); only the first six data rows are shown here. The header row names two annotation columns (Name for the chip probe set identifiers and Description for the gene symbols the probe set covers), followed by the names of the samples being assayed (Tumor_One and Normal_One). Each row after the header lists a single probe set identifier (in this case, an Affymetrix gene chip probe set), its corresponding gene symbol (if one exists), and the normalized value for each sample. Sample values vary with assay type and normalization method, but are typically signed floating-point numbers.
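
The layout described above can be read with a few lines of Python. The following is a minimal sketch, not a reference implementation: the function name parse_gct and the returned structure are illustrative choices, and the sketch assumes a version 1.2 file with exactly two annotation columns, tab-separated fields, and purely numeric sample values (no missing-value markers).

```python
import io


def parse_gct(handle):
    """Parse a GCT 1.2 text stream.

    Returns (samples, rows), where samples is the list of sample names and
    rows is a list of (name, description, values) tuples.
    parse_gct is a hypothetical helper written for this example.
    """
    # Line 1: version marker, expected to be "#1.2" (assumption of this sketch).
    version = handle.readline().strip()
    if version != "#1.2":
        raise ValueError(f"unsupported GCT version line: {version!r}")

    # Line 2: number of data rows, then number of samples.
    counts = handle.readline().split()
    n_rows, n_samples = int(counts[0]), int(counts[1])

    # Line 3: header with two annotation columns followed by sample names.
    header = handle.readline().rstrip("\r\n").split("\t")
    samples = header[2:]
    if len(samples) != n_samples:
        raise ValueError("sample count does not match the header row")

    rows = []
    for line in handle:
        line = line.rstrip("\r\n")
        if not line:
            continue  # tolerate blank lines, e.g. at the end of the file
        fields = line.split("\t")
        if len(fields) != 2 + n_samples:
            raise ValueError(f"unexpected number of columns: {line!r}")
        values = [float(v) for v in fields[2:]]
        rows.append((fields[0], fields[1], values))

    if len(rows) != n_rows:
        raise ValueError("row count does not match the declared dimensions")
    return samples, rows


# A two-row file built from the first two data rows of the example above.
example = (
    "#1.2\n"
    "2\t2\n"
    "Name\tDescription\tTumor_One\tNormal_One\n"
    "1007_s_at\tDDR1\t-0.214548\t-0.18069\n"
    "1053_at\tRFC2\t0.868853\t-1.330921\n"
)

samples, rows = parse_gct(io.StringIO(example))
print(samples)   # ['Tumor_One', 'Normal_One']
print(rows[1])   # ('1053_at', 'RFC2', [0.868853, -1.330921])
```

Checking the declared dimensions against what was actually read catches truncated files early; a real file may need more lenient handling, for example of empty or non-numeric sample cells.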