Tutorial by Examples: dataset

Linear regression on the mtcars dataset

The built-in mtcars data frame contains information about 32 cars, including their weight, fuel efficiency (in miles-per-gallon), speed, etc. (To find out more about the dataset, use help(mtcars)). If we are interested in the relationship between fuel efficiency (mpg) and weight (wt) we may start p...

R Language • Linear Models (Regression)
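For a single predictor, the intercept and slope that lm(mpg ~ wt) reports come from ordinary least squares. The sketch below computes them by hand. It is written in Python purely for illustration (the entry itself is R). The function name simple_linear_regression and the sample numbers are hypothetical and are not the mtcars values.

```python
# Ordinary least squares for y = intercept + slope * x,
# the single-predictor model that R's lm(mpg ~ wt) fits.

def simple_linear_regression(xs, ys):
    """Return (intercept, slope) minimising the sum of squared residuals."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # slope = cov(x, y) / var(x); the common 1/n factors cancel out
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    # the fitted line always passes through the point of means
    intercept = y_mean - slope * x_mean
    return intercept, slope


if __name__ == "__main__":
    # Hypothetical (weight, mpg) pairs lying exactly on mpg = 42 - 6 * wt
    weights = [2.0, 2.5, 3.0, 3.5, 4.0]
    mpg = [30.0, 27.0, 24.0, 21.0, 18.0]
    print(simple_linear_regression(weights, mpg))  # (42.0, -6.0)
```

A negative slope corresponds to fuel efficiency dropping as weight increases. On real data the points scatter around the line, and lm() additionally reports standard errors, which this sketch does not compute.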

Sample datasets

For ease of testing, sklearn provides some built-in datasets in the sklearn.datasets module. For example, let's load Fisher's iris dataset:
import sklearn.datasets
iris_dataset = sklearn.datasets.load_iris()
iris_dataset.keys()
['target_names', 'data', 'target', 'DESCR', 'feature_names']
You can ...

scikit-learn • Getting started with scikit-learn

Create spatial points from XY data set

When it comes to geographic data, R is a powerful tool for data handling, analysis and visualisation. Spatial data is often available as an XY coordinate data set in tabular form. This example shows how to create a spatial data set from an XY data set. The packages rgdal and sp provi...

R Language • spatial analysis

Logistic regression on Titanic dataset

Logistic regression is a particular case of the generalized linear model, used to model dichotomous outcomes (probit and complementary log-log models are closely related). The name comes from the link function used, the logit or log-odds function. The inverse function of the logit is called the lo...

R Language • Generalized linear models
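The logit link and its inverse can be written out directly. In a logistic GLM the linear predictor b0 + b1*x is on the log-odds scale, and the logistic function maps it back to a probability. The Python sketch below is illustrative only (in R the same two functions are available as qlogis() and plogis()). The function names logit and logistic are our own.

```python
import math

def logit(p):
    """Log-odds log(p / (1 - p)) of a probability p in the open interval (0, 1)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))

def logistic(x):
    """Inverse of logit: 1 / (1 + exp(-x)), mapping any real x into (0, 1)."""
    # Branch on the sign so math.exp never receives a large positive argument
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


if __name__ == "__main__":
    # A log-odds of 0 corresponds to a probability of one half
    print(logistic(0.0))
    # Round trip: logistic undoes logit
    print(logistic(logit(0.8)))
```

A fitted model's predicted probability for an observation is logistic(b0 + b1*x). Each unit increase in x multiplies the odds by exp(b1).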

Select dataset except where values are in this other dataset

--both queries must return the same number of columns, with compatible data types
SELECT 'Data1' as 'Column' UNION ALL
SELECT 'Data2' as 'Column' UNION ALL
SELECT 'Data3' as 'Column' UNION ALL
SELECT 'Data4' as 'Column' UNION ALL
SELECT 'Data5' as 'Column'
EXCEPT
SELECT 'Data3' as 'Column'
--Returns Data1, Data2, Data4, and Data5

SQL • EXCEPT
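The same EXCEPT query can be tried without a database server by using SQLite through Python's built-in sqlite3 module. This is a sketch under a few assumptions. The alias is written as the plain identifier col instead of the quoted 'Column'. An ORDER BY is added because EXCEPT does not guarantee any row order. The helper name run_except_query is hypothetical.

```python
import sqlite3

# Compound operators are applied left to right: the four UNION ALLs build
# the left-hand set, then EXCEPT removes every row found on the right.
QUERY = """
SELECT 'Data1' AS col
UNION ALL SELECT 'Data2'
UNION ALL SELECT 'Data3'
UNION ALL SELECT 'Data4'
UNION ALL SELECT 'Data5'
EXCEPT
SELECT 'Data3'
ORDER BY 1
"""

def run_except_query():
    """Run QUERY in a throwaway in-memory database and return the values."""
    conn = sqlite3.connect(":memory:")
    try:
        return [row[0] for row in conn.execute(QUERY)]
    finally:
        conn.close()


if __name__ == "__main__":
    print(run_except_query())
```

Note that plain EXCEPT also removes duplicate rows from its result, so it behaves like a set difference rather than a row-by-row filter.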

Training a network on the Iris dataset

Below is a simple example that trains a Caffe model on the Iris data set in Python, using PyCaffe. It also computes the predicted outputs for some user-defined inputs.
iris_tuto.py
import subprocess
import platform
import copy
from sklearn.datasets import load_iris
import sklearn.metrics ...

caffe • Training a Caffe model with pycaffe

Prepare image dataset for image classification task

Caffe has a built-in input layer tailored for image classification tasks (i.e., a single integer label per input image). This input "Data" layer is built upon an lmdb or leveldb data structure. To use the "Data" layer, one has to construct the data structure with all training d...

caffe • Prepare Data for Training

Create RDDs (Resilient Distributed Datasets)

From a data frame:
mtrdd <- createDataFrame(sqlContext, mtcars)
From CSV: for CSV files, you need to add the csv package to the environment before initiating the Spark context:
Sys.setenv('SPARKR_SUBMIT_ARGS'='"--packages" "com.databricks:spark-csv_2.10:1.4.0" "sparkr-shel...

R Language • Spark API (SparkR)

Create an empty dataset based on an existing dataset

Method 1:
proc sql;
  create table foo like sashelp.class;
quit;
Method 2:
proc sql;
  create table bar as select * from sashelp.class (obs=0);
quit;
Method 1 should be the preferred option.

sas • Proc SQL
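The idea behind Method 2 is to copy only the structure by selecting zero rows. This carries over to other SQL engines. SQLite, for example, has no CREATE TABLE ... LIKE, so the Python sketch below uses CREATE TABLE ... AS SELECT with a condition that is never true. The table name class and its columns are a hypothetical stand-in for sashelp.class. The helper names are also our own. In SQLite a table created this way keeps the column names and type affinities but no constraints such as PRIMARY KEY.

```python
import sqlite3

def create_empty_copy(conn, source, target):
    """Create `target` with the same columns as `source` but no rows.

    Table names are interpolated directly, so only pass trusted names.
    """
    # WHERE 0 is never true, so only the column layout is copied
    conn.execute(f'CREATE TABLE "{target}" AS SELECT * FROM "{source}" WHERE 0')

def column_names(conn, table):
    """Column names of `table`, in order (row[1] of PRAGMA table_info)."""
    return [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Hypothetical stand-in for sashelp.class
    conn.execute("CREATE TABLE class (name TEXT, sex TEXT, age INTEGER, "
                 "height REAL, weight REAL)")
    conn.execute("INSERT INTO class VALUES ('Example', 'F', 13, 56.5, 84.0)")
    create_empty_copy(conn, "class", "bar")
    print(column_names(conn, "bar"))
    print(conn.execute("SELECT COUNT(*) FROM bar").fetchone()[0])
    conn.close()
```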

View package's built-in data sets

To see the built-in data sets of the package dplyr, run:
data(package = "dplyr")
There is no need to load the package first.

R Language • Inspecting packages

A Full Working Example of 2-layer Neural Network with Batch Normalization (MNIST Dataset)

Import libraries (language dependency: python 2.7)
import tensorflow as tf
import numpy as np
from sklearn.datasets import fetch_mldata
from sklearn.model_selection import train_test_split
load data, prepare data
mnist = fetch_mldata('MNIST original', data_home='./')
print "MNIST data,...

tensorflow • Using Batch Normalization
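At the core of batch normalization is a per-feature transformation over each mini-batch: subtract the batch mean, divide by the batch standard deviation (with a small epsilon for stability), then apply a learnable scale gamma and shift beta. The plain-Python sketch below shows only that arithmetic for one feature. It is not TensorFlow's implementation. Here gamma and beta are fixed numbers rather than trained variables, and the default eps value is an assumption.

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalise one feature across a mini-batch, then scale and shift.

    batch: list of floats, one value per example in the mini-batch.
    gamma, beta: scale and shift (learned during training in a real network).
    eps: small constant that keeps the division numerically stable.
    """
    n = len(batch)
    if n == 0:
        raise ValueError("batch must not be empty")
    mean = sum(batch) / n
    # The biased (divide-by-n) variance of the batch is used for normalising
    var = sum((x - mean) ** 2 for x in batch) / n
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]


if __name__ == "__main__":
    print(batch_norm([1.0, 2.0, 3.0, 4.0]))
```

With gamma = 1 and beta = 0 the output has mean 0 and variance just under 1. At inference time, frameworks typically replace the batch statistics with running averages collected during training.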

Creating large random datasets

In [1]: import pandas as pd
        import numpy as np
In [2]: df = pd.DataFrame(np.random.choice(['foo','bar','baz'], size=(100000,3)))
        df = df.apply(lambda col: col.astype('category'))
In [3]: df.head()
Out[3]:
     0    1    2
0  bar  foo  baz
1  baz  bar  baz
2  foo  foo  b...

pandas • Categorical data
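Converting a column with astype('category') pays off on large data because the column stores each distinct label once, plus one small integer code per row. The stdlib sketch below illustrates that encoding conceptually. It is not pandas' implementation, and the names encode_categorical and decode_categorical are hypothetical. Like pandas when it infers categories, it sorts them. Unlike pandas, it does not handle missing values, which pandas encodes as the code -1.

```python
def encode_categorical(values):
    """Split values into (categories, codes): distinct labels stored once,
    plus an integer per row pointing into the categories list."""
    categories = sorted(set(values))
    index = {label: code for code, label in enumerate(categories)}
    codes = [index[v] for v in values]
    return categories, codes

def decode_categorical(categories, codes):
    """Rebuild the original labels from categories and codes."""
    return [categories[c] for c in codes]


if __name__ == "__main__":
    cats, codes = encode_categorical(["bar", "foo", "baz", "bar"])
    print(cats)   # the three distinct labels, sorted
    print(codes)  # one small integer per row
```

With 100000 rows and only three labels, the repeated strings are replaced by integers, which pandas stores in a compact integer dtype.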

Getting started: Loading a dataset from file

The Iris flower data set is a widely used data set for demonstration purposes. We will load it, inspect it and slightly modify it for later use.
import java.io.File;
import java.net.URL;
import weka.core.Instances;
import weka.core.converters.ArffSaver;
import weka.core.converters.CSVLoader;
i...

machine-learning • An introduction to Classification: Generating several models using Weka

Fit: basic linear interpolation of a dataset

The basic use of fit is best explained by a simple example:
f(x) = a + b*x + c*x**2
fit [-234:320][0:200] f(x) 'measured.dat' using 1:2 skip 4 via a,b,c
plot 'measured.dat' u 1:2, f(x)
Ranges may be specified to filter the data used in fitting. Out-of-range data points are ignored. (T. W...

Gnuplot • Fit data with gnuplot
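gnuplot's fit uses an iterative Marquardt-Levenberg least-squares search. Because f(x) = a + b*x + c*x**2 is linear in a, b and c, the minimum it converges to is the ordinary least-squares solution, which can also be computed directly. The Python sketch below does that by solving the 3x3 normal equations. It is illustrative only, and fit_quadratic is a hypothetical name. We assume, as we read the example, that [-234:320] bounds the x values and [0:200] bounds the measured y values. Points outside either range are dropped before fitting.

```python
def fit_quadratic(points, x_range=None, y_range=None):
    """Least-squares fit of f(x) = a + b*x + c*x**2; returns (a, b, c).

    Points whose x or y lies outside the optional (min, max) ranges are
    ignored, mimicking the [xmin:xmax][ymin:ymax] filter of fit.
    """
    def inside(v, rng):
        return rng is None or rng[0] <= v <= rng[1]

    used = [(x, y) for x, y in points if inside(x, x_range) and inside(y, y_range)]
    if len(used) < 3:
        raise ValueError("need at least 3 in-range points")

    # Build the augmented normal equations (A^T A | A^T y), design row [1, x, x^2]
    m = [[0.0] * 4 for _ in range(3)]
    for x, y in used:
        row = (1.0, x, x * x)
        for i in range(3):
            for j in range(3):
                m[i][j] += row[i] * row[j]
            m[i][3] += row[i] * y

    # Gaussian elimination with partial pivoting
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("x values do not determine a unique quadratic")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for k in range(col, 4):
                m[r][k] -= factor * m[col][k]

    # Back substitution
    p = [0.0, 0.0, 0.0]
    for i in range(2, -1, -1):
        p[i] = (m[i][3] - sum(m[i][k] * p[k] for k in range(i + 1, 3))) / m[i][i]
    return tuple(p)


if __name__ == "__main__":
    # Points on 1 + 2x + 0.5x^2, plus two hypothetical out-of-range outliers
    pts = [(x, 1 + 2 * x + 0.5 * x * x) for x in range(-5, 6)]
    pts += [(1000, 5.0), (10, 500.0)]
    print(fit_quadratic(pts, x_range=(-234, 320), y_range=(0, 200)))
```

With the ranges applied, the two outliers are ignored and the coefficients are recovered. Without them, the point at x = 1000 drags the fit far away from the true curve.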

Find matches in big data sets

For big data sets, the call grepl("fox", test_sentences) does not perform well. Examples of big data sets are crawled websites or millions of tweets. The first speed-up is to use the perl = TRUE option. Even faster is the option fixed = TRUE. A complete example would be: ...

R Language • Pattern Matching and Replacement
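The reason fixed = TRUE is fastest is that a literal substring search skips the regular-expression engine entirely. The Python sketch below shows the same choice. Its function is named grepl only to mirror R. It is a hypothetical helper, not an R binding, and Python's re engine differs from R's, so relative timings will not carry over exactly. The sketch also shows the behavioural difference: with fixed matching, characters such as "." are taken literally.

```python
import re

def grepl(pattern, texts, fixed=False):
    """Return one bool per string: does `pattern` occur in it?

    fixed=True performs a plain substring search (like R's fixed = TRUE);
    otherwise the pattern is compiled once as a regular expression.
    """
    if fixed:
        return [pattern in t for t in texts]
    compiled = re.compile(pattern)  # compile once, reuse for every string
    return [compiled.search(t) is not None for t in texts]


if __name__ == "__main__":
    sentences = ["The quick brown fox", "jumps over the lazy dog"]
    print(grepl("fox", sentences, fixed=True))
    # "." is a wildcard as a regex but a literal dot with fixed=True
    print(grepl("a.c", ["abc", "a.c"]), grepl("a.c", ["abc", "a.c"], fixed=True))
```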

Getting data with a data step

data newclass(keep=first_name sex weight yearborn);
  set sashelp.class(drop=height rename=(name=first_name));
  yearborn=year(date())-age;
  if yearborn > 2002;
run;
Data specifies the target data set. The keep= option specifies the columns to write to the target. Set specifies the source data set. Drop s...

sas • data step
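To make the order of operations in the data step concrete, here is a Python sketch that applies the same steps to a list of dicts. Input options (drop=, rename=) act as rows are read. The assignment and the subsetting IF run next. The keep= option on the output data set applies only when a row is written, which is why age can still be used to compute yearborn. The function name newclass mirrors the SAS data set, and the sample rows are hypothetical. A fixed current_year replaces year(date()) so results are reproducible.

```python
def newclass(rows, current_year):
    """Mimic the data step above on dicts with keys name, sex, age, height, weight."""
    out = []
    for row in rows:
        # set sashelp.class(drop=height rename=(name=first_name));
        rec = {k: v for k, v in row.items() if k != "height"}
        rec["first_name"] = rec.pop("name")
        # yearborn=year(date())-age;
        rec["yearborn"] = current_year - rec["age"]
        # if yearborn > 2002;  (subsetting IF: non-matching rows are not written)
        if rec["yearborn"] > 2002:
            # data newclass(keep=first_name sex weight yearborn);
            out.append({k: rec[k] for k in ("first_name", "sex", "weight", "yearborn")})
    return out


if __name__ == "__main__":
    sample = [  # hypothetical stand-in for sashelp.class
        {"name": "A", "sex": "F", "age": 13, "height": 56.5, "weight": 84.0},
        {"name": "B", "sex": "M", "age": 30, "height": 70.0, "weight": 150.0},
    ]
    print(newclass(sample, 2020))
```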

Built-in datasets

R has a vast collection of built-in datasets. They are usually used for teaching purposes, to create quick and easily reproducible examples. There is a nice web page listing the built-in datasets: https://vincentarelbundock.github.io/Rdatasets/datasets.html
Example
Swiss Fertility and Socioecono...

R Language • Data acquisition

Datasets within packages

Some packages include data or are created specifically to disseminate datasets. When such a package is loaded (library(pkg)), the attached datasets either become available directly as R objects or need to be called with the data() function.
Gapminder
A nice dataset on the development of c...

R Language • Data acquisition