Module Owl_stats

module Owl_stats: sig .. end
Statistics: random number generators, PDF and CDF functions, and hypothesis tests.


The module includes basic statistical functions such as mean, variance, and skew. It also includes the following three submodules.

The Owl_stats.Rnd module provides random number generators of various distributions.

The Owl_stats.Pdf module provides a range of probability density/mass functions of different distributions.

The Owl_stats.Cdf module provides cumulative distribution functions.

Please refer to the GSL documentation for details.

Randomisation functions

val seed : int -> unit
seed x sets x as the seed of the internal random number generator.
val shuffle : 'a array -> 'a array
shuffle x returns a new array containing the elements of x in shuffled order.
val choose : 'a array -> int -> 'a array
choose x n draws n samples from x without replacement.
val sample : 'a array -> int -> 'a array
sample x n draws n samples from x with replacement.
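The difference between drawing with and without replacement can be sketched in plain OCaml. This is an illustration only, not Owl's implementation: the names shuffle_sketch, choose_sketch and sample_sketch are hypothetical, and the standard library's Random module stands in for the internal generator.

```ocaml
(* Illustrative only: these helpers are hypothetical and use the standard
   library's Random module, not Owl's internal generator. *)

(* Fisher-Yates shuffle on a copy, so the input array is left untouched. *)
let shuffle_sketch x =
  let y = Array.copy x in
  for i = Array.length y - 1 downto 1 do
    let j = Random.int (i + 1) in
    let t = y.(i) in
    y.(i) <- y.(j);
    y.(j) <- t
  done;
  y

(* Without replacement: take the first n elements of a shuffled copy,
   so no position of x is used twice. Requires n <= Array.length x. *)
let choose_sketch x n = Array.sub (shuffle_sketch x) 0 n

(* With replacement: every draw picks an index independently,
   so the same element may appear several times. *)
let sample_sketch x n =
  Array.init n (fun _ -> x.(Random.int (Array.length x)))

let () =
  Random.init 42;
  let x = [| 1; 2; 3; 4; 5 |] in
  let c = choose_sketch x 3 in
  let s = sample_sketch x 8 in
  Printf.printf "choose: %d values, sample: %d values\n"
    (Array.length c) (Array.length s)
```

Note that sampling with replacement may ask for more values than the input holds, whereas this sketch of choose cannot.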

Basic statistical functions

val mean : ?w:float array -> float array -> float
val variance : ?w:float array -> ?mean:float -> float array -> float
val std : ?w:float array -> ?mean:float -> float array -> float
std x calculates the standard deviation of x.
val sem : ?w:float array -> ?mean:float -> float array -> float
sem x calculates the standard error of x, also referred to as standard error of the mean.
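The relation between std and sem can be sketched as follows. The sketch assumes unweighted data and the n - 1 (sample) variance estimator; the text above does not state which estimator is used, and the helper names are hypothetical.

```ocaml
(* Sketch under assumptions: unweighted data and the n - 1 variance
   estimator. Helper names are hypothetical, not part of Owl_stats. *)
let mean_sketch x =
  Array.fold_left ( +. ) 0. x /. float_of_int (Array.length x)

let std_sketch x =
  let m = mean_sketch x in
  let ss = Array.fold_left (fun acc v -> acc +. ((v -. m) ** 2.)) 0. x in
  sqrt (ss /. float_of_int (Array.length x - 1))

(* Standard error of the mean: standard deviation divided by sqrt n. *)
let sem_sketch x = std_sketch x /. sqrt (float_of_int (Array.length x))

let () =
  let x = [| 1.; 2.; 3.; 4.; 5. |] in
  Printf.printf "std = %g, sem = %g\n" (std_sketch x) (sem_sketch x)
```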
val absdev : ?w:float array -> ?mean:float -> float array -> float
val skew : ?w:float array -> ?mean:float -> ?sd:float -> float array -> float
val kurtosis : ?w:float array -> ?mean:float -> ?sd:float -> float array -> float
kurtosis x returns Pearson's kurtosis of x.
val central_moment : int -> float array -> float
val covariance : ?mean0:float -> ?mean1:float -> float array -> float array -> float
val correlation : float array -> float array -> float
val pearson_r : float array -> float array -> float
val kendall_tau : float array -> float array -> float
val spearman_rho : float array -> float array -> float
val autocorrelation : ?lag:int -> float array -> float
val median : float array -> float
median x returns the median of x.
val percentile : float array -> float -> float
percentile x p returns the p-th percentile of the data x, where p is between 0. and 1. x does not need to be sorted.
val first_quartile : float array -> float
first_quartile x returns the first quartile of x, i.e., the 25th percentile.
val third_quartile : float array -> float
third_quartile x returns the third quartile of x, i.e., the 75th percentile.
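One common way to compute such a percentile is linear interpolation between order statistics. The sketch below assumes that convention, since the text above does not say which interpolation rule is used, and percentile_sketch is a hypothetical name.

```ocaml
(* Sketch: percentile by linear interpolation between order statistics.
   The interpolation rule is an assumption; percentile_sketch is hypothetical. *)
let percentile_sketch x p =
  let s = Array.copy x in
  Array.sort compare s;                      (* input need not be sorted *)
  let n = Array.length s in
  let pos = p *. float_of_int (n - 1) in     (* fractional index, p in [0, 1] *)
  let lo = int_of_float (floor pos) in
  let hi = min (lo + 1) (n - 1) in
  let frac = pos -. float_of_int lo in
  s.(lo) +. (frac *. (s.(hi) -. s.(lo)))

let () =
  let x = [| 5.; 1.; 4.; 2.; 3. |] in
  Printf.printf "Q1 = %g, median = %g, Q3 = %g\n"
    (percentile_sketch x 0.25) (percentile_sketch x 0.5)
    (percentile_sketch x 0.75)
```

Under this convention the first and third quartiles are simply the calls with p = 0.25 and p = 0.75.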
val min : float array -> float
val max : float array -> float
val minmax : float array -> float * float
val min_i : float array -> float * int
val max_i : float array -> float * int
val minmax_i : float array -> float * int * float * int
val sort : ?inc:bool -> float array -> float array
val argsort : ?inc:bool -> float array -> int array
val rank : ?ties_strategy:[ `Average | `Max | `Min ] -> float array -> float array
Computes the ranks of the sample.

Ranks are assigned from the smallest value to the largest. For example, rank [|54.; 74.; 55.; 86.; 56.|] returns [|1.; 4.; 2.; 5.; 3.|]. Note that the ranking starts with one!

ties_strategy controls which ranks are assigned to equal values: `Average assigns the mean of the tied ranks, `Min the minimum, and `Max the maximum.
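The three tie-handling strategies named in the signature can be sketched in plain OCaml. rank_sketch is a hypothetical helper, not Owl's implementation, and the choice of `Average as the default is an assumption of this sketch.

```ocaml
(* Sketch of rank assignment with ties; rank_sketch is a hypothetical helper,
   not Owl's implementation. Ranks start at one. The `Average default is an
   assumption. *)
let rank_sketch ?(ties_strategy = `Average) x =
  let n = Array.length x in
  let idx = Array.init n (fun i -> i) in
  Array.sort (fun a b -> compare x.(a) x.(b)) idx;   (* indices in value order *)
  let r = Array.make n 0. in
  let i = ref 0 in
  while !i < n do
    (* Find the end j of the run of values equal to x.(idx.(!i)). *)
    let j = ref !i in
    while !j + 1 < n && x.(idx.(!j + 1)) = x.(idx.(!i)) do incr j done;
    let lo = float_of_int (!i + 1) and hi = float_of_int (!j + 1) in
    let v =
      match ties_strategy with
      | `Average -> (lo +. hi) /. 2.
      | `Min -> lo
      | `Max -> hi
    in
    for k = !i to !j do r.(idx.(k)) <- v done;
    i := !j + 1
  done;
  r

let () =
  let x = [| 10.; 20.; 20.; 30. |] in
  let show a = Array.iter (Printf.printf "%g ") a; print_newline () in
  show (rank_sketch x);                         (* tied values share rank 2.5 *)
  show (rank_sketch ~ties_strategy:`Min x);     (* tied values share rank 2 *)
  show (rank_sketch ~ties_strategy:`Max x)      (* tied values share rank 3 *)
```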


val histogram : float array -> int -> int array
val ecdf : float array -> float array * float array
ecdf x returns (x', f), where f holds the values of the empirical cumulative distribution function of x evaluated at the points x'. x' is x sorted in increasing order with duplicates removed.
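A minimal sketch of this computation, assuming f at each distinct value v is the fraction of observations less than or equal to v. ecdf_sketch is a hypothetical name, not Owl's implementation.

```ocaml
(* Sketch: empirical CDF evaluated at the distinct sorted values of x.
   ecdf_sketch is a hypothetical helper, not Owl's implementation. *)
let ecdf_sketch x =
  let s = Array.copy x in
  Array.sort compare s;
  let n = Array.length s in
  let xs = ref [] and fs = ref [] in
  Array.iteri
    (fun i v ->
      (* Keep only the last occurrence of each value: at that position,
         i + 1 observations are <= v. *)
      if i = n - 1 || s.(i + 1) <> v then begin
        xs := v :: !xs;
        fs := (float_of_int (i + 1) /. float_of_int n) :: !fs
      end)
    s;
  (Array.of_list (List.rev !xs), Array.of_list (List.rev !fs))

let () =
  let xs, fs = ecdf_sketch [| 3.; 1.; 2.; 2. |] in
  Array.iteri (fun i v -> Printf.printf "F(%g) = %g\n" v fs.(i)) xs
```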
val z_score : mu:float -> sigma:float -> float array -> float array
val t_score : float array -> float array
val normlise_pdf : float array -> float array

MCMC: Markov Chain Monte Carlo

val metropolis_hastings : (float array -> float) -> float array -> int -> float array array
TODO: metropolis_hastings f p n is the Metropolis-Hastings MCMC algorithm. f is the pdf of p.
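Since the entry above is still marked TODO, the following is only a sketch of the general idea: random-walk Metropolis, a special case of Metropolis-Hastings with a symmetric Gaussian proposal. It assumes f is a (possibly unnormalised) density, p is the starting point and n is the number of samples to return. The function name, the step parameter and the use of the standard Random module are all assumptions, not Owl's implementation.

```ocaml
(* Sketch of random-walk Metropolis (symmetric Gaussian proposal), a special
   case of Metropolis-Hastings. All names, the step size and the use of the
   standard Random module are assumptions, not Owl's implementation. *)

(* Standard normal draw via the Box-Muller transform. *)
let gaussian () =
  let u1 = max min_float (Random.float 1.) in   (* avoid log 0 *)
  let u2 = Random.float 1. in
  sqrt (-2. *. log u1) *. cos (2. *. Float.pi *. u2)

(* f: unnormalised density; p: starting point; n: number of samples. *)
let metropolis_sketch ?(step = 0.5) f p n =
  let cur = ref (Array.copy p) in
  let out = Array.make n [||] in
  for i = 0 to n - 1 do
    (* Propose a move by perturbing every coordinate. *)
    let cand = Array.map (fun v -> v +. (step *. gaussian ())) !cur in
    (* Accept with probability min 1 (f cand / f cur). *)
    if Random.float 1. *. f !cur < f cand then cur := cand;
    out.(i) <- Array.copy !cur
  done;
  out

let () =
  Random.init 7;
  (* Unnormalised standard normal density in one dimension. *)
  let f x = exp (-0.5 *. x.(0) *. x.(0)) in
  let chain = metropolis_sketch f [| 0. |] 5000 in
  let sum = Array.fold_left (fun a s -> a +. s.(0)) 0. chain in
  Printf.printf "sample mean = %g\n" (sum /. 5000.)
```

Because the proposal is symmetric, the acceptance ratio reduces to f cand / f cur; a general Metropolis-Hastings sampler would also include the ratio of proposal densities.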
val gibbs_sampling : (float array -> int -> float) -> float array -> int -> float array array
TODO: gibbs_sampling f p n is the Gibbs sampler. f is a sampler based on the full conditional function of all variables.

Hypothesis tests

type tail = 
| BothSide
| RightSide
| LeftSide (*
Types of alternative hypothesis: two-sided, right-sided, or left-sided.
*)
val z_test : mu:float ->
sigma:float ->
?alpha:float -> ?side:tail -> float array -> bool * float * float
z_test ~mu ~sigma ~alpha ~side x returns a test decision for the null hypothesis that the data x comes from a normal distribution with mean mu and standard deviation sigma, using the z-test at the alpha significance level. The alternative hypothesis is that the mean is not mu.

The result is a triple (h, p, z): h is true if the test rejects the null hypothesis at the alpha significance level, and false otherwise; p is the p-value and z is the z-score.
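The computation behind such a test decision can be sketched in plain OCaml. The type and argument names mirror the signature above, but z_test_sketch is a hypothetical function, not Owl's implementation; the default values of alpha and side are assumptions, and Float.erfc requires OCaml 4.13 or later.

```ocaml
(* Sketch of a one-sample z-test. Names mirror the documentation, but this
   is an illustration, not Owl's implementation. The defaults for alpha and
   side are assumptions. Float.erfc requires OCaml 4.13 or later. *)
type tail = BothSide | RightSide | LeftSide

let z_test_sketch ~mu ~sigma ?(alpha = 0.05) ?(side = BothSide) x =
  let n = float_of_int (Array.length x) in
  let mean = Array.fold_left ( +. ) 0. x /. n in
  (* z-score of the sample mean under the null hypothesis. *)
  let z = (mean -. mu) /. (sigma /. sqrt n) in
  (* Upper-tail probability of the standard normal: P(Z > t). *)
  let upper t = 0.5 *. Float.erfc (t /. sqrt 2.) in
  let p =
    match side with
    | BothSide -> 2. *. upper (abs_float z)
    | RightSide -> upper z
    | LeftSide -> upper (-.z)
  in
  (p < alpha, p, z)

let () =
  let x = [| 1.; 2.; 3.; 4.; 5. |] in
  let h, p, z = z_test_sketch ~mu:2. ~sigma:1. x in
  Printf.printf "reject = %b, p = %g, z = %g\n" h p z
```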

val t_test : mu:float ->
?alpha:float -> ?side:tail -> float array -> bool * float * float
t_test ~mu ~alpha ~side x returns a test decision for the one-sample t-test, a parametric test of the location parameter used when the population standard deviation is unknown. mu is the population mean and alpha is the significance level.
val t_test_paired : ?alpha:float ->
?side:tail -> float array -> float array -> bool * float * float
t_test_paired ~alpha ~side x y returns a test decision for the null hypothesis that the data in x – y comes from a normal distribution with mean equal to zero and unknown variance, using the paired-sample t-test.
val t_test_unpaired : ?alpha:float ->
?side:tail ->
?equal_var:bool -> float array -> float array -> bool * float * float
t_test_unpaired ~alpha ~side ~equal_var x y returns a test decision for the null hypothesis that the data in vectors x and y comes from independent random samples from normal distributions with equal means and equal but unknown variances, using the two-sample t-test. The alternative hypothesis is that the data in x and y comes from populations with unequal means.

equal_var indicates whether the two samples have the same variance. If the two variances are not the same, the test is referred to as Welch's t-test.
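How equal_var changes the calculation can be sketched by computing the t statistic and its degrees of freedom both ways: with a pooled variance, and with separate variances as in Welch's t-test. The helper names are hypothetical and this is not Owl's implementation; the p-value is left out because it needs the Student t CDF, which the OCaml standard library does not provide.

```ocaml
(* Sketch: two-sample t statistic and degrees of freedom, pooled versus
   Welch. Helper names are hypothetical; computing the p-value needs the
   Student t CDF, which is left out here. *)
let mean x = Array.fold_left ( +. ) 0. x /. float_of_int (Array.length x)

(* Sample variance with the n - 1 denominator. *)
let var x =
  let m = mean x in
  Array.fold_left (fun a v -> a +. ((v -. m) ** 2.)) 0. x
  /. float_of_int (Array.length x - 1)

let t_stat_sketch ~equal_var x y =
  let nx = float_of_int (Array.length x)
  and ny = float_of_int (Array.length y) in
  let vx = var x and vy = var y in
  let diff = mean x -. mean y in
  if equal_var then begin
    (* Pooled variance, nx + ny - 2 degrees of freedom. *)
    let df = nx +. ny -. 2. in
    let sp2 = (((nx -. 1.) *. vx) +. ((ny -. 1.) *. vy)) /. df in
    (diff /. sqrt (sp2 *. ((1. /. nx) +. (1. /. ny))), df)
  end else begin
    (* Welch: separate variances, Welch-Satterthwaite degrees of freedom. *)
    let a = vx /. nx and b = vy /. ny in
    let df =
      ((a +. b) ** 2.)
      /. ((a *. a /. (nx -. 1.)) +. (b *. b /. (ny -. 1.)))
    in
    (diff /. sqrt (a +. b), df)
  end

let () =
  let x = [| 1.; 2.; 3.; 4.; 5. |] and y = [| 0.; 10. |] in
  let t1, d1 = t_stat_sketch ~equal_var:true x y in
  let t2, d2 = t_stat_sketch ~equal_var:false x y in
  Printf.printf "pooled: t = %g, df = %g\nWelch:  t = %g, df = %g\n" t1 d1 t2 d2
```

When the two samples have equal sizes and equal sample variances, both variants give the same statistic and the same degrees of freedom.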

val var_test : ?alpha:float ->
?side:tail -> var:float -> float array -> bool * float * float
var_test ~alpha ~side ~var x returns a test decision for the null hypothesis that the data in x comes from a normal distribution with variance var, using the chi-square variance test. The alternative hypothesis is that x comes from a normal distribution with a different variance.
val jb_test : ?alpha:float -> float array -> bool * float * float
jb_test ~alpha x returns a test decision for the null hypothesis that the data x comes from a normal distribution with an unknown mean and variance, using the Jarque-Bera test.
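The Jarque-Bera statistic combines sample skewness S and kurtosis K as JB = n/6 * (S^2 + (K - 3)^2 / 4) and is asymptotically chi-square distributed with two degrees of freedom. A minimal sketch follows; jb_sketch is a hypothetical name, the default alpha is an assumption, and returning the statistic as the third component is a guess at the meaning of the result triple.

```ocaml
(* Sketch of the Jarque-Bera statistic with its asymptotic chi-square(2)
   p-value. jb_sketch is hypothetical; the default alpha is an assumption. *)
let jb_sketch ?(alpha = 0.05) x =
  let n = float_of_int (Array.length x) in
  let m = Array.fold_left ( +. ) 0. x /. n in
  (* k-th central moment with the 1/n normalisation. *)
  let moment k =
    Array.fold_left (fun a v -> a +. ((v -. m) ** k)) 0. x /. n
  in
  let m2 = moment 2. in
  let skew = moment 3. /. (m2 ** 1.5) in
  let kurt = moment 4. /. (m2 *. m2) in
  let jb = n /. 6. *. ((skew *. skew) +. (((kurt -. 3.) ** 2.) /. 4.)) in
  (* Chi-square with 2 degrees of freedom has survival function exp (-x/2). *)
  let p = exp (-.jb /. 2.) in
  (p < alpha, p, jb)

let () =
  let h, p, jb = jb_sketch [| 1.; 2.; 3.; 4.; 5. |] in
  Printf.printf "reject = %b, p = %g, JB = %g\n" h p jb
```

The chi-square approximation is only asymptotic, so p-values for very small samples such as the one above should not be taken at face value.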
val fisher_test : ?alpha:float ->
?side:tail -> int -> int -> int -> int -> bool * float * float
fisher_test ~alpha ~side a b c d performs Fisher's exact test on the 2×2 contingency table |a, b| |c, d|. The result is a triple (h, p, z): h is true if the test rejects the null hypothesis at the alpha significance level, and false otherwise; p is the p-value and z is the prior odds ratio.
val runs_test : ?alpha:float ->
?side:tail -> ?v:float -> float array -> bool * float * float
runs_test ~alpha ~v x returns a test decision for the null hypothesis that the data x comes in random order, against the alternative that it does not, by running the Wald–Wolfowitz runs test. The test is based on the number of runs of consecutive values above or below the mean of x. ~v is the reference value; its default is the median of x.
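The quantity the test is built on, the number of runs on either side of a reference value, can be sketched as follows. count_runs is a hypothetical helper, and treating values equal to v as "above" is an assumption of this sketch.

```ocaml
(* Sketch: count runs of consecutive values on the same side of a
   reference value v. Treating values equal to v as "above" is an
   assumption; count_runs is a hypothetical helper. *)
let count_runs ~v x =
  let above = Array.map (fun a -> a >= v) x in
  let runs = ref (if Array.length x = 0 then 0 else 1) in
  for i = 1 to Array.length x - 1 do
    (* A new run starts whenever the side changes. *)
    if above.(i) <> above.(i - 1) then incr runs
  done;
  !runs

let () =
  (* Sides relative to v = 5: below, below, above, above, below, above. *)
  let x = [| 1.; 2.; 9.; 8.; 1.; 9. |] in
  Printf.printf "runs = %d\n" (count_runs ~v:5. x)
```

Too few or too many runs relative to what random ordering would produce is evidence against the null hypothesis.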
val mannwhitneyu : ?alpha:float ->
?side:tail -> float array -> float array -> bool * float * float
mannwhitneyu ~alpha ~side x y computes the Mann-Whitney rank test on samples x and y. If each sample has fewer than 10 elements and there are no ties, the exact test is used (see Ying Kuen Cheung and Jerome H. Klotz (1997), The Mann Whitney Wilcoxon distribution using linked list, Statistica Sinica 7, 805-813); otherwise the asymptotic normal distribution is used.
val wilcoxon : ?alpha:float ->
?side:tail -> float array -> float array -> bool * float * float

Random numbers, PDF, and CDF

module Rnd: sig .. end
module Pdf: sig .. end
module Cdf: sig .. end