Module Owl_stats

module Owl_stats: sig .. end

Statistics: random number generators, PDF and CDF functions, and hypothesis tests.

The module includes basic statistical functions such as mean, variance, and skewness. It also includes the following three submodules.

The Owl_stats.Rnd module provides random number generators of various distributions.

The Owl_stats.Pdf module provides a range of probability density/mass functions of different distributions.

The Owl_stats.Cdf module provides cumulative distribution functions.

Please refer to the GSL documentation for details.

Randomisation functions

val seed : int -> unit

seed x sets x as seed for the internal random number generator.

val shuffle : 'a array -> 'a array

shuffle x returns a new array containing the elements of x in shuffled order.

val choose : 'a array -> int -> 'a array

choose x n draws n samples from x without replacement.

val sample : 'a array -> int -> 'a array

sample x n draws n samples from x with replacement.
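The difference between drawing with and without replacement can be illustrated with the standard library alone. This is a minimal sketch, not the Owl implementation: the names sample_with_replacement and choose_without_replacement are hypothetical, and it uses Stdlib.Random rather than the generator that seed controls.

```ocaml
(* Hypothetical helper: draw n elements with replacement. Each draw is an
   independent uniform index, so an element may appear more than once. *)
let sample_with_replacement x n =
  Array.init n (fun _ -> x.(Random.int (Array.length x)))

(* Hypothetical helper: draw n elements without replacement using a partial
   Fisher-Yates shuffle on a copy, so x itself is left untouched. *)
let choose_without_replacement x n =
  let len = Array.length x in
  if n > len then invalid_arg "choose_without_replacement: n > length";
  let y = Array.copy x in
  for i = 0 to n - 1 do
    (* Pick a position in the not-yet-fixed tail and swap it into slot i. *)
    let j = i + Random.int (len - i) in
    let tmp = y.(i) in
    y.(i) <- y.(j);
    y.(j) <- tmp
  done;
  Array.sub y 0 n
```

With distinct inputs, the result of choose_without_replacement never contains a repeated element, while sample_with_replacement may return more elements than x holds.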

Basic statistical functions

val mean : ?w:float array -> float array -> float

val variance : ?w:float array -> ?mean:float -> float array -> float

val std : ?w:float array -> ?mean:float -> float array -> float

std x calculates the standard deviation of x.

val sem : ?w:float array -> ?mean:float -> float array -> float

sem x calculates the standard error of x, also referred to as standard error of the mean.
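The relation between the two quantities can be sketched as follows. This is an illustration only: it ignores the optional ?w and ?mean arguments, and it assumes the n - 1 (Bessel-corrected) denominator for the standard deviation, which this page does not state. The names mean_sketch, std_sketch, and sem_sketch are hypothetical.

```ocaml
let mean_sketch x =
  Array.fold_left (+.) 0. x /. float_of_int (Array.length x)

(* Sample standard deviation, assuming the n - 1 denominator. *)
let std_sketch x =
  let m = mean_sketch x in
  let ss = Array.fold_left (fun acc v -> acc +. (v -. m) ** 2.) 0. x in
  sqrt (ss /. float_of_int (Array.length x - 1))

(* Standard error of the mean: standard deviation divided by sqrt n. *)
let sem_sketch x = std_sketch x /. sqrt (float_of_int (Array.length x))
```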

val absdev : ?w:float array -> ?mean:float -> float array -> float

val skew : ?w:float array -> ?mean:float -> ?sd:float -> float array -> float

val kurtosis : ?w:float array -> ?mean:float -> ?sd:float -> float array -> float

kurtosis x returns Pearson's kurtosis of x.

val central_moment : int -> float array -> float

val covariance : ?mean0:float -> ?mean1:float -> float array -> float array -> float

val correlation : float array -> float array -> float

val pearson_r : float array -> float array -> float

val kendall_tau : float array -> float array -> float

val spearman_rho : float array -> float array -> float

val autocorrelation : ?lag:int -> float array -> float

val median : float array -> float

median x returns the median of x.

val percentile : float array -> float -> float

percentile x p returns the p percentile of the data x. p is between 0. and 1. x does not need to be sorted.

val first_quartile : float array -> float

first_quartile x returns the first quartile of x, i.e., the 25th percentile.

val third_quartile : float array -> float

third_quartile x returns the third quartile of x, i.e., the 75th percentile.
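A percentile of unsorted data can be computed by sorting a copy and interpolating between neighbouring order statistics. The sketch below assumes linear interpolation at position p * (n - 1); the interpolation rule actually used by the library is not stated on this page, and the name percentile_sketch is hypothetical.

```ocaml
let percentile_sketch x p =
  let n = Array.length x in
  if n = 0 then invalid_arg "percentile_sketch: empty array";
  if p < 0. || p > 1. then invalid_arg "percentile_sketch: p must be in [0, 1]";
  (* Sort a copy so the caller's array is not modified. *)
  let s = Array.copy x in
  Array.sort compare s;
  (* Fractional position of the requested percentile in the sorted data. *)
  let pos = p *. float_of_int (n - 1) in
  let i = int_of_float (floor pos) in
  let frac = pos -. float_of_int i in
  if i + 1 < n then (1. -. frac) *. s.(i) +. frac *. s.(i + 1)
  else s.(i)

(* Under this convention the quartiles are the 0.25 and 0.75 percentiles. *)
let first_quartile_sketch x = percentile_sketch x 0.25
let third_quartile_sketch x = percentile_sketch x 0.75
```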

val min : float array -> float

val max : float array -> float

val minmax : float array -> float * float

val min_i : float array -> float * int

val max_i : float array -> float * int

val minmax_i : float array -> float * int * float * int

val sort : ?inc:bool -> float array -> float array

val argsort : ?inc:bool -> float array -> int array

val rank : ?ties_strategy:[ `Average | `Max | `Min ] -> float array -> float array

rank x computes the ranks of the values in the sample x.

Ranks are assigned from the smallest value to the largest. For example, rank [|54.; 74.; 55.; 86.; 56.|] returns [|1.; 4.; 2.; 5.; 3.|]. Note that ranks start at one, not zero.

ties_strategy controls which ranks are assigned to equal values:

`Average the average of the tied ranks is assigned to each value. This is the default.
`Min the minimum of the tied ranks is assigned to each value.
`Max the maximum of the tied ranks is assigned to each value.
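The three tie strategies can be illustrated with a small stand-alone sketch. It is not the library's implementation; the name rank_sketch is hypothetical, and it only mirrors the behaviour described above (one-based ranks, smallest value first).

```ocaml
let rank_sketch ?(ties_strategy = `Average) x =
  let n = Array.length x in
  (* Indices of x ordered by increasing value. *)
  let idx = Array.init n (fun i -> i) in
  Array.sort (fun a b -> compare x.(a) x.(b)) idx;
  let r = Array.make n 0. in
  let i = ref 0 in
  while !i < n do
    (* Find the end j of the run of equal values starting at position i. *)
    let j = ref !i in
    while !j + 1 < n && x.(idx.(!j + 1)) = x.(idx.(!i)) do incr j done;
    (* The run occupies the one-based ranks lo .. hi. *)
    let lo = float_of_int (!i + 1) and hi = float_of_int (!j + 1) in
    let v =
      match ties_strategy with
      | `Average -> (lo +. hi) /. 2.
      | `Min -> lo
      | `Max -> hi
    in
    for k = !i to !j do r.(idx.(k)) <- v done;
    i := !j + 1
  done;
  r
```

For the input [|10.; 20.; 20.; 30.|], the two tied values occupy ranks 2 and 3, so they receive 2.5 under `Average, 2. under `Min, and 3. under `Max.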

val histogram : float array -> int -> int array

val ecdf : float array -> float array * float array

ecdf x returns (x', f), where f contains the values of the empirical cumulative distribution function of x evaluated at the points x'. x' is x sorted in increasing order with duplicates removed.
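As an illustration of the returned pair, the following sketch computes, for each distinct value, the fraction of observations less than or equal to it. It is written against the description above rather than the library source, and the name ecdf_sketch is hypothetical.

```ocaml
let ecdf_sketch x =
  let n = Array.length x in
  let s = Array.copy x in
  Array.sort compare s;
  (* Walk the sorted data; at the last occurrence of each distinct value,
     record the value and the fraction of observations <= it. *)
  let xs = ref [] and fs = ref [] in
  Array.iteri (fun i v ->
    if i = n - 1 || s.(i + 1) <> v then begin
      xs := v :: !xs;
      fs := float_of_int (i + 1) /. float_of_int n :: !fs
    end) s;
  (Array.of_list (List.rev !xs), Array.of_list (List.rev !fs))
```

For example, ecdf_sketch [|3.; 1.; 2.; 2.|] yields ([|1.; 2.; 3.|], [|0.25; 0.75; 1.|]).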

val z_score : mu:float -> sigma:float -> float array -> float array

val t_score : float array -> float array

val normlise_pdf : float array -> float array

MCMC: Markov Chain Monte Carlo

val metropolis_hastings : (float array -> float) -> float array -> int -> float array array

TODO: metropolis_hastings f p n runs the Metropolis-Hastings MCMC algorithm. f is the pdf of p.

val gibbs_sampling : (float array -> int -> float) -> float array -> int -> float array array

TODO: gibbs_sampling f p n runs the Gibbs sampler. f is a sampler based on the full conditional function of all variables.

Hypothesis tests

type tail =
| BothSide
| RightSide
| LeftSide (* Types of alternative hypothesis: two-sided, right-sided, or left-sided. *)

val z_test : mu:float ->
       sigma:float ->
       ?alpha:float -> ?side:tail -> float array -> bool * float * float

z_test ~mu ~sigma ~alpha ~side x returns a test decision for the null hypothesis that the data x comes from a normal distribution with mean mu and standard deviation sigma, using the z-test at the alpha significance level. The alternative hypothesis is that the mean is not mu.

The result is (h, p, z): h is true if the test rejects the null hypothesis at the alpha significance level, and false otherwise; p is the p-value and z is the z-score.
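The computation behind the (h, p, z) triple can be sketched with the standard library. Everything here is an assumption made for illustration: the name z_test_sketch, the default alpha of 0.05, the reading of RightSide and LeftSide as "mean greater than mu" and "mean less than mu", and the use of an Abramowitz-Stegun approximation of erf in place of a library CDF.

```ocaml
(* Approximation of erf (Abramowitz-Stegun 7.1.26, absolute error about
   1.5e-7); adequate for a sketch, not for production use. *)
let erf_approx x =
  let sign = if x < 0. then -1. else 1. in
  let x = abs_float x in
  let t = 1. /. (1. +. 0.3275911 *. x) in
  let poly =
    t *. (0.254829592 +. t *. (-0.284496736 +. t *. (1.421413741
      +. t *. (-1.453152027 +. t *. 1.061405429))))
  in
  sign *. (1. -. poly *. exp (-. x *. x))

(* Standard normal cumulative distribution function. *)
let normal_cdf z = 0.5 *. (1. +. erf_approx (z /. sqrt 2.))

type tail = BothSide | RightSide | LeftSide

let z_test_sketch ~mu ~sigma ?(alpha = 0.05) ?(side = BothSide) x =
  let n = float_of_int (Array.length x) in
  let xbar = Array.fold_left (+.) 0. x /. n in
  (* z-score of the sample mean under the null hypothesis. *)
  let z = (xbar -. mu) /. (sigma /. sqrt n) in
  let p =
    match side with
    | BothSide -> 2. *. (1. -. normal_cdf (abs_float z))
    | RightSide -> 1. -. normal_cdf z
    | LeftSide -> normal_cdf z
  in
  (p < alpha, p, z)
```

With four observations all equal to 6., mu = 5., and sigma = 1., the z-score is 2 and the two-sided p-value is about 0.0455, so the sketch rejects the null hypothesis at alpha = 0.05.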

val t_test : mu:float ->
       ?alpha:float -> ?side:tail -> float array -> bool * float * float

t_test ~mu ~alpha ~side x returns a test decision for the one-sample t-test, a parametric test of the location parameter used when the population standard deviation is unknown. mu is the population mean and alpha is the significance level.

val t_test_paired : ?alpha:float ->
       ?side:tail -> float array -> float array -> bool * float * float

t_test_paired ~alpha ~side x y returns a test decision for the null hypothesis that the data in x – y comes from a normal distribution with mean equal to zero and unknown variance, using the paired-sample t-test.
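The statistic underlying a paired test is the one-sample t statistic of the element-wise differences. The sketch below computes only that statistic and its degrees of freedom; it does not compute a p-value, and the name paired_t_statistic is hypothetical rather than part of the module.

```ocaml
let paired_t_statistic x y =
  if Array.length x <> Array.length y then
    invalid_arg "paired_t_statistic: arrays differ in length";
  (* Element-wise differences d_i = x_i - y_i. *)
  let d = Array.mapi (fun i xi -> xi -. y.(i)) x in
  let n = float_of_int (Array.length d) in
  let m = Array.fold_left (+.) 0. d /. n in
  let ss = Array.fold_left (fun acc v -> acc +. (v -. m) ** 2.) 0. d in
  (* Sample standard deviation of the differences (n - 1 denominator). *)
  let sd = sqrt (ss /. (n -. 1.)) in
  (* t = mean(d) / (sd(d) / sqrt n), with n - 1 degrees of freedom. *)
  (m /. (sd /. sqrt n), Array.length d - 1)
```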

val t_test_unpaired : ?alpha:float ->
       ?side:tail ->
       ?equal_var:bool -> float array -> float array -> bool * float * float

t_test_unpaired ~alpha ~side ~equal_var x y returns a test decision for the null hypothesis that the data in vectors x and y comes from independent random samples from normal distributions with equal means and equal but unknown variances, using the two-sample t-test. The alternative hypothesis is that the data in x and y comes from populations with unequal means.

equal_var indicates whether the two samples are assumed to have the same variance. If the variances are not assumed equal, the test is referred to as Welch's t-test.

val var_test : ?alpha:float ->
       ?side:tail -> var:float -> float array -> bool * float * float

var_test ~alpha ~side ~var x returns a test decision for the null hypothesis that the data in x comes from a normal distribution with variance var, using the chi-square variance test. The alternative hypothesis is that x comes from a normal distribution with a different variance.

val jb_test : ?alpha:float -> float array -> bool * float * float

jb_test ~alpha x returns a test decision for the null hypothesis that the data x comes from a normal distribution with an unknown mean and variance, using the Jarque-Bera test.

val fisher_test : ?alpha:float ->
       ?side:tail -> int -> int -> int -> int -> bool * float * float

fisher_test ~alpha ~side a b c d performs Fisher's exact test for the contingency table |a, b| |c, d|. The result is (h, p, z): h is true if the test rejects the null hypothesis at the alpha significance level, and false otherwise; p is the p-value and z is the prior odds ratio.

val runs_test : ?alpha:float ->
       ?side:tail -> ?v:float -> float array -> bool * float * float

runs_test ~alpha ~v x returns a test decision for the null hypothesis that the data x comes in random order, against the alternative that it does not, by running the Wald–Wolfowitz runs test. The test is based on the number of runs of consecutive values above or below the mean of x. ~v is the reference value; its default is the median of x.
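The quantity the test is built on, the number of runs, can be counted as in the sketch below. It covers only the counting step, not the test decision. The name count_runs is hypothetical, and dropping observations equal to the reference value is one common convention that is assumed here, not confirmed by this page.

```ocaml
let count_runs ~v x =
  (* Record which side of v each observation falls on; observations equal
     to v are dropped (an assumed convention). *)
  let signs =
    Array.to_list x
    |> List.filter (fun a -> a <> v)
    |> List.map (fun a -> a > v)
  in
  match signs with
  | [] -> 0
  | first :: rest ->
    (* A new run starts each time the side changes. *)
    let _, runs =
      List.fold_left
        (fun (prev, k) s -> if s = prev then (prev, k) else (s, k + 1))
        (first, 1) rest
    in
    runs
```

For [|1.; 2.; 8.; 9.; 3.; 7.|] with v = 5., the sides are below, below, above, above, below, above, which gives 4 runs.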

val mannwhitneyu : ?alpha:float ->
       ?side:tail -> float array -> float array -> bool * float * float

mannwhitneyu ~alpha ~side x y computes the Mann-Whitney rank test on samples x and y. If each sample has fewer than 10 elements and there are no ties, the exact test is used (see Ying Kuen Cheung and Jerome H. Klotz (1997), The Mann Whitney Wilcoxon distribution using linked list, Statistica Sinica 7, 805-813); otherwise the asymptotic normal distribution is used.

val wilcoxon : ?alpha:float ->
       ?side:tail -> float array -> float array -> bool * float * float

Random numbers, PDF, and CDF

module Rnd: sig .. end

module Pdf: sig .. end

module Cdf: sig .. end