stats

API reference for statistical functions.

Bootstrap estimates



bootstrap_sampling

 bootstrap_sampling (data:pandas.core.frame.DataFrame,
                     estimator:Callable=np.mean, n_boot:int=1000,
                     columns_to_exclude:List[str]=None)

Compute bootstrap estimates of the data distribution

Parameters:

data (DataFrame): Data containing the columns we want to generate bootstrap estimates from.
estimator (Callable, default np.mean): Estimator function that accepts an array-like argument.
n_boot (int, default 1000): Number of bootstrap estimates to compute.
columns_to_exclude (List[str], default None): Column names to exclude.
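
The sketch below illustrates the idea rather than the library implementation: resample the rows with replacement n_boot times and apply the estimator to each resample, skipping any excluded columns. The helper name simple_bootstrap is hypothetical.

import numpy as np
import pandas as pd

def simple_bootstrap(data, estimator=np.mean, n_boot=1000, columns_to_exclude=None):
    # Drop columns that should not be bootstrapped.
    if columns_to_exclude:
        data = data.drop(columns=columns_to_exclude)
    estimates = []
    for _ in range(n_boot):
        # Resample all rows with replacement and apply the estimator to each column.
        resample = data.sample(frac=1, replace=True)
        estimates.append(resample.apply(estimator))
    # One row per bootstrap estimate, one column per remaining data column.
    return pd.DataFrame(estimates).reset_index(drop=True)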

Usage:

Generate data with the columns we want to compute estimates from. The values in column a come from a Normal distribution with mean 0 and standard deviation 1; the values in column b come from a Normal distribution with mean 100 and standard deviation 10.

import numpy as np
import pandas as pd

data = pd.DataFrame(
    data={
        "a": np.random.normal(size=100),
        "b": np.random.normal(loc=100, scale=10, size=100)
    }
)
data.head()
a b
0 0.605639 92.817505
1 -0.775791 92.750026
2 -1.265231 107.981771
3 0.981306 101.388385
4 0.029075 122.700172

Compute the mean of the distribution by default

By default, the function computes the mean of each column n_boot times. Each value is the mean obtained from one bootstrap sample (a resample of the rows with replacement) of the original data.
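
For intuition, one such value can be reproduced by hand with plain pandas (a sketch, not the library call):

# One bootstrap resample of the rows, followed by the column means.
data.sample(frac=1, replace=True).mean()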

estimates = bootstrap_sampling(data, n_boot=100)
estimates
a b
0 0.012356 100.018394
1 0.143189 100.691872
2 -0.002554 99.874399
3 0.079395 99.539636
4 0.055096 100.452383
... ... ...
95 0.063409 100.439363
96 -0.024455 98.607045
97 0.209427 99.866736
98 0.061323 98.680469
99 0.289456 99.980295

100 rows × 2 columns

We can check whether the estimates make sense by computing the mean of the bootstrap estimates and comparing it with the means of the Normal distributions the data were generated from.

estimates.mean()
a      0.089538
b    100.099900
dtype: float64
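
As a quick cross-check (not part of the original example), these bootstrap means should also agree closely with the plain sample means of the columns:

data.mean()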

Specify the estimator function. Example: standard deviation.

We can pass other estimator functions, such as np.std, to compute the standard deviation.

estimates = bootstrap_sampling(data, estimator=np.std, n_boot=100)
estimates
a b
0 0.933496 10.126658
1 0.929125 9.852667
2 0.899762 10.307814
3 0.968039 10.416074
4 1.004349 10.441463
... ... ...
95 0.910904 10.357727
96 0.818276 12.358640
97 0.981826 9.622724
98 0.962237 10.897055
99 0.913994 11.096338

100 rows × 2 columns

If we take the mean of the bootstrap estimates of the standard deviation, we should recover values close to the standard deviations of the distributions the data were generated from.

estimates.mean()
a     0.943942
b    10.480457
dtype: float64

Exclude unwanted columns

Use columns_to_exclude to leave out columns that should not be resampled:

estimates = bootstrap_sampling(
    data, n_boot=100, columns_to_exclude=["b"]
)
estimates
a
0 0.259128
1 0.098232
2 0.087111
3 -0.131376
4 0.050997
... ...
95 0.129835
96 -0.004873
97 -0.046338
98 0.246239
99 0.355848

100 rows × 1 columns



compute_evaluation_estimates

 compute_evaluation_estimates (df:pandas.core.frame.DataFrame,
                               n_boot:int=1000, estimator:Callable=np.mean,
                               quantile_low:float=0.025,
                               quantile_high=0.975)

Compute point estimates and confidence intervals for per-query evaluation metrics.

Parameters:

df (DataFrame): Per-query evaluation data, usually obtained from the pyvespa evaluate method.
n_boot (int, default 1000): Number of bootstrap samples.
estimator (Callable, default np.mean): Estimator function that accepts an array-like argument.
quantile_low (float, default 0.025): Lower quantile used to compute the confidence interval.
quantile_high (float, default 0.975): Upper quantile used to compute the confidence interval.
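
A minimal sketch of the underlying idea, assuming the standard bootstrap percentile interval: for each model and metric, resample the per-query values with replacement, apply the estimator to each resample, and take the requested quantiles of those estimates. The helper name percentile_interval is hypothetical.

import numpy as np

def percentile_interval(values, estimator=np.mean, n_boot=1000,
                        quantile_low=0.025, quantile_high=0.975):
    values = np.asarray(values)
    # Bootstrap estimates of the chosen statistic.
    boot = [
        estimator(np.random.choice(values, size=len(values), replace=True))
        for _ in range(n_boot)
    ]
    return {
        "low": np.quantile(boot, quantile_low),
        "median": np.quantile(boot, 0.5),
        "high": np.quantile(boot, quantile_high),
    }

Applied per model, for example with data.groupby("model")["metric_1"].apply(percentile_interval), this yields one interval per model for metric_1.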

Usage:

Generate a sample data frame; it must contain a model column.

number_data_points = 1000
data = pd.DataFrame(
    data={
        "model": (
            ["A"] * number_data_points +
            ["B"] * number_data_points
        ),
        "query_id": (
            list(range(number_data_points)) +
            list(range(number_data_points))
        ),
        "metric_1": (
            np.random.binomial(size=number_data_points, n=1, p=0.3).tolist() +
            np.random.binomial(size=number_data_points, n=1, p=0.7).tolist()
        ),
        "metric_2": (
            np.random.binomial(size=number_data_points, n=1, p=0.1).tolist() +
            np.random.binomial(size=number_data_points, n=1, p=0.9).tolist()
        )
    }
).sort_values("query_id").reset_index(drop=True)
data
model query_id metric_1 metric_2
0 A 0 0 0
1 B 0 1 1
2 A 1 0 1
3 B 1 1 1
4 A 2 0 0
... ... ... ... ...
1995 A 997 1 0
1996 B 998 1 1
1997 A 998 1 0
1998 A 999 0 0
1999 B 999 0 1

2000 rows × 4 columns

Compute the confidence interval of the mean by default

compute_evaluation_estimates(data)
metric model low median high
0 metric_1 A 0.268000 0.296 0.325
1 metric_1 B 0.667000 0.696 0.724
2 metric_2 A 0.091000 0.109 0.129
3 metric_2 B 0.887975 0.907 0.924

Specify the estimator function. Example: standard deviation.

compute_evaluation_estimates(data, estimator=np.std)
metric model low median high
0 metric_1 A 0.442918 0.456491 0.468375
1 metric_1 B 0.448001 0.459983 0.470931
2 metric_2 A 0.289026 0.311639 0.335200
3 metric_2 B 0.264998 0.291829 0.315366

Specify interval coverage

Setting quantile_low=0.2 and quantile_high=0.8 gives a narrower 60% interval instead of the default 95% interval:

compute_evaluation_estimates(
    data, 
    quantile_low=0.2, 
    quantile_high=0.8
)
metric model low median high
0 metric_1 A 0.285 0.296 0.308
1 metric_1 B 0.684 0.696 0.708
2 metric_2 A 0.102 0.110 0.118
3 metric_2 B 0.898 0.906 0.914

The function also works if we drop the query_id column and keep only the model and metric columns:

compute_evaluation_estimates(data[["model", "metric_1", "metric_2"]])
metric model low median high
0 metric_1 A 0.269975 0.297 0.326000
1 metric_1 B 0.667975 0.696 0.726000
2 metric_2 A 0.091000 0.109 0.129025
3 metric_2 B 0.888000 0.907 0.923000