bootstrap_sampling (data:pandas.core.frame.DataFrame,
estimator:Callable=mean,
n_boot:int=1000, columns_to_exclude:List[str]=None)
Compute bootstrap estimates of the data distribution
| | Type | Default | Details |
|---|---|---|---|
| data | DataFrame | | Data containing the columns we want to generate bootstrap estimates from. |
| estimator | typing.Callable | mean | Estimator function that accepts an array-like argument. |
| n_boot | int | 1000 | Number of bootstrap estimates to compute. |
| columns_to_exclude | typing.List[str] | None | Column names to exclude. |
Usage:
Generate a data frame whose columns hold the values we want to compute estimates from. The values in column a are drawn from a Normal distribution with mean 0 and standard deviation 1. The values in column b are drawn from a Normal distribution with mean 100 and standard deviation 10.
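The data described above could be generated along these lines. This is a minimal sketch: the sample size of 100 and the fixed seed are assumptions chosen for illustration, not values taken from the original example.

```python
import numpy as np
import pandas as pd

# Seeded generator so the example is reproducible (the seed is an assumption).
rng = np.random.default_rng(0)

# Column a ~ Normal(0, 1), column b ~ Normal(100, 10); 100 rows is an assumed size.
data = pd.DataFrame(
    {
        "a": rng.normal(loc=0, scale=1, size=100),
        "b": rng.normal(loc=100, scale=10, size=100),
    }
)
print(data.head())
```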
By default, the function computes the mean of each column on n_boot bootstrap samples. Each returned value is the mean obtained from one bootstrap sample of the original data.
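To make the resampling step concrete, here is a rough sketch of what a bootstrap routine of this kind does. It is not the library's implementation: `bootstrap_sketch` and its `seed` argument are hypothetical names introduced for illustration, and the sketch assumes that whole rows are resampled with replacement.

```python
import numpy as np
import pandas as pd


def bootstrap_sketch(data, estimator=np.mean, n_boot=1000,
                     columns_to_exclude=None, seed=None):
    """Hypothetical helper: one estimate per column for each bootstrap sample."""
    rng = np.random.default_rng(seed)
    excluded = columns_to_exclude or []
    columns = [c for c in data.columns if c not in excluded]
    values = data[columns].to_numpy()
    n = len(values)
    rows = []
    for _ in range(n_boot):
        # Draw n row indices with replacement from the original data.
        idx = rng.integers(0, n, size=n)
        sample = values[idx]
        # Apply the estimator to each column of the resampled data.
        rows.append([estimator(sample[:, j]) for j in range(len(columns))])
    return pd.DataFrame(rows, columns=columns)


toy = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 10.0, 10.0]})
estimates_sketch = bootstrap_sketch(toy, n_boot=50, seed=0)
print(estimates_sketch.shape)
```

The result has one row per bootstrap sample and one column per retained input column, so averaging it column-wise gives the check described in the text.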
We can check whether the estimates make sense by computing the mean of the bootstrap estimates and comparing it with the mean of the Normal distribution the data were generated from.
estimates.mean()
a 0.089538
b 100.099900
dtype: float64
Specify function. Example: Standard deviation.
We can specify other functions, such as np.std to compute the standard deviation.
If we take the mean of the bootstrap estimates of the standard deviation, we should recover a value close to the standard deviation of the distribution that the data were generated from.
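The same check can be illustrated without the library. The sketch below bootstraps `np.std` by hand on a single column; the sample size, number of resamples, and seed are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Data drawn from Normal(100, 10), mirroring column b (sample size is assumed).
b = rng.normal(loc=100, scale=10, size=1000)

# Standard deviation of each bootstrap resample (drawn with replacement).
boot_std = np.array(
    [np.std(rng.choice(b, size=b.size, replace=True)) for _ in range(200)]
)

# The average of the bootstrap estimates should land near the true value of 10.
print(boot_std.mean())
```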
compute_evaluation_estimates (df:pandas.core.frame.DataFrame,
n_boot:int=1000,
estimator:Callable=mean, quantile_low:float=0.025,
quantile_high=0.975)
Compute an estimate and confidence interval for per-query evaluation metrics.
| | Type | Default | Details |
|---|---|---|---|
| df | DataFrame | | Per-query evaluation data, usually obtained from the pyvespa evaluate method. |
| n_boot | int | 1000 | Number of bootstrap samples. |
| estimator | typing.Callable | mean | Estimator function that accepts an array-like argument. |
| quantile_low | float | 0.025 | Lower quantile used to compute the confidence interval. |
| quantile_high | float | 0.975 | Upper quantile used to compute the confidence interval. |
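The role of quantile_low and quantile_high can be sketched as a percentile interval over bootstrap estimates. This is an illustration under stated assumptions, not the library's code: `percentile_interval`, its `seed` argument, and the keys of the returned dictionary are hypothetical, and using the estimator on the original data as the point estimate is an assumption.

```python
import numpy as np


def percentile_interval(values, n_boot=1000, estimator=np.mean,
                        quantile_low=0.025, quantile_high=0.975, seed=None):
    """Hypothetical helper: point estimate plus a bootstrap percentile interval."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    n = values.size
    # One estimate per bootstrap resample (drawn with replacement).
    estimates = np.array(
        [estimator(values[rng.integers(0, n, size=n)]) for _ in range(n_boot)]
    )
    # The interval bounds are quantiles of the bootstrap distribution.
    low, high = np.quantile(estimates, [quantile_low, quantile_high])
    return {
        "low": float(low),
        "estimate": float(estimator(values)),
        "high": float(high),
    }


# Made-up per-query metric values for a single model.
result = percentile_interval([0.2, 0.4, 0.6, 0.8, 1.0], n_boot=200, seed=0)
print(result)
```

With the default quantiles of 0.025 and 0.975, the bounds enclose the central 95% of the bootstrap estimates.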
Usage:
Generate a sample data frame, which must contain the column model.
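A sample frame of this shape might look like the sketch below. Only the model column is required by the text; the `query_id` and `metric_value` column names and all values are hypothetical placeholders.

```python
import pandas as pd

# Two models evaluated on the same two queries (all values are made up).
df = pd.DataFrame(
    {
        "model": ["A", "A", "B", "B"],          # required column
        "query_id": [0, 1, 0, 1],               # hypothetical column name
        "metric_value": [0.1, 0.3, 0.5, 0.7],   # hypothetical column name
    }
)
print(df)
```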