Use of goodness-of-fit criteria. Agreement (goodness-of-fit) criteria in statistics


Goodness-of-fit (agreement) criteria

To test the hypothesis that an empirical distribution corresponds to a theoretical distribution law, special statistical indicators are used: goodness-of-fit (agreement) criteria. These include the criteria of Pearson, Kolmogorov, Romanovsky, Yastremsky, etc. Most goodness-of-fit criteria are based on the deviations of the empirical frequencies from the theoretical ones. Obviously, the smaller these deviations, the better the theoretical distribution corresponds to (describes) the empirical one.

Goodness-of-fit criteria are criteria for testing hypotheses that an empirical distribution corresponds to a theoretical probability distribution. Such criteria fall into two classes: general and special. General goodness-of-fit tests apply to the most general formulation of a hypothesis, namely that the observed results agree with any a priori assumed probability distribution. Special goodness-of-fit tests involve null hypotheses that state agreement with a particular form of probability distribution.

Goodness-of-fit criteria, based on an assumed distribution law, make it possible to establish when the discrepancies between theoretical and empirical frequencies should be considered insignificant (random) and when significant (non-random). It follows that goodness-of-fit criteria allow one to reject or confirm the hypothesis put forward about the nature of the distribution in the empirical series, and to answer whether a model expressed by some theoretical distribution law can be accepted for a given empirical distribution.

Pearson's χ² (chi-square) goodness-of-fit test is one of the main goodness-of-fit tests. It was proposed by the English mathematician Karl Pearson (1857-1936) to assess the randomness (significance) of the discrepancies between the frequencies of the empirical and theoretical distributions:

χ² = Σ (f_i − f_i^T)² / f_i^T,  the sum taken over i = 1, …, k,

where k is the number of groups into which the empirical distribution is divided; f_i is the empirical frequency of the trait in the i-th group; f_i^T is the theoretical frequency of the trait in the i-th group.

The scheme for applying the χ² criterion to assess the agreement between the theoretical and empirical distributions comes down to the following.

  • 1. The calculated measure of discrepancy χ²_calc is determined.
  • 2. The number of degrees of freedom ν is determined.
  • 3. Based on the number of degrees of freedom ν, the table value χ²_table is found from a special table.
  • 4. If χ²_calc > χ²_table, then for the given significance level α and number of degrees of freedom ν, the hypothesis about the insignificance (randomness) of the discrepancies is rejected. Otherwise, the hypothesis can be recognized as not contradicting the experimental data obtained, and with probability (1 − α) it can be argued that the discrepancies between the theoretical and empirical frequencies are random (a worked sketch follows this list).
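
The following is a minimal Python sketch of the scheme above, not taken from the original text; the frequencies and the number of estimated parameters are hypothetical.

```python
# Illustrative sketch of the chi-square scheme (hypothetical frequencies).
from scipy import stats

f_emp = [8, 22, 35, 25, 10]                # empirical frequencies (hypothetical)
f_theor = [10.0, 24.0, 32.0, 24.0, 10.0]   # theoretical frequencies with the same total

chi2_calc = sum((fe - ft) ** 2 / ft for fe, ft in zip(f_emp, f_theor))
alpha = 0.05
dof = len(f_emp) - 3                       # k - 3 if a normal curve with estimated mean and sigma is fitted
chi2_table = stats.chi2.ppf(1 - alpha, dof)

if chi2_calc > chi2_table:
    print(f"chi2 = {chi2_calc:.2f} > {chi2_table:.2f}: discrepancies are significant")
else:
    print(f"chi2 = {chi2_calc:.2f} <= {chi2_table:.2f}: discrepancies may be considered random")
```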

The significance level is the probability of erroneously rejecting the hypothesis put forward, i.e., the probability that a correct hypothesis will be rejected. In statistical studies, depending on the importance and responsibility of the problems being solved, one of the following three significance levels is used:

  • 1) α = 0.10, then P = 0.90;
  • 2) α = 0.05, then P = 0.95;
  • 3) α = 0.01, then P = 0.99.

When using the χ² goodness-of-fit criterion, the following conditions must be met.

  • 1. The volume of the population under study must satisfy the condition n > 50, and the frequency (size) of each group must be at least 5. If this condition is violated, small frequencies (less than 5) must first be combined.
  • 2. The empirical distribution must consist of data obtained as a result of random sampling, i.e. they must be independent.

The disadvantage of the Pearson goodness-of-fit criterion is the loss of some of the original information, caused by the need to group the observations into intervals and to merge individual intervals with a small number of observations. In this regard, it is recommended to supplement the χ² check of distribution agreement with other criteria. This is especially necessary when the sample size is n ≈ 100.

In statistics, the Kolmogorov goodness-of-fit test (also known as the Kolmogorov-Smirnov goodness-of-fit test) is used to determine whether two empirical distributions obey the same law, or whether an observed distribution obeys an assumed model. The Kolmogorov criterion is based on the maximum discrepancy between the accumulated frequencies (or accumulated relative frequencies) of the empirical and theoretical distributions. The Kolmogorov statistic is calculated using the formulas

λ = D / √N = d · √N,

where D and d are, respectively, the maximum difference between the accumulated frequencies (f − f_T) and between the accumulated relative frequencies (p − p_T) of the empirical and theoretical distribution series, and N is the number of units in the population.

Having calculated the value λ, a special table is used to determine the probability with which it can be stated that the deviations of the empirical frequencies from the theoretical ones are random. If λ takes values up to 0.3, this means practically complete coincidence of the frequencies. With a large number of observations, the Kolmogorov test is able to detect any deviation from the hypothesis: any difference between the sample distribution and the theoretical one will be detected if the number of observations is sufficiently large. The practical significance of this property is small, since in most cases it is difficult to obtain a large number of observations under constant conditions, the theoretical idea of the distribution law which the sample should obey is always approximate, and the accuracy of statistical tests should not exceed the accuracy of the selected model.
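
A minimal sketch of the λ calculation for grouped data, assuming hypothetical empirical and theoretical frequencies:

```python
# Kolmogorov criterion for grouped data (hypothetical frequencies).
import numpy as np

f_emp = np.array([6, 14, 30, 28, 15, 7])                  # empirical frequencies (hypothetical)
f_theor = np.array([5.0, 16.0, 29.0, 29.0, 16.0, 5.0])    # theoretical frequencies
N = f_emp.sum()

D = np.max(np.abs(np.cumsum(f_emp) - np.cumsum(f_theor))) # max gap between accumulated frequencies
lam = D / np.sqrt(N)
print(f"D = {D:.1f}, lambda = {lam:.3f}")
```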

The Romanovsky goodness-of-fit test is based on the Pearson criterion, i.e., on the already found value of χ², and on the number of degrees of freedom:

K_R = |χ² − ν| / √(2ν),

where ν is the number of degrees of freedom of variation.

The Romanovsky criterion is convenient in the absence of tables for χ². If K_R < 3, the discrepancies between the theoretical and empirical frequencies are considered random; if K_R ≥ 3, they are non-random and the theoretical distribution cannot serve as a model for the empirical distribution being studied.
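
A short sketch of this rule, with a hypothetical chi-square value and degrees of freedom:

```python
# Romanovsky criterion built on an already computed chi-square value.
import math

def romanovsky(chi2_value: float, dof: int) -> float:
    """K_R = |chi^2 - v| / sqrt(2 v)."""
    return abs(chi2_value - dof) / math.sqrt(2 * dof)

K = romanovsky(chi2_value=4.2, dof=5)   # hypothetical inputs
print(f"K_R = {K:.2f}: {'non-random' if K >= 3 else 'random'} discrepancies")
```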

B. S. Yastremsky used in his goodness-of-fit criterion not the number of degrees of freedom but the number of groups (k) and a special quantity θ that depends on the number of groups, together with the chi-square value. The Yastremsky criterion has the same meaning as the Romanovsky criterion and is expressed by the formula

J = |χ² − k| / √(2k + 4θ),

where χ² is Pearson's goodness-of-fit statistic; k is the number of groups; θ is a coefficient equal to 0.6 when the number of groups is less than 20.

If J > 3, the discrepancies between the theoretical and empirical distributions are non-random, i.e. the empirical distribution does not meet the requirements of a normal distribution. If J ≤ 3, the discrepancies are considered random and the empirical distribution may be regarded as consistent with the theoretical one.

Theoretical and empirical frequencies. Checking for normal distribution

When analyzing variation (distribution) series, it is of great importance how closely the empirical distribution of the trait corresponds to the normal one. To do this, the frequencies of the actual distribution must be compared with the theoretical frequencies characteristic of a normal distribution. This means that, based on the actual data, one must calculate the theoretical frequencies of the normal distribution curve, which are a function of the normalized deviations.

In other words, the empirical distribution curve needs to be aligned with the normal distribution curve.

Objective characteristics of the agreement between the theoretical and empirical frequencies can be obtained using special statistical indicators called goodness-of-fit criteria.

A goodness-of-fit criterion is a criterion that allows one to determine whether the discrepancy between the empirical and theoretical distributions is random or significant, i.e., whether the observational data agree with the statistical hypothesis put forward or not. The distribution that the population has by virtue of the hypothesis put forward is called theoretical.

There is a need to establish a criterion (rule) that would allow one to judge whether the discrepancy between the empirical and theoretical distributions is random or significant. If the discrepancy is random, the observational data (sample) are considered consistent with the hypothesis put forward about the distribution law of the general population, and the hypothesis is accepted; if the discrepancy turns out to be significant, the observational data do not agree with the hypothesis and it is rejected.

Typically, empirical and theoretical frequencies differ because:

    the discrepancy is random and due to a limited number of observations;

    the discrepancy is not accidental and is explained by the fact that the statistical hypothesis that the population is normally distributed is erroneous.

Thus, goodness-of-fit criteria make it possible to reject or confirm the hypothesis put forward about the nature of the distribution in the empirical series.

Empirical frequencies are obtained as a result of observation. Theoretical frequencies are calculated using formulas.

For the normal distribution law they can be found as follows:

f_T = (Σf_i · h / σ) · φ(t),

where

    Σf_i - the sum of the empirical frequencies (the sample size)

    h - the difference between two neighbouring variants

    σ - the sample standard deviation

    t - the normalized (standardized) deviation, t = (x_i − x̄)/σ

    φ(t) - the probability density function of the normal distribution (found from the table of values of the local Laplace function for the corresponding value of t)

A computational sketch of this formula is given below.
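
The sketch below, not from the original text, computes theoretical normal frequencies for hypothetical grouped data (midpoints and frequencies are assumed for illustration):

```python
# Theoretical frequencies of a normal curve for grouped data: f_T = (n*h/sigma) * phi(t).
import numpy as np

x = np.array([4.0, 6.0, 8.0, 10.0, 12.0, 14.0])   # interval midpoints (hypothetical)
f = np.array([5, 12, 25, 30, 18, 10])             # empirical frequencies (hypothetical)

n = f.sum()
h = x[1] - x[0]                                   # step between neighbouring variants
mean = np.average(x, weights=f)
sigma = np.sqrt(np.average((x - mean) ** 2, weights=f))

t = (x - mean) / sigma                            # normalized deviations
phi = np.exp(-t ** 2 / 2) / np.sqrt(2 * np.pi)    # standard normal density
f_theor = n * h / sigma * phi
print(np.round(f_theor, 1), "total ≈", round(f_theor.sum(), 1))
```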

There are several goodness-of-fit tests, the most common of which are: chi-square test (Pearson), Kolmogorov test, Romanovsky test.

The Pearson χ² goodness-of-fit test is one of the main ones. It can be represented as the sum of the ratios of the squared differences between the theoretical (f_T) and empirical (f) frequencies to the theoretical frequencies:

χ² = Σ (f_i − f_T)² / f_T,

where

    k is the number of groups into which the empirical distribution is divided,

    f_i is the observed frequency of the trait in the i-th group,

    f_T is the theoretical frequency.

For the χ² distribution, tables have been compiled that give the critical value of the χ² goodness-of-fit criterion for the selected significance level α and the degrees of freedom df (or ν). The significance level α is the probability of erroneously rejecting the hypothesis put forward, i.e., the probability that a correct hypothesis will be rejected; P is the probability of accepting a correct hypothesis. In statistics, three significance levels are most often used:

α = 0.10, then P = 0.90 (the correct hypothesis may be rejected in 10 cases out of 100);

α = 0.05, then P = 0.95 (in 5 cases out of 100);

α = 0.01, then P = 0.99 (in 1 case out of 100).

The number of degrees of freedom df is defined as the number of groups in the distribution series minus the number of constraints: df = k − z. A constraint is understood as an indicator of the empirical series used in calculating the theoretical frequencies, i.e., an indicator connecting the empirical and theoretical frequencies. For example, when fitting a normal curve there are three such constraints, so the number of degrees of freedom is df = k − 3. To assess significance, the calculated value χ²_calc is compared with the table value χ²_table.

If the theoretical and empirical distributions coincide completely, χ² = 0; otherwise χ² > 0. If χ²_calc > χ²_table, then for the given significance level and number of degrees of freedom we reject the hypothesis about the insignificance (randomness) of the discrepancies. If χ²_calc < χ²_table, the hypothesis is accepted, and with probability P = (1 − α) it can be stated that the discrepancy between the theoretical and empirical frequencies is random. Consequently, there are grounds to assert that the empirical distribution follows the normal distribution. Pearson's goodness-of-fit test is used when the population size is sufficiently large (n > 50) and the frequency of each group is at least 5.

The Kolmogorov goodness-of-fit test is based on the maximum discrepancy between the accumulated empirical and theoretical frequencies:

λ = D / √N = d · √N,

where D and d are, respectively, the maximum difference between the accumulated frequencies and between the accumulated relative frequencies of the empirical and theoretical distributions, and N is the number of observations. Using the distribution table of the Kolmogorov statistic, the probability P(λ) is determined, which can vary from 0 to 1. When P(λ) = 1 there is complete coincidence of the frequencies, and when P(λ) = 0 a complete discrepancy. If the probability P(λ) found for the calculated λ is large, one can assume that the discrepancies between the theoretical and empirical distributions are insignificant, i.e. random. The main condition for using the Kolmogorov criterion is a sufficiently large number of observations.

Kolmogorov goodness-of-fit test

Let us consider how the Kolmogorov criterion (λ) is applied when testing the hypothesis that the general population is normally distributed. Fitting the actual distribution to the normal curve consists of several steps:

    Compare actual and theoretical frequencies.

    Based on actual data, the theoretical frequencies of the normal distribution curve, which is a function of the normalized deviation, are determined.

    They check to what extent the distribution of the characteristic corresponds to normal.

For IV column of the table:

In MS Excel, the normalized deviation (t) is calculated using the STANDARDIZE function (НОРМАЛИЗАЦИЯ in Russian-language versions). Select a range of free cells equal in number to the variants (spreadsheet rows). Without removing the selection, call the STANDARDIZE function. In the dialog box that appears, indicate the cells containing, respectively, the observed values (X_i), the mean (X̄) and the standard deviation σ. The operation must be completed by pressing Ctrl+Shift+Enter (the formula is entered as an array formula).

For the V column of the table:

The probability density function of the normal distribution φ(t) is found from the table of values ​​of the local Laplace function for the corresponding value of the normalized deviation (t)

For VI column of the table:

The Kolmogorov goodness-of-fit statistic (λ) is determined by dividing the modulus of the maximum difference between the empirical and theoretical cumulative frequencies by the square root of the number of observations:

Using a special probability table for the agreement criterion λ, we determine that the value λ = 0.59 corresponds to a probability P(λ) = 0.88. Since this probability is high, the deviations of the empirical frequencies from the theoretical ones can be considered random, and the hypothesis of a normal distribution is not rejected.
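
The tabulated probability can be checked numerically; scipy.special.kolmogorov gives the survival function of the Kolmogorov distribution, which corresponds to P(λ) as used here:

```python
# For lambda = 0.59 the table value should be about 0.88.
from scipy.special import kolmogorov

print(round(kolmogorov(0.59), 2))   # prints 0.88
```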

Distribution of empirical and theoretical frequencies, probability density of theoretical distribution

When applying goodness-of-fit tests to check whether the observed (empirical) distribution corresponds to the theoretical one, one should distinguish between testing simple and complex hypotheses.

The one-sample Kolmogorov-Smirnov normality test is based on maximum difference between the cumulative empirical distribution of the sample and the estimated (theoretical) cumulative distribution. If the Kolmogorov-Smirnov D statistic is significant, then the hypothesis that the corresponding distribution is normal should be rejected.
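
A minimal sketch of such a one-sample check with SciPy, on hypothetical measurements (note that estimating the parameters from the same sample makes the classical test only approximate):

```python
# One-sample Kolmogorov-Smirnov normality check on hypothetical data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(loc=10.3, scale=2.7, size=200)   # hypothetical data

# Strictly, the classical K-S test assumes a fully specified theoretical distribution;
# here the mean and standard deviation are estimated from the sample itself.
D, p_value = stats.kstest(sample, "norm", args=(sample.mean(), sample.std(ddof=1)))
print(f"D = {D:.3f}, p = {p_value:.3f}")
```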





Test

Using Goodness-of-Fit Criteria


Introduction

In the practice of statistical analysis of experimental data, the main interest is not the calculation of certain statistics itself, but the answers to questions of this type. Is the population mean really equal to a certain number? Is the correlation coefficient significantly different from zero? Are the variances of the two samples equal? And many such questions may arise, depending on the specific research problem. Accordingly, many criteria have been developed to test the proposed statistical hypotheses. We will consider some of the most common ones. These will mainly relate to means, variances, correlation coefficients and abundance distributions.

All criteria for testing statistical hypotheses are divided into two large groups: parametric and non-parametric. Parametric tests are based on the assumption that the sample data are drawn from a population with a known distribution, and the main task is to estimate the parameters of this distribution. Nonparametric tests do not require any assumptions about the nature of the distribution, other than the assumption that it is continuous.

Let's look at the parametric criteria first. The testing sequence will include formulating the null hypothesis and the alternative hypothesis, stating the assumptions to be made, choosing the sample statistic used in the test, forming the sampling distribution of that statistic, determining the critical regions for the selected criterion, and constructing a confidence interval for the sample statistic.

1 Goodness-of-fit criteria for means

Let the hypothesis being tested be that the population mean is equal to some specified value. The need for such a check may arise, for example, in the following situation. Suppose that, based on extensive research, the mean diameter of the shell of a fossil mollusk in sediments from some fixed location has been established. Let us also have at our disposal a certain number of shells found in another place, and let us make the assumption that the specific place does not affect the diameter of the shell, i.e., that the mean shell diameter for the entire population of mollusks that once lived in the new place is equal to the known value obtained earlier when studying this species of mollusk in the first habitat.

If this known value is denoted by a, then the null and alternative hypotheses are written as H0: μ = a and H1: μ ≠ a. Let us assume that the variable x in the population under consideration has a normal distribution and that the population variance is unknown.

We will test the hypothesis using the statistic

t = (x̄ − a) / (s / √n),   (1)

where x̄ is the sample mean and s is the sample standard deviation.

It was shown that if H0 is true, then t in expression (1) has a Student t-distribution with n − 1 degrees of freedom. If the significance level α (the probability of rejecting a correct hypothesis) is chosen, then, in accordance with what was discussed in the previous chapter, the critical values for testing H0: μ = a can be determined.

In this case, since the Student distribution is symmetric, a fraction (1 − α) of the area under the curve of this distribution with n − 1 degrees of freedom lies between the points −t_α and +t_α, which are equal in absolute value. Therefore, all values less than −t_α and greater than +t_α for the t-distribution with the given number of degrees of freedom at the chosen significance level constitute the critical region. If the sample t value falls within this region, the alternative hypothesis is accepted.

The confidence interval for μ is constructed using the previously described method and is determined from the expression

x̄ − t_α · s/√n ≤ μ ≤ x̄ + t_α · s/√n.   (2)

So, let it be known that in our case the established diameter of the shell of the fossil mollusk is 18.2 mm. We have at our disposal a sample of 50 newly found shells, for which x̄ = 18.9 mm and s = 2.18 mm. Let us test H0: μ = 18.2 against H1: μ ≠ 18.2. We have

t = (18.9 − 18.2) / (2.18/√50) ≈ 2.27.

If the significance level is chosen as α = 0.05, then the critical value is t_0.05(49) ≈ 2.01. Since the calculated value exceeds it, H0 can be rejected in favour of H1 at the significance level α = 0.05. Thus, for our hypothetical example, it can be argued (with some probability, of course) that the diameter of the shell of fossil mollusks of this species depends on the places in which they lived.
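
A sketch of this one-sample t-test (formula (1)); the sample mean 18.9 mm is the value quoted with the confidence interval below and is treated here as an assumption:

```python
# One-sample t-test for the shell-diameter example.
import math
from scipy import stats

a = 18.2        # hypothesized population mean, mm
x_bar = 18.9    # sample mean, mm (assumed)
s = 2.18        # sample standard deviation, mm
n = 50

t_calc = (x_bar - a) / (s / math.sqrt(n))
t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - 1)
print(f"t = {t_calc:.2f}, critical t = {t_crit:.2f}")   # about 2.27 vs 2.01 -> reject H0
```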

Because the t-distribution is symmetric, only positive t values are tabulated for the selected significance levels and numbers of degrees of freedom. Moreover, not only the share of the area under the distribution curve to the right of +t is taken into account, but also the share to the left of −t. This is because in most cases, when testing hypotheses, we are interested in the significance of the deviations as such, regardless of their sign, i.e., we test H0: μ = a against H1: μ ≠ a, not against H1: μ > a or H1: μ < a.

Let us return now to our example. The 100(1 − α)% confidence interval for μ is

18.9 ± 2.01 · 2.18/√50, i.e. approximately 18.3 ≤ μ ≤ 19.5.

Let us now consider the case when it is necessary to compare the means of two general populations. The hypothesis being tested is H0: μ1 − μ2 = 0 against H1: μ1 − μ2 ≠ 0. It is also assumed that x has a normal distribution with mean μ1 and variance σ², and y has a normal distribution with mean μ2 and the same variance σ². In addition, we assume that the samples used to estimate the general populations are drawn independently of each other and have sizes n1 and n2, respectively. From the independence of the samples it follows that if we take a large number of them and calculate the mean values for each pair, then the set of these pairs of means will be completely uncorrelated.

The null hypothesis is tested using the statistic

t = (x̄1 − x̄2) / √( [((n1 − 1)s1² + (n2 − 1)s2²)/(n1 + n2 − 2)] · (1/n1 + 1/n2) ),   (3)

where s1² and s2² are the variance estimates for the first and second samples, respectively. It is easy to see that (3) is a generalization of (1).

It was shown that statistic (3) has a Student t-distribution with n1 + n2 − 2 degrees of freedom. If the sample sizes are equal, i.e. n1 = n2 = n, formula (3) simplifies to

t = (x̄1 − x̄2) / √((s1² + s2²)/n).   (4)

Let us look at an example. Suppose that when measuring the stem leaves of the same plant population over two seasons, the following results are obtained. We assume that the conditions for using Student's t-test are satisfied, i.e., the populations from which the samples are drawn are normal, they have an unknown but common variance, and the samples are independent. Let us test the hypothesis at the significance level α = 0.01. We have

Table value t = 2.58. Therefore, the hypothesis about the equality of the average values ​​of stem leaf lengths for a plant population over two seasons should be rejected at the chosen level of significance.
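
A sketch of the pooled-variance two-sample test on hypothetical leaf-length data (the original measurements are not reproduced in the text, so these numbers are purely illustrative):

```python
# Two-sample Student test (pooled variance, formula (3)) on hypothetical samples.
import numpy as np
from scipy import stats

season1 = np.array([8.1, 7.9, 8.4, 8.0, 7.7, 8.3, 8.2, 7.8])   # hypothetical lengths, cm
season2 = np.array([7.2, 7.5, 7.0, 7.4, 7.1, 7.3, 7.6, 7.2])

t_stat, p_value = stats.ttest_ind(season1, season2, equal_var=True)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```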

Attention! The null hypothesis in mathematical statistics is the hypothesis that there are no significant differences between the compared indicators, whether we are talking about means, variances or other statistics. In all these cases, if the empirical (calculated by formula) value of the criterion is greater than the theoretical (taken from the tables), the null hypothesis is rejected; if the empirical value is less than the tabulated one, it is accepted.

In order to construct a confidence interval for the difference between the means of these two populations, note that Student's test, as can be seen from formula (3), evaluates the significance of the difference between the means relative to the standard error of this difference. It is easy to verify, using the previously discussed relationships and the assumptions made, that the denominator in (3) is exactly this standard error. Indeed, we know that in general the variance of a difference is D(x − y) = D(x) + D(y) − 2cov(x, y).

If x and y are independent, then D(x − y) = D(x) + D(y).

Taking the sample means x̄1 and x̄2 instead of x and y, and recalling the assumption that both populations have the same variance σ², we obtain

σ²(x̄1 − x̄2) = σ² (1/n1 + 1/n2).   (5)

The estimate of the common variance can be obtained from the relation

s² = [(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2).   (6)

(We divide by n1 + n2 − 2 because two quantities are estimated from the samples, and therefore the number of degrees of freedom must be reduced by two.)

If we now substitute (6) into (5) and take the square root, we get the denominator in expression (3).

After this digression, let us return to constructing a confidence interval for μ1 − μ2 based on x̄1 − x̄2.

We have

(x̄1 − x̄2) − t_α·s·√(1/n1 + 1/n2) ≤ μ1 − μ2 ≤ (x̄1 − x̄2) + t_α·s·√(1/n1 + 1/n2).

Let us make some comments on the assumptions used in constructing the t-test. First of all, it was shown that violations of the normality assumption have an insignificant effect on the significance level and power of the test for n ≥ 30. Violations of the assumption of homogeneity of the variances of the two populations from which the samples are drawn are also of little consequence, but only when the sample sizes are equal. If the variances of the two populations differ, the probabilities of errors of the first and second kind will differ significantly from the expected ones.

In this case, the hypothesis should be tested using the statistic

t = (x̄1 − x̄2) / √(s1²/n1 + s2²/n2)   (7)

with the number of degrees of freedom

ν = (s1²/n1 + s2²/n2)² / [ (s1²/n1)²/(n1 − 1) + (s2²/n2)²/(n2 − 1) ].   (8)

As a rule, ν turns out to be a fractional number, so when using t-distribution tables it is necessary to take the table values for the nearest integer values and interpolate to find the t corresponding to the obtained ν.

Let us look at an example. When studying two subspecies of the marsh frog, the ratio of body length to tibia length was calculated. Two samples were taken with sizes n1 = 49 and n2 = 27. The means and variances of the ratio of interest turned out to be x̄1 = 2.34, x̄2 = 2.08, s1² = 0.21 and s2² = 0.35. If we now test the hypothesis using formula (3), we obtain t ≈ 2.13.

At a significance level of α = 0.05 we must reject the null hypothesis (the tabular value is t = 1.995) and conclude that there are statistically significant differences, at the chosen significance level, between the mean values of the measured parameter for the two frog subspecies.

When using formulas (7) and (8) we obtain t ≈ 1.98.

In this case, for the same significance level α = 0.05, the table value is t = 2.015, and the null hypothesis is accepted.

This example clearly shows that the neglect of the conditions adopted in the derivation of one or another criterion can lead to results that are directly opposite to those that actually take place. Of course, in this case, having samples of different sizes in the absence of a predetermined fact that the variances of the measured indicator in both populations are statistically equal, formulas (7) and (8) should have been used, which showed the absence of statistically significant differences.
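
Both calculations can be reproduced from the summary statistics quoted above; SciPy's ttest_ind_from_stats takes standard deviations, so the variances are converted first:

```python
# Pooled (formula (3)) vs Welch-type (formulas (7)-(8)) test from summary statistics.
import math
from scipy import stats

m1, v1, n1 = 2.34, 0.21, 49
m2, v2, n2 = 2.08, 0.35, 27

pooled = stats.ttest_ind_from_stats(m1, math.sqrt(v1), n1, m2, math.sqrt(v2), n2, equal_var=True)
welch = stats.ttest_ind_from_stats(m1, math.sqrt(v1), n1, m2, math.sqrt(v2), n2, equal_var=False)
print(f"pooled: t = {pooled.statistic:.2f}, p = {pooled.pvalue:.3f}")   # about t = 2.13
print(f"Welch:  t = {welch.statistic:.2f}, p = {welch.pvalue:.3f}")     # about t = 1.98
```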

Therefore, I would like to repeat once again that verification of compliance with all the assumptions made when deriving a particular criterion is an absolutely necessary condition for its correct use.

A constant requirement in both of the above modifications of the t-test was that the samples be independent of each other. In practice, however, there are often situations when this requirement cannot be met for objective reasons. For example, some indicators are measured on the same animal or the same plot of territory before and after the action of an external factor, etc. In these cases we may be interested in testing the hypothesis H0: μ1 = μ2 against H1: μ1 ≠ μ2. We continue to assume that both samples are drawn from normal populations with the same variance.

In this case, we can take advantage of the fact that differences between normally distributed quantities also have a normal distribution, and therefore we can use the Student's t test in the form (1). Thus, the hypothesis will be tested that n differences are a sample from a normally distributed population with a mean equal to zero.

Denoting the i-th difference by d_i, we have

t = d̄ / (s_d / √n),   (9)

where d̄ is the mean of the differences and s_d is their standard deviation.

Let us look at an example. Let us have at our disposal data on the number of impulses of an individual nerve cell during a certain time interval before and after the action of the stimulus:

Hence, keeping in mind that (9) has a t-distribution and choosing a significance level of α = 0.01, from the corresponding table in the Appendix we find that the critical value of t for n − 1 = 10 − 1 = 9 degrees of freedom is 3.25. A comparison of the theoretical and empirical t values shows that the null hypothesis of no statistically significant difference between the firing rates before and after the stimulus should be rejected. It can be concluded that the stimulus used statistically significantly changes the frequency of impulses.
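
A sketch of the paired test (formula (9)); the impulse counts below are hypothetical, since the original table is not reproduced in the text:

```python
# Paired (dependent-samples) t-test on hypothetical before/after impulse counts.
import numpy as np
from scipy import stats

before = np.array([52, 48, 55, 60, 47, 51, 58, 49, 53, 50])   # hypothetical counts
after  = np.array([60, 55, 61, 68, 55, 58, 66, 55, 60, 57])

t_stat, p_value = stats.ttest_rel(before, after)
print(f"t = {t_stat:.2f}, p = {p_value:.5f}")   # compare |t| with t_crit(9) = 3.25 at alpha = 0.01
```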

In experimental studies, as mentioned above, dependent samples appear quite often. However, this fact is sometimes ignored and the t-test is used incorrectly in form (3).

That this is invalid can be seen by considering the standard errors of the difference between uncorrelated and correlated means. In the first case

s(x̄1 − x̄2) = √(s1²/n1 + s2²/n2),

and in the second

s(x̄1 − x̄2) = √(s1²/n + s2²/n − 2r·s1·s2/n),

where r is the correlation coefficient between the paired observations. The standard error of the difference d̄ is exactly this second quantity, so the denominator in (9) has the form

√((s1² + s2² − 2r·s1·s2)/n).

Now note that the numerators of expressions (4) and (9) coincide, since d̄ = x̄1 − x̄2; therefore the difference between their t values depends on the denominators.

Thus, if formula (3) is used in a problem with dependent samples and the samples are positively correlated, the resulting t values will be smaller than they should be with formula (9), and a situation may arise in which the null hypothesis is accepted although it is false. The opposite situation may arise when there is a negative correlation between the samples: in that case differences will be recognized as significant that in fact are not.

Let us return again to the example with impulse activity and calculate the t value for the given data using formula (3), ignoring the fact that the samples are related. We obtain t = 3.32. For 18 degrees of freedom and a significance level α = 0.01 the table value is t = 2.88, and at first glance it seems that nothing bad happened even though an unsuitable formula was used: the calculated t value also leads to rejection of the null hypothesis, i.e. to the same conclusion that was reached with formula (9), which is correct in this situation.

However, let's reformat the existing data and present it in the following form (2):

These are the same values, and they could well have been obtained in one of the experiments. Since all the values in both samples are preserved, using Student's test in the form (3) gives the previously obtained value t = 3.32 and leads to the same conclusion as before.

Now let us calculate the value of t using formula (9), which should be used in this case. The critical value of t at the selected significance level and nine degrees of freedom is 3.25, and the calculated value no longer exceeds it. Consequently, we have no reason to reject the null hypothesis; we accept it, and this conclusion is directly opposite to the one reached using formula (3).

Using this example, we were once again convinced of how important it is to strictly comply with all the requirements that were the basis for determining a particular criterion in order to obtain correct conclusions when analyzing experimental data.

The considered modifications of the Student's test are intended to test hypotheses regarding the average of two samples. However, situations arise when it becomes necessary to draw conclusions regarding the equality of k averages at the same time. For this case, a certain statistical procedure has also been developed, which will be discussed later when discussing issues related to analysis of variance.

2 Goodness-of-fit tests for variances

The testing of statistical hypotheses regarding the variances of general populations is carried out in the same sequence as for the means. Let us briefly recall this sequence.

1. A null hypothesis is formulated (about the absence of statistically significant differences between the compared variances).

2. Some assumptions are made regarding the sampling distribution of statistics, with the help of which it is planned to estimate the parameter included in the hypothesis.

3. The significance level for testing the hypothesis is selected.

4. The value of the statistics of interest to us is calculated and a decision is made regarding the truth of the null hypothesis.

And now let us start by testing the hypothesis that the variance of the general population equals a specified value, i.e. H0: σ² = σ0² against H1: σ² ≠ σ0². If we assume that the variable x has a normal distribution and that a sample of size n is randomly drawn from the population, then the null hypothesis is tested using the statistic

χ² = (n − 1)s²/σ0².   (10)

Recalling the formula for the sample variance, we can rewrite (10) as

χ² = Σ(x_i − x̄)²/σ0².   (11)

From this expression it is clear that the numerator is the sum of the squared deviations of normally distributed values from their mean, and each of these deviations is itself normally distributed. Therefore, in accordance with the distribution known to us for sums of squares of normally distributed values, statistics (10) and (11) have a χ²-distribution with n − 1 degrees of freedom.

By analogy with the use of the t-distribution, when testing at the selected significance level α, the critical points χ²(α/2) and χ²(1 − α/2) are found from the distribution table. The confidence interval for σ² at the selected α is constructed as follows:

(n − 1)s²/χ²(α/2) ≤ σ² ≤ (n − 1)s²/χ²(1 − α/2).   (12)

Let us look at an example. Suppose that, on the basis of extensive experimental research, the variance of the alkaloid content of one plant species from a certain area is known to be σ0² = 4.37 conventional units. A specialist has at his disposal a sample of n = 28 such plants, presumably from the same area. The analysis showed that for this sample s² = 5.01, and it is necessary to make sure that this variance and the previously known one are statistically indistinguishable at the significance level α = 0.1.

According to formula (10) we have

χ² = 27 · 5.01/4.37 ≈ 30.96.

The resulting value must be compared with the critical values for α/2 = 0.05 and 1 − α/2 = 0.95. From the Appendix table for the χ²-distribution with 27 degrees of freedom we have 40.1 and 16.2, respectively; since the calculated value lies between them, the null hypothesis can be accepted. The corresponding confidence interval for σ² is 3.37 < σ² < 8.35.
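
A sketch of this calculation (formulas (10) and (12)) with the alkaloid figures quoted above:

```python
# Chi-square test for a single variance and the corresponding confidence interval.
from scipy import stats

sigma0_sq = 4.37   # hypothesized population variance
s_sq = 5.01        # sample variance
n = 28
alpha = 0.10

chi2_calc = (n - 1) * s_sq / sigma0_sq
lower = stats.chi2.ppf(alpha / 2, df=n - 1)        # about 16.2
upper = stats.chi2.ppf(1 - alpha / 2, df=n - 1)    # about 40.1
print(f"chi2 = {chi2_calc:.2f}, acceptance region ({lower:.1f}, {upper:.1f})")

ci = ((n - 1) * s_sq / upper, (n - 1) * s_sq / lower)
print(f"confidence interval for the variance: {ci[0]:.2f} .. {ci[1]:.2f}")   # close to 3.37 .. 8.35
```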

In contrast to testing hypotheses regarding sample means using the Student's test, when errors of the first and second types did not change significantly when the assumption of normal distribution of populations was violated, in the case of hypotheses about variances when the conditions of normality were not met, the errors changed significantly.

The problem considered above, of the variance being equal to some fixed value, is of limited interest, since situations in which the population variance is known are quite rare. Of much greater interest is the case when it is necessary to check whether the variances of two populations are equal, i.e. to test H0: σ1² = σ2² against the alternative H1: σ1² ≠ σ2². It is assumed that samples of sizes n1 and n2 are drawn at random from general populations with variances σ1² and σ2².

To test the null hypothesis, Fisher's variance-ratio test is used:

F = s1²/s2².   (13)

Since the sums of squared deviations of normally distributed random variables from their means follow a χ²-distribution, both the numerator and the denominator of (13) are χ²-distributed quantities divided by n1 − 1 and n2 − 1, respectively, and therefore their ratio has an F-distribution with n1 − 1 and n2 − 1 degrees of freedom.

It is generally accepted - and this is how F-distribution tables are constructed - that the largest of the variances is taken as the numerator in (13), and therefore only one critical point is determined, corresponding to the selected significance level.

Let us have at our disposal two samples of sizes n1 = 11 and n2 = 28 from populations of the common and the oval pond snail, for which the height-to-width ratios have variances s1² = 0.59 and s2² = 0.38. It is necessary to test the hypothesis of equality of the variances of this indicator for the populations being studied at a significance level of α = 0.05. We have F = 0.59/0.38 ≈ 1.55, which does not exceed the critical value of the F-distribution with 10 and 27 degrees of freedom, so the null hypothesis is accepted.
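
A sketch of the variance-ratio test for the pond-snail figures quoted above:

```python
# Fisher's F-test for two variances.
from scipy import stats

s1_sq, n1 = 0.59, 11
s2_sq, n2 = 0.38, 28

F = s1_sq / s2_sq                     # the larger variance is placed in the numerator
F_crit = stats.f.ppf(0.95, dfn=n1 - 1, dfd=n2 - 1)
print(f"F = {F:.2f}, critical F(10, 27) = {F_crit:.2f}")   # 1.55 vs about 2.2 -> H0 accepted
```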

In the literature, you can sometimes find a statement that testing the hypothesis about the equality of means using the Student's test should be preceded by testing the hypothesis about the equality of variances. This is the wrong recommendation. Moreover, it can lead to mistakes that can be avoided if not followed.

Indeed, the results of testing the hypothesis of equality of variances using Fisher's test largely depend on the assumption that the samples are drawn from populations with a normal distribution. At the same time, the Student's test is insensitive to violations of normality, and if it is possible to obtain samples of equal size, then the assumption of equality of variances is also not significant. In the case of unequal n, formulas (7) and (8) should be used for verification.

When testing hypotheses about the equality of variances, some computational features arise for dependent samples. In this case, the hypothesis H0: σ1² = σ2² is tested against the alternative H1: σ1² ≠ σ2² using the statistic

t = (s1² − s2²)·√(n − 2) / (2·s1·s2·√(1 − r²)),   (14)

where r is the correlation coefficient between the paired observations. If the null hypothesis is true, statistic (14) has a Student t-distribution with n − 2 degrees of freedom.

When the gloss of 35 coating samples was measured, a variance of s1² = 134.5 was obtained. Repeated measurements two weeks later gave s2² = 199.1. The correlation coefficient between the paired measurements turned out to be r = 0.876. If we ignore the fact that the samples are dependent and use Fisher's test, we get F = 1.48. If the significance level α = 0.05 is chosen, the null hypothesis will be accepted, since the critical value of the F-distribution for 35 − 1 = 34 and 35 − 1 = 34 degrees of freedom is 1.79.

At the same time, if we use formula (14), which is appropriate in this case, we obtain t = 2.35, while the critical value of t for 33 degrees of freedom and the selected significance level α = 0.05 is 2.03. Therefore, the null hypothesis of equal variances in the two samples should be rejected. Thus, this example shows that, as in the case of testing the hypothesis of equality of means, using a criterion that does not take into account the specifics of the experimental data leads to an error.
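
A sketch of the dependent-samples variance test (formula (14) as reconstructed above), reproducing t ≈ 2.35 from the gloss-measurement figures:

```python
# Test for equality of variances with dependent (paired) samples.
import math
from scipy import stats

s1_sq, s2_sq, r, n = 134.5, 199.1, 0.876, 35

t_calc = abs(s1_sq - s2_sq) * math.sqrt(n - 2) / (2 * math.sqrt(s1_sq * s2_sq) * math.sqrt(1 - r**2))
t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - 2)
print(f"t = {t_calc:.2f}, critical t = {t_crit:.2f}")   # about 2.35 vs 2.03 -> reject H0
```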

In the recommended literature you can find the Bartlett test, which is used to test hypotheses about the simultaneous equality of k variances. In addition to the fact that calculating the statistics of this criterion is quite laborious, the main disadvantage of this criterion is that it is unusually sensitive to deviations from the assumption of normal distribution of the populations from which samples are drawn. Thus, when using it, you can never be sure that the null hypothesis is actually rejected because the variances are statistically significantly different, and not because the samples are not normally distributed. Therefore, if the problem of comparing several variances arises, it is necessary to look for a formulation of the problem where it will be possible to use the Fisher criterion or its modifications.

3 Criteria for hypotheses about proportions

Quite often it is necessary to analyze populations in which objects can be classified into one of two categories. For example, by gender in a certain population, by the presence of a certain trace element in the soil, by the dark or light color of eggs in some species of birds, etc.

We denote the proportion of elements that have a certain property by P, where P is the ratio of the number of objects with the property of interest to the total number of objects in the population.

Let us test the hypothesis that in some sufficiently large population the proportion P is equal to some number a (0 < a < 1).

For dichotomous (two-category) variables, as in our case, P plays the same role as the mean of a population of quantitatively measured variables. On the other hand, it was stated earlier that the standard error of the proportion P can be represented as √(P(1 − P)/n).

Then, if the hypothesis H0: P = a is true, the statistic

z = (p − a) / √(a(1 − a)/n),   (19)

where p is the sample value of P, has a standard normal distribution. It should be noted right away that this approximation is valid if the smaller of the products np and n(1 − p) is greater than 5.

Let it be known from the literature that in the lake frog population the proportion of individuals with a longitudinal stripe on the back is 62%, or 0.62. We have at our disposal a sample of n = 125 individuals, of which f = 93 have a longitudinal stripe on the back. It is necessary to find out whether the proportion of individuals with this trait in the population from which the sample was taken corresponds to the known data. We have: p = f/n = 93/125 = 0.744, a = 0.62, n(1 − p) = 125(1 − 0.744) = 32 > 5, and by formula (19) z ≈ 2.86.

Therefore, at both the significance level α = 0.05 and α = 0.01 the null hypothesis should be rejected, since the critical value for α = 0.05 is 1.96 and for α = 0.01 it is 2.58.
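
A sketch of this one-proportion test; the standard error here uses the hypothesized proportion a, which is one common convention for formula (19):

```python
# One-proportion z-test for the lake-frog example.
import math
from scipy import stats

a, n, f = 0.62, 125, 93
p = f / n

z = (p - a) / math.sqrt(a * (1 - a) / n)
p_value = 2 * stats.norm.sf(abs(z))
print(f"p = {p:.3f}, z = {z:.2f}, p-value = {p_value:.4f}")   # z ≈ 2.86 > 2.58 -> reject H0
```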

If there are two large populations in which the proportions of objects with the property of interest are P1 and P2, respectively, then it is of interest to test the hypothesis H0: P1 = P2 against the alternative H1: P1 ≠ P2. For the test, two samples of sizes n1 and n2 are drawn randomly and independently. The sample proportions p1 and p2 are estimated from them, and the following statistic is calculated:

z = (p1 − p2) / √( p̄(1 − p̄)(1/n1 + 1/n2) ),  where p̄ = (f1 + f2)/(n1 + n2),   (20)

and f1 and f2 are the numbers of objects possessing the characteristic in the first and second samples, respectively.

From formula (20) it can be seen that its derivation uses the same principle we encountered earlier: to test the statistical hypothesis, we determine how many standard errors the difference between the indicators of interest amounts to. Indeed, the quantity p̄ = (f1 + f2)/(n1 + n2) is the proportion of objects with the given characteristic in both samples combined, the second bracket in the denominator of (20) is (1 − p̄), and it becomes obvious that expression (20) is the formula for testing the null hypothesis H0: P1 − P2 = 0, because under this hypothesis the expected value of p1 − p2 is zero.

On the other hand, the denominator of (20) is the standard error s(p1 − p2) of the difference of the sample proportions. Thus, (20) can be written as

z = (p1 − p2) / s(p1 − p2).   (21)

The only difference between this statistic and the statistic used in testing hypotheses about means is that z has a unit normal distribution rather than a t-distribution.

Let a study of a group of people (n1 = 82) show that the proportion who have an α-rhythm in their electroencephalogram is 0.84, or 84%. A study of a group of people in another area (n2 = 51) found this proportion to be 0.78. At a significance level of α = 0.05, it is necessary to check whether the proportions of individuals with brain alpha activity are the same in the general populations from which the samples were drawn.

First of all, let us make sure that the available experimental data allow us to use statistic (20): the smallest of the products n·p and n·(1 − p) exceeds 5. Substituting the data into (20) gives z ≈ 0.87, and since z has a normal distribution, for which the critical point at α = 0.05 is 1.96, the null hypothesis is accepted.
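
A sketch of the two-proportion test (formulas (20)-(21)) for the alpha-rhythm figures quoted above:

```python
# Two-proportion z-test with a pooled proportion.
import math
from scipy import stats

n1, p1 = 82, 0.84
n2, p2 = 51, 0.78

p_pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se
p_value = 2 * stats.norm.sf(abs(z))
print(f"z = {z:.2f}, p-value = {p_value:.3f}")   # about z = 0.87 -> H0 accepted at alpha = 0.05
```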

The considered criterion is valid if the samples for which the proportions of objects with the characteristic we are interested in were compared are independent. If this requirement is not met, for example, when a population is considered in successive time intervals, then the same object may or may not have this characteristic in these intervals.

Let us denote the presence of the attribute of interest in an object by 1 and its absence by 0. Then we arrive at Table 3, where (a + b) is the number of objects in the first sample that have the attribute, (a + c) is the number of objects with this attribute in the second sample, and n is the total number of objects examined. This is the familiar four-field (2×2) table, the association in which is assessed using the coefficient

φ = (ad − bc) / √((a + b)(c + d)(a + c)(b + d)).

For such a table with small (<10) values in each cell, R. Fisher found the exact distribution of φ, which makes it possible to test the hypothesis H0: P1 = P2. This distribution has a rather complex form, and its critical points are given in special tables. In real situations, as a rule, the values in each cell are greater than 10, and it has been shown that in these cases the null hypothesis can be tested using the statistic

χ² = nφ² = n(ad − bc)² / ((a + b)(c + d)(a + c)(b + d)),   (22)

which, if the null hypothesis is true, has a chi-square distribution with one degree of freedom.

Let's look at an example. Let the effectiveness of malaria vaccinations given at different times of the year be tested over the course of two years. The hypothesis is tested that the effectiveness of vaccinations does not depend on the time of year when they are given. We have

The table value for α = 0.05 is 3.84, and for α = 0.01 it is 6.64. Therefore, at either of these significance levels the null hypothesis should be rejected, and in this hypothetical example (which is, however, related to reality) it can be concluded that vaccinations given in the second half of the year are significantly more effective.
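
A generic sketch of a four-field (2×2) chi-square test; the counts are hypothetical, since the original vaccination table is not reproduced in the text:

```python
# Chi-square test for a 2x2 table (no continuity correction, matching the plain formula above).
import numpy as np
from scipy import stats

table = np.array([[30, 70],    # first half of the year: effective / not effective (hypothetical)
                  [55, 45]])   # second half of the year

chi2, p_value, dof, expected = stats.chi2_contingency(table, correction=False)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_value:.4f}")
```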

A natural generalization of the association coefficient for a four-field table is, as mentioned earlier, Chuprov's coefficient of mutual contingency. The exact distribution of this coefficient is unknown, so the validity of the hypothesis is judged by comparing the calculated χ² value with the critical points of the χ²-distribution at the selected significance level. The number of degrees of freedom is determined from the expression (r − 1)(c − 1), where r and c are the numbers of gradations of the two characteristics.

Let us recall the calculation formulas:

φ² = Σ Σ f_ij²/(f_i. · f.j) − 1,  χ² = nφ²,  T = √( φ² / √((r − 1)(c − 1)) ),

where f_ij are the cell frequencies and f_i., f.j are the row and column totals.

The data obtained from studying the range of vision in the right and left eyes of people without visual anomalies are presented. Conventionally, this range is divided into four categories, and we are interested in the reliability of the relationship between the visual range of the left and right eyes. First, let us find all the terms in the double sum: the square of each value given in the table is divided by the totals of the row and of the column to which it belongs. We have

Using this value we obtain χ² = 3303.6 and T = 0.714.

4 Criteria for comparing population distributions

In the classic pea breeding experiments that marked the beginning of genetics, G. Mendel observed the frequencies of different types of seeds obtained by crossing plants with round yellow seeds and wrinkled green seeds.

In this and similar cases, it is of interest to test the null hypothesis that the distributions of the general populations from which the samples are drawn coincide with the expected (theoretical) ones. Theoretical considerations have shown that this problem can be solved using the statistic

χ² = Σ (n_i − ñ_i)² / ñ_i,  the sum taken over i = 1, …, k.   (23)

The criterion based on this statistic was proposed by K. Pearson and bears his name. The Pearson test is applied to grouped data regardless of whether the underlying distribution is continuous or discrete. In (23), k is the number of grouping intervals, n_i are the empirical counts, and ñ_i are the expected (theoretical) counts (Σñ_i = n). If the null hypothesis is true, statistic (23) has a χ²-distribution with k − 1 degrees of freedom.

For the data given in the table

The critical points of the distribution with 3 degrees of freedom for =0.05 and =0.01 are equal to 7.81 and 11.3, respectively. Therefore, the null hypothesis is accepted and the conclusion is drawn that segregation in the offspring corresponds quite well to theoretical patterns.

Let us look at another example. In a colony of guinea pigs, the following numbers of male births were recorded during the year, by month starting from January: 65, 64, 65, 41, 72, 80, 88, 114, 80, 129, 112, 99. Can we consider that these data correspond to a uniform distribution, i.e., a distribution in which the number of males born in each month is on average the same? If we accept this hypothesis, then the expected average number of male births per month is 1009/12 ≈ 84.1. Then χ² ≈ 84.4.

The critical value of a distribution with 11 degrees of freedom and = 0.01 is 24.7, so at the chosen significance level the null hypothesis is rejected. Further analysis of experimental data shows that the likelihood of male guinea pigs being born in the second half of the year increases.
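
This calculation is easy to reproduce:

```python
# Chi-square test of the monthly birth counts against a uniform model.
import numpy as np
from scipy import stats

births = np.array([65, 64, 65, 41, 72, 80, 88, 114, 80, 129, 112, 99])
expected = np.full(12, births.sum() / 12)          # about 84.1 per month

chi2, p_value = stats.chisquare(births, expected)  # 11 degrees of freedom by default
print(f"chi2 = {chi2:.1f}, p = {p_value:.2e}")     # about 84.4, far beyond 24.7 -> reject H0
```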

In the case where the theoretical distribution is assumed to be uniform, there are no problems with calculating theoretical numbers. In the case of other distributions, the calculations become more complicated. Let's look at examples of how theoretical numbers are calculated for normal and Poisson distributions, which are quite common in research practice.

Let us start by determining the theoretical counts for the normal distribution. The idea is to transform our empirical distribution into a distribution with zero mean and unit variance. Naturally, in this case the boundaries of the class intervals will be expressed in units of the standard deviation; then, remembering that the area under the section of the curve bounded by the lower and upper limits of each interval equals the probability of falling into that interval, we multiply this probability by the total sample size to obtain the desired theoretical count.

Suppose we have an empirical distribution for the length of oak leaves and we need to check whether it can be considered with a significance level of =0.05 that this distribution does not differ significantly from normal.

Let us explain how the values given in the table were calculated. First, using the standard method for grouped data, the mean and standard deviation were calculated; they turned out to be x̄ = 10.3 and σ = 2.67. Using these values, the interval boundaries were expressed in units of the standard deviation, i.e., standardized values were found. For example, for the boundaries of the interval (4-6) we have (4 − 10.3)/2.67 = −2.36 and (6 − 10.3)/2.67 = −1.61. Then, for each interval, the probability of falling into it was calculated. For example, for the interval (−0.11, 0.64), the normal distribution table shows that 0.444 of the area of the standard normal distribution lies to the left of −0.11 and 0.739 lies to the left of 0.64; thus the probability of falling into this interval is 0.739 − 0.444 = 0.295. The rest of the calculations are similar. The difference between n and the sum of the theoretical counts should also be explained. It arises because the theoretical normal distribution is defined on the entire axis, whereas in the experiment there are no values deviating very far from the mean; therefore, the total probability over the observed intervals is slightly less than unity, which introduces a small error. However, this error does not significantly change the final results.

When comparing empirical and theoretical distributions, the number of degrees of freedom for the χ²-distribution is found from the relation f = m − 1 − l, where m is the number of class intervals and l is the number of independent distribution parameters estimated from the sample. For the normal distribution l = 2, since it depends on two parameters, the mean and the standard deviation.

The number of degrees of freedom is also reduced by 1 because for any distribution the probabilities satisfy Σp_i = 1, and therefore the number of independently determined probabilities is k − 1, not k.

For the given example, f = 8 − 2 − 1 = 5, and the critical value of the χ²-distribution with 5 degrees of freedom at α = 0.05 is 11.07. Therefore, the null hypothesis is accepted.

Let us consider the technique of comparing the empirical distribution with the Poisson distribution using a classic example of the number of deaths of dragoons per month in the Prussian army from a horse’s hoof. The data dates back to the 19th century, and the number of deaths is 0, 1, 2, etc. characterize these sad, but fortunately relatively rare events in the Prussian cavalry over almost 20 years of observation.

As is known, the Poisson distribution has the form

P(k) = λ^k · e^(−λ) / k!,

where λ is the distribution parameter, equal to the mean, and k = 0, 1, 2, …, n.

Since the distribution is discrete, the probabilities we are interested in are found directly from the formula.

Let us show, for example, how the theoretical count for k = 3 is determined. In the usual way we find that the mean of this distribution is 0.652. Using this value, we find

P(3) = 0.652³ · e^(−0.652) / 3! ≈ 0.024,

from which the theoretical count is obtained by multiplying this probability by the total number of observations.

If α = 0.05 is chosen, the critical value of the χ²-distribution with two degrees of freedom is 5.99, and therefore the hypothesis that the empirical distribution does not differ from the Poisson one at the chosen significance level is accepted. The number of degrees of freedom here is two because the Poisson distribution depends on one parameter; hence, in the relation f = m − 1 − l, the number of parameters estimated from the sample is l = 1, and f = 4 − 1 − 1 = 2.
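
A sketch of the whole procedure on hypothetical yearly counts of rare events (the original table is not reproduced here, so the estimated mean below differs slightly from the 0.652 in the text):

```python
# Chi-square fit of grouped counts (0, 1, 2, 3+ events) to a Poisson law.
import numpy as np
from scipy import stats

observed = np.array([109, 65, 22, 4])                  # hypothetical counts of 0, 1, 2, 3+ events
n = observed.sum()
lam = (0 * 109 + 1 * 65 + 2 * 22 + 3 * 4) / n          # sample mean used as the Poisson parameter

probs = [stats.poisson.pmf(k, lam) for k in range(3)]
probs.append(1 - sum(probs))                           # tail probability for "3 or more"
expected = n * np.array(probs)

chi2, p_value = stats.chisquare(observed, expected, ddof=1)  # one estimated parameter -> ddof = 1
print(f"lambda = {lam:.3f}, chi2 = {chi2:.2f}, p = {p_value:.3f}")
```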

Sometimes in practice it turns out to be important to know whether two distributions are different from each other, even if it is difficult to decide which theoretical distribution can approximate them. This is especially important in cases where, for example, their means and/or variances do not differ statistically significantly from each other. Finding significant differences in distribution patterns can help the researcher make predictions about possible factors that lead to these differences.

In this case, statistics (23) can be used, and the values ​​of one distribution are used as empirical quantities, and the values ​​of another as theoretical ones. Naturally, in this case, the division into class intervals should be the same for both distributions. This means that for all data from both samples, the minimum and maximum values ​​are selected, regardless of which sample they belong to, and then, in accordance with the selected number of class intervals, their width is determined and the number of objects falling into separate intervals is calculated for each sample separately .

In this case, it may turn out that some classes contain no values or only a few (3-5). The Pearson criterion gives satisfactory results only if each interval contains at least 3-5 values. Therefore, if this requirement is not met, adjacent intervals must be merged; naturally, this is done for both distributions.

And finally, one more note regarding the comparison of the calculated χ² value with its critical points at the selected significance level. We already know that if χ² exceeds the critical point χ²_α, the null hypothesis is rejected. However, χ² values that are too small (near or below the lower critical point χ²_(1−α)) should also arouse suspicion, because such a close coincidence of the empirical and theoretical distributions, or of two empirical distributions (whose counts in that case differ very little from each other), is unlikely to occur for random data. Two alternative explanations are then possible: either we are dealing with a law, and the result is not surprising, or the experimental data have for some reason been "fitted" to each other, which requires their re-verification.

By the way, in the example with peas we have exactly the first case, i.e. the appearance of seeds of different smoothness and color in the offspring is determined by law, and therefore it is not surprising that the calculated value turned out to be so small.

Now let's return to testing the statistical hypothesis about the identity of two empirical distributions. Data are presented on the distribution of the number of petals of anemone flowers taken from different habitats.

From the tabular data it is clear that the first two and last two intervals must be combined, since the number of values ​​falling into them is not enough for the correct use of the Pearson criterion. From this example it is also clear that if only the distribution from habitat A were analyzed, then there would be no class-interval containing 4 petals at all. It appeared as a result of the fact that two distributions are considered simultaneously, and in the second distribution there is such a class.

So, let's check the hypothesis that these two distributions do not differ from each other. We have

For a number of degrees of freedom of 4 and a significance level even equal to 0.001, the null hypothesis is rejected.

To compare two sample distributions, one can also use the nonparametric criterion proposed by N. V. Smirnov and based on the statistic introduced earlier by A. N. Kolmogorov (which is why this test is sometimes called the Kolmogorov-Smirnov test). The test is based on a comparison of the series of accumulated frequencies. Its statistic is found as

D = max |F1(x) − F2(x)|,   (24)

where F1(x) and F2(x) are the accumulated (cumulative) relative frequency curves of the two samples.

The critical value of statistic (24) is found from the relation

D_α = λ_α · √((n1 + n2)/(n1·n2)),   (25)

where n1 and n2 are the sizes of the first and second samples.

The critical values λα for α = 0.10, 0.05 and 0.01 are equal to 1.22, 1.36 and 1.63, respectively. Let us illustrate the use of the Smirnov criterion on grouped data representing the heights of schoolchildren of the same age from two different regions.

The maximum difference between the accumulated frequency curves is 0.124. If we choose the significance level α = 0.05, then formula (25) gives a critical value of 0.098.

Thus, the maximum empirical difference is greater than the theoretically expected one, therefore, at the accepted level of significance, the null hypothesis about the identity of the two distributions under consideration is rejected.

The Smirnov test can also be used for ungrouped data; the only requirement is that the data be drawn from a population with a continuous distribution. It is also desirable that each sample contain at least 40-50 values.
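A short Python sketch of this two-sample Smirnov check may make the procedure clearer (the function names are illustrative; the critical multipliers are the ones quoted above). It evaluates the two accumulated-frequency curves at every observed value, takes the largest gap, and compares it with the bound given by relation (25).

import math
from bisect import bisect_right

LAMBDA_CRIT = {0.10: 1.22, 0.05: 1.36, 0.01: 1.63}  # critical values quoted in the text

def ecdf(sample):
    """Accumulated relative frequency curve (empirical CDF) of a sample."""
    xs = sorted(sample)
    n = len(xs)
    return lambda t: bisect_right(xs, t) / n

def smirnov_two_sample(x, y, alpha=0.05):
    """Largest gap between the two empirical CDFs versus the critical value
    lambda_alpha * sqrt((n1 + n2) / (n1 * n2)) from relation (25)."""
    f1, f2 = ecdf(x), ecdf(y)
    d_max = max(abs(f1(t) - f2(t)) for t in sorted(set(x) | set(y)))
    d_crit = LAMBDA_CRIT[alpha] * math.sqrt((len(x) + len(y)) / (len(x) * len(y)))
    return d_max, d_crit, d_max > d_crit  # True in the last slot means "reject H0"

In the schoolchildren example above the same comparison gives 0.124 against 0.098, hence rejection.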

To test the null hypothesis, according to which two independent samples of sizes n and m correspond to the same distribution functions, F. Wilcoxon proposed a nonparametric criterion, which was substantiated in the works of G. Mann and F. Whitney. Therefore, in the literature this criterion is called either the Wilcoxon criterion or the Mann-Whitney criterion. This criterion is advisable to use when the sample sizes obtained are small and the use of other criteria is inappropriate.

The calculations below illustrate the approach to constructing criteria that use statistics associated not with the sample values ​​themselves, but with their ranks.

Suppose we have two samples, of sizes n and m, at our disposal. Let us construct from them a combined variation series and assign to each value its rank R_i, i.e. the serial number it occupies in the ranked series. If the null hypothesis is true, then any arrangement of the ranks is equally probable, and the total number of possible rank combinations for given n and m is equal to the number of combinations of N = n + m elements taken m at a time.

The Wilcoxon test is based on the statistic

W = R_1 + R_2 + ... + R_m, (26)

the sum of the ranks of one of the samples (here, the sample of size m).

Formally, to test the null hypothesis it is necessary to count all rank combinations for which the statistic W takes a value equal to or less than the one obtained for the specific ranked series, and to find the ratio of this number to the total number of possible rank combinations for both samples. Comparing the resulting proportion with the selected significance level allows one to accept or reject the null hypothesis. The rationale behind this approach is that if one distribution is shifted relative to the other, this will show up in the small ranks corresponding mainly to one sample and the large ranks to the other; the corresponding rank sum will accordingly be small or large, depending on which alternative holds.

It is necessary to test, at the significance level α = 0.05, the hypothesis that the distribution functions characterizing the two measurement methods are identical.

In this example n = 3, m = 2, N = 2+3 = 5, and the sum of the ranks corresponding to measurements using method B is 1+3 = 4.

Let us write down all C(5, 2) = 10 possible rank assignments for method B and their sums:

Ranks: (1,2) (1,3) (1,4) (1,5) (2,3) (2,4) (2,5) (3,4) (3,5) (4,5)

Sums: 3 4 5 6 5 6 7 7 8 9

The ratio of the number of rank combinations whose sum does not exceed the value 4 obtained for method B to the total number of possible rank combinations is 2/10 = 0.2 > 0.05, so for this example the null hypothesis is accepted.
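The exact procedure just described is easy to reproduce by enumeration; the following Python sketch (function name hypothetical) counts the share of equally likely rank assignments whose rank sum does not exceed the observed one, i.e. the left-tail probability of statistic (26).

from itertools import combinations

def exact_wilcoxon_left_tail(n, m, w_obs):
    """Share of all C(N, m) rank assignments (N = n + m) whose rank sum for the
    sample of size m does not exceed the observed sum w_obs."""
    N = n + m
    total = favourable = 0
    for ranks in combinations(range(1, N + 1), m):
        total += 1
        if sum(ranks) <= w_obs:
            favourable += 1
    return favourable / total

# The example from the text: n = 3, m = 2, observed rank sum 1 + 3 = 4 for method B.
print(exact_wilcoxon_left_tail(3, 2, 4))  # 0.2 > 0.05, so H0 is not rejected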

For small values ​​of n and m, the null hypothesis can be tested by directly counting the number of combinations of the corresponding rank sums. However, for large samples this becomes practically impossible, so an approximation was obtained for the W statistic, which, as it turned out, asymptotically tends to the normal distribution with the appropriate parameters. We will calculate these parameters to illustrate the approach to synthesizing rank-based statistical tests. In doing so, we will use the results presented in Chapter 37.

Let W be the sum of the ranks corresponding to one of the samples, say the one of size m, and consider the arithmetic mean of these ranks, W/m. The mathematical expectation of this mean is E(W/m) = (N + 1)/2,

since under the null hypothesis the ranks of the elements of a sample of size m form a random sample (drawn without replacement) from the finite population 1, 2, ..., N (N = n + m). It is known that the mean of this population is (N + 1)/2 and its variance is (N² - 1)/12.

Therefore E(W) = m(N + 1)/2.

When calculating the variance, we use the fact that the sum of the squares of the ranks of the combined ranked series, composed of the values of both samples, equals 1² + 2² + ... + N² = N(N + 1)(2N + 1)/6.

Taking into account the relations obtained earlier for the variances of finite populations and of sample means drawn from them without replacement, we have D(W/m) = ((N² - 1)/(12m)) · ((N - m)/(N - 1)) = n(N + 1)/(12m).

It follows that D(W) = m² · D(W/m) = mn(N + 1)/12.

It has been shown that the statistic

z = (W - m(N + 1)/2) / √(mn(N + 1)/12) (27)

for large n and m has an asymptotically standard normal distribution.

Let us look at an example. Data on the polarographic activity of blood serum filtrate were obtained for two age groups, with samples of 7 and 8 subjects. It is necessary to test, at the significance level α = 0.05, the hypothesis that the samples are drawn from general populations having the same distribution functions. The sum of the ranks for the first sample (of size 7) is 30, and for the second (of size 8) it is 90. A check on the correctness of the rank sums is the condition W1 + W2 = N(N + 1)/2; in our case 30 + 90 = (7 + 8)(7 + 8 + 1)/2 = 120. According to formula (27), using the sum of ranks of the second sample, we obtain z = (90 - 64)/√(7·8·16/12) ≈ 3.01.

If we use the sum of ranks of the first sample, we obtain z ≈ -3.01. Since the calculated statistic has a standard normal distribution, the null hypothesis is rejected in both the first and the second case: the critical value for the 5% significance level is 1.96 in absolute value.
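The calculation in this example can be reproduced with a few lines of Python based on statistic (27) as written above (the function name is illustrative):

import math

def wilcoxon_z(w, m, n):
    """Standardised rank sum (27): z = (W - m(N + 1)/2) / sqrt(m*n*(N + 1)/12)."""
    N = m + n
    mean = m * (N + 1) / 2
    var = m * n * (N + 1) / 12
    return (w - mean) / math.sqrt(var)

# Serum example: sample sizes 7 and 8, rank sums 30 and 90 (and 30 + 90 = 15*16/2 = 120).
print(wilcoxon_z(90, m=8, n=7))  # approximately  3.01
print(wilcoxon_z(30, m=7, n=8))  # approximately -3.01; |z| > 1.96, so H0 is rejected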

When using the Wilcoxon test, certain difficulties arise when the same values ​​are found in both samples, since the use of the above formula leads to a decrease in the power of the test, sometimes very significantly.

To keep the resulting errors to a minimum in such cases, it is advisable to use the following rule of thumb. The first time identical values belonging to different samples are encountered, which of them is placed first in the variation series is decided at random, for example by tossing a coin; if there are several such values, the first is chosen at random and the remaining equal values from the two samples are then alternated. When further groups of equal values are encountered, one proceeds as follows: if in the first group of equal values the first value was randomly taken from one particular sample, then in the next group of equal values the value from the other sample is placed first, and so on.

5. Criteria for checking randomness and evaluating outlier observations

Quite often data are acquired in series over time or space. For example, in psychophysiological experiments that may last several hours, the latency (latent period) of the reaction to a presented visual stimulus is measured tens or hundreds of times; in geographical surveys, the number of plants of a certain species is counted on plots located at particular places, for example along the edge of a forest; and so on. On the other hand, when various statistics are calculated it is assumed that the source data are independent and identically distributed. It is therefore of interest to test this assumption.

First, consider a criterion for testing the null hypothesis of independence of identically normally distributed values; this criterion is therefore parametric. It is based on the mean square of the successive differences

q² = Σ (x_{i+1} - x_i)² / (2(n - 1)), i = 1, ..., n - 1. (28)

If we introduce the ratio γ = q²/s², where s² is the usual sample variance, then, as is known from theory, when the null hypothesis is true the statistic

z = (γ - 1) · √((n² - 1)/(n - 2)) (29)

for n > 10 is distributed asymptotically according to the standard normal law.

Let us look at an example. The reaction times of a subject in one of the psychophysiological experiments are given below.

Computing q², s² and the statistic (29) from these data, we obtain a value that does not exceed the critical one in absolute value. Since for α = 0.05 the critical value is 1.96, the null hypothesis about the independence of the series is accepted at the selected significance level.
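A minimal sketch of this check, assuming the reconstruction of formulas (28)-(29) given above (the function name and the choice of sample-variance estimator are made for the example):

import math

def successive_difference_z(series):
    """Randomness check via mean square successive differences: q2 from (28),
    gamma = q2/s2, and z = (gamma - 1) * sqrt((n^2 - 1)/(n - 2)) as in (29)."""
    n = len(series)
    mean = sum(series) / n
    s2 = sum((v - mean) ** 2 for v in series) / (n - 1)                   # sample variance
    q2 = sum((b - a) ** 2 for a, b in zip(series, series[1:])) / (2 * (n - 1))
    return (q2 / s2 - 1) * math.sqrt((n * n - 1) / (n - 2))

# |z| < 1.96 at alpha = 0.05 gives no grounds to reject independence of the series.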

Another question that often arises when analyzing experimental data is what to do with some observations that differ sharply from the bulk of observations. Such outlier observations can occur due to methodological errors, calculation errors, etc. In all cases where the experimenter knows that an error has crept into the observation, he must exclude this value, regardless of its magnitude. In other cases, there is only a suspicion of error, and then it is necessary to use appropriate criteria in order to make a particular decision, i.e. exclude or leave outlier observations.

In general, the question is posed as follows: are the observations made on the same population, or do some parts or individual values ​​belong to a different population?

Of course, the only reliable way to exclude individual observations is to carefully study the conditions under which these observations were obtained. If for some reason the conditions differed from the standard ones, then the observations should be excluded from further analysis. But in certain cases the existing criteria, although imperfect, can be of significant benefit.

We present here, without proof, several statistics that can be used to test the hypothesis that the observations were made at random on one and the same population:

τ = (x_n - x_{n-1}) / s, (30)

t = (x_n - x̄') / s', (31)

(32)

where x_n is the suspected "outlier" observation (if all the values of the series are ranked in increasing order, the most extreme observation occupies the n-th place), s is the standard deviation of the whole sample, and x̄' and s' are the mean and standard deviation calculated without the suspected value.

For statistic (30) the distribution function has been tabulated; critical points of this distribution for selected n are given in the corresponding tables.

The critical values for statistic (31), depending on n, are: 4.0 for 6 < n ≤ 100; 4.5 for 100 < n ≤ 1000; 5.0 for n > 1000.

Formula (31) assumes that x̄' and s' are calculated without taking the suspected observation into account.

With statistic (32) the situation is more complicated. It has been shown that if the values are uniformly distributed, its mathematical expectation and variance can be written out explicitly.

The critical region is formed by small values of the statistic, which correspond to large values of the suspect observation. If one is interested in checking the smallest value for an "outlier", the data are first transformed so that they are uniformly distributed over the interval (0, 1); the complements of these uniform values to 1 are then taken and formula (32) is applied to them.

Consider the use of the above criteria for the following ranked series of observations: 3, 4, 5, 5, 6, 7, 8, 9, 9, 10, 11, 17. We need to decide whether the largest value, 17, should be rejected.

We have x̄ = 7.83 and s = 3.81 for the full series, and x̄' = 7.0 and s' = 2.61 with the value 17 excluded. According to formula (30), τ = (17 - 11)/3.81 = 1.57, and the null hypothesis should be accepted at α = 0.01. According to formula (31), t = (17 - 7.0)/2.61 = 3.83, and the null hypothesis should also be accepted. To use the third criterion we find the value 5.53; the resulting w statistic is normally distributed with zero mean and unit variance, and the null hypothesis at α = 0.05 is therefore accepted.
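Statistics (30) and (31) for this example are easily verified with a short Python sketch (the function name is illustrative):

import math

def outlier_stats(series):
    """tau = (x_n - x_{n-1})/s with s from the full sample (statistic 30), and
    t = (x_n - mean')/s' with mean' and s' computed without x_n (statistic 31)."""
    xs = sorted(series)
    n = len(xs)
    mean = sum(xs) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in xs) / (n - 1))
    rest = xs[:-1]
    mean_r = sum(rest) / len(rest)
    s_r = math.sqrt(sum((v - mean_r) ** 2 for v in rest) / (len(rest) - 1))
    return (xs[-1] - xs[-2]) / s, (xs[-1] - mean_r) / s_r

print(outlier_stats([3, 4, 5, 5, 6, 7, 8, 9, 9, 10, 11, 17]))  # about (1.57, 3.83)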

The difficulty of using statistic (32) is the need to have a priori information about the distribution law of the sample values and then to transform this distribution analytically into a uniform one over the interval (0, 1).



Introduction

The relevance of this topic is as follows. In studying the basics of biostatistics we assumed that the distribution law of the population was known. But if the distribution law is unknown, yet there is reason to assume that it has a certain form (call it A), then the null hypothesis is tested: the population is distributed according to law A. This hypothesis is tested using a specially chosen random variable - a goodness-of-fit criterion.

Goodness-of-fit criteria are criteria for testing hypotheses about the correspondence of the empirical distribution to the theoretical probability distribution. Such criteria are divided into two classes:

  • General goodness-of-fit tests apply to the most general formulation of a hypothesis, namely the hypothesis that the observed results agree with any a priori assumed probability distribution.
  • Special goodness-of-fit tests involve special null hypotheses that formulate agreement with a certain form of probability distribution.

Agreement criterion

The most common goodness-of-fit tests are omega-square, chi-square, Kolmogorov, and Kolmogorov-Smirnov.

Nonparametric goodness-of-fit tests Kolmogorov, Smirnov, and omega square are widely used. However, they are also associated with widespread errors in the application of statistical methods.

The fact is that the listed criteria were developed to test agreement with a fully known theoretical distribution; for that setting, calculation formulas, distribution tables and critical values are widely available. The main idea of the Kolmogorov, omega-square and similar tests is to measure the distance between the empirical distribution function and the theoretical one; these criteria differ in the type of distance used in the space of distribution functions.

Pearson goodness-of-fit tests for a simple hypothesis

K. Pearson's theorem applies to independent trials with a finite number of outcomes, i.e. to Bernoulli trials (in a somewhat extended sense). It allows one to judge whether the frequencies of these outcomes observed over a large number of trials are consistent with their hypothesized probabilities.

In many practical problems, the exact distribution law is unknown. Therefore, a hypothesis is put forward about the correspondence of the existing empirical law, constructed from observations, to some theoretical one. This hypothesis requires statistical testing, the results of which will either be confirmed or refuted.

Let X be the random variable under study. It is required to test the hypothesis H0 that this random variable obeys the distribution law F(x). To do this, a sample of n independent observations is made and used to construct an empirical distribution law F'(x). To compare the empirical and hypothetical laws, a rule called a goodness-of-fit criterion is used. One of the most popular is K. Pearson's chi-square goodness-of-fit test, in which the statistic

χ² = n · Σ (pe_i - pt_i)² / pt_i, i = 1, ..., N,

is calculated, where N is the number of intervals over which the empirical distribution law was constructed (the number of columns of the corresponding histogram), i is the number of an interval, pt_i is the probability of the random variable falling into the i-th interval under the theoretical distribution law, and pe_i is the proportion of sample values observed in the i-th interval (the empirical estimate of that probability). Under H0 this statistic approximately obeys the chi-square distribution.

If the calculated value of the statistic exceeds the quantile of the chi-square distribution with N - p - 1 degrees of freedom for the given significance level, the hypothesis H0 is rejected; otherwise it is accepted at that significance level. Here N is the number of intervals (groups) and p is the number of estimated parameters of the distribution law.
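As an illustration, here is a self-contained Python sketch of this procedure for the common case of testing normality with both parameters estimated from the sample (the function names, the number of intervals and the equal-width binning are assumptions made for the example; intervals with small expected frequencies should additionally be merged, as discussed further below):

import math

def laplace_cdf(z):
    """Standard normal distribution function, computed via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def chi_square_normality(sample, n_bins=8):
    """Chi-square statistic for H0: the data are normal, with the mean and the
    standard deviation estimated from the same sample.  The value is compared
    with the chi-square quantile for n_bins - 3 degrees of freedom
    (intervals minus two estimated parameters minus one)."""
    n = len(sample)
    mean = sum(sample) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in sample) / (n - 1))
    lo, hi = min(sample), max(sample)
    edges = [lo + i * (hi - lo) / n_bins for i in range(n_bins + 1)]
    edges[0], edges[-1] = -math.inf, math.inf        # open the two outer intervals
    stat = 0.0
    for a, b in zip(edges, edges[1:]):
        observed = sum(a <= v < b for v in sample)
        expected = n * (laplace_cdf((b - mean) / s) - laplace_cdf((a - mean) / s))
        stat += (observed - expected) ** 2 / expected
    return stat, n_bins - 3                          # statistic and degrees of freedom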

Let us look more closely at the statistic

χ² = Σ (m_i - n·p_i)² / (n·p_i), i = 1, ..., r,

where m_i is the number of trials (out of n) in which the i-th outcome occurred and p_i is its hypothesized probability. This statistic is called the Pearson chi-square statistic for a simple hypothesis.

It is clear that χ² represents the square of a certain distance between two r-dimensional vectors: the vector of relative frequencies (m_1/n, ..., m_r/n) and the vector of probabilities (p_1, ..., p_r). This distance differs from the Euclidean distance only in that different coordinates enter it with different weights.

Let us discuss the behaviour of the χ² statistic when the hypothesis H is true and when it is false. If H is true, its asymptotic behaviour as n → ∞ is described by K. Pearson's theorem. To understand what happens when H is false, note that by the law of large numbers m_i/n → p*_i as n → ∞, for i = 1, ..., r, where p*_i are the true outcome probabilities. Therefore, as n → ∞,

χ²/n → Σ (p*_i - p_i)² / p_i.

This limit is equal to 0 only when H is true; when H is false it is greater than 0, and hence χ² → ∞ as n → ∞.

From the above it follows that H should be rejected if the value of χ² obtained in the experiment is too large. Here, as always, "too large" means that the observed value of χ² exceeds the critical value, which in this case can be taken from the chi-square distribution tables. In other words, the probability P(χ² ≥ χ²_observed) is small, and it is therefore unlikely to obtain by chance a discrepancy between the frequency vector and the probability vector equal to, or greater than, the one observed in the experiment.

The asymptotic nature of K. Pearson's theorem, which underlies this rule, requires caution in its practical use: it can be relied upon only for large n, and whether n is large enough must be judged with the probabilities p_1, ..., p_r taken into account. One therefore cannot say, for example, that one hundred observations will always suffice, since not only must n be large, but the products np_1, ..., np_r (the expected frequencies) must not be small either. The problem of approximating the discrete distribution of the χ² statistic by the continuous chi-square distribution has thus turned out to be difficult. A combination of theoretical and experimental arguments has led to the belief that the approximation is applicable if all the expected frequencies satisfy np_i > 10; if the number r of different outcomes is large, this limit can be lowered (to 5, or even to 3 when r is of the order of several tens). To meet these requirements, in practice it is sometimes necessary to combine several outcomes, i.e. to pass to a Bernoulli scheme with a smaller r.

The described method for checking agreement can be applied not only to Bernoulli tests, but also to random samples. First, their observations must be turned into Bernoulli tests by grouping. They do it this way: the observation space is divided into a finite number of non-overlapping regions, and then the observed frequency and the hypothetical probability are calculated for each region.

In this case, to the approximation difficulties listed earlier one more is added - the choice of a reasonable partition of the original space. Care must be taken that the rule for testing the hypothesis about the initial distribution of the sample is sufficiently sensitive to the alternatives of interest. Finally, note that statistical criteria based on reduction to a Bernoulli scheme are, as a rule, not consistent against all alternatives, so this way of testing goodness of fit is of limited value.

The Kolmogorov-Smirnov goodness-of-fit criterion in its classical form is more powerful than the χ² criterion and can be used to test the hypothesis that the empirical distribution corresponds to any theoretical continuous distribution F(x) with parameters known in advance. This last circumstance restricts the wide practical application of the criterion when analysing the results of mechanical tests, since the parameters of the distribution function of mechanical-property characteristics are usually estimated from the same sample data.

The Kolmogorov-Smirnov criterion is used for ungrouped data or for grouped ones in the case of a small interval width (for example, equal to the scale division of a force meter, load cycle counter, etc.). Let the result of testing a series of n samples be a variation series of characteristics of mechanical properties

x1 ≤ x2 ≤ ... ≤ xi ≤ ... ≤ xn. (3.93)

It is required to test the null hypothesis that the sampling distribution (3.93) belongs to the theoretical law F(x).

The Kolmogorov-Smirnov criterion is based on the distribution of the maximum deviation of the accumulated relative frequency (the empirical distribution function) from the theoretical distribution function. When it is used, the statistic

Dn = max |Fn(x) - F(x)|

is calculated, which is the statistic of the Kolmogorov criterion. If the inequality

Dn·√n ≤ λα (3.97)

holds for large sample sizes (n > 35), or

Dn·(√n + 0.12 + 0.11/√n) ≤ λα (3.98)

for n ≤ 35, then the null hypothesis is not rejected.

If inequalities (3.97) and (3.98) are not met, an alternative hypothesis is accepted that the sample (3.93) belongs to an unknown distribution.

The critical values of λα are: λ0.10 = 1.22; λ0.05 = 1.36; λ0.01 = 1.63.
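A sketch of this check in Python, for a fully specified theoretical distribution function F (the names and the example call are illustrative):

import math

LAMBDA_CRIT = {0.10: 1.22, 0.05: 1.36, 0.01: 1.63}  # the critical values quoted above

def kolmogorov_check(sample, F, alpha=0.05):
    """D_n is the largest gap between the empirical and the theoretical CDF;
    the scaled statistic is compared with lambda_alpha as in (3.97)-(3.98)."""
    xs = sorted(sample)
    n = len(xs)
    d_n = 0.0
    for i, x in enumerate(xs, start=1):
        # the empirical CDF jumps at x: look at its value before and after the jump
        d_n = max(d_n, abs(i / n - F(x)), abs((i - 1) / n - F(x)))
    scale = math.sqrt(n) if n > 35 else math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)
    return d_n, d_n * scale <= LAMBDA_CRIT[alpha]    # True means "H0 is not rejected"

# Illustrative call against the uniform CDF on [0, 1]:
# kolmogorov_check([0.11, 0.25, 0.48, 0.63, 0.90], F=lambda t: min(max(t, 0.0), 1.0))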

If the parameters of the function F(x) are not known in advance but are estimated from the sample data, the Kolmogorov-Smirnov criterion loses its universality and can be used only to check the agreement of the experimental data with certain specific distribution functions.

When the null hypothesis is that the experimental data belong to a normal or lognormal distribution, the statistic Dn = max |Fn(xi) - F(xi)| is calculated, where F(xi) = 0.5 + Ф(zi) and Ф(zi) is the value of the Laplace function for zi = (xi - x̄)/s. The Kolmogorov-Smirnov criterion for any sample size n is then written in the form Dn·(√n - 0.01 + 0.85/√n) ≤ λα.

The critical values of λα in this case are: λ0.10 = 0.82; λ0.05 = 0.89; λ0.01 = 1.04.

If the hypothesis being tested is that the sample corresponds to an exponential distribution whose parameter is estimated from the experimental data, an analogous statistic is calculated and the Kolmogorov-Smirnov criterion is formed in the same way.

The critical values of λα for this case are: λ0.10 = 0.99; λ0.05 = 1.09; λ0.01 = 1.31.


When using the goodness-of-fit criterion χ², the following conditions must be met:

1. The volume of the studied population should be large enough (N ≥ 50), and every frequency or group size must be at least 5; if this condition is violated, the small frequencies (less than 5) must first be combined.

2. The empirical distribution must consist of data obtained as a result of random sampling, i.e. the observations must be independent.

The disadvantage of the Pearson goodness-of-fit criterion is the loss of part of the original information, connected with the need to group the observation results into intervals and to combine individual intervals with a small number of observations. In this regard, it is recommended to supplement the χ² check of distribution agreement with other criteria, which is especially necessary when the sample size is relatively small (n ≈ 100).

In statistics, the Kolmogorov goodness-of-fit test (also known as the Kolmogorov-Smirnov goodness-of-fit test) is used to determine whether two empirical distributions obey the same law, or whether an observed distribution obeys an assumed model. The Kolmogorov criterion is based on the maximum discrepancy between the accumulated frequencies or relative frequencies of the empirical and theoretical distributions and is calculated by the formulas

λ = D/√N = d·√N,

where D and d are, respectively, the maximum difference between the cumulative frequencies (F - F') and between the cumulative relative frequencies (p - p') of the empirical and theoretical distribution series, and N is the number of units in the population.

Having calculated λ, a special table is used to determine the probability with which one can state that the deviations of the empirical frequencies from the theoretical ones are random. If λ does not exceed 0.3, this probability is practically equal to 1, i.e. the discrepancies between the frequencies can be regarded as random. With a large number of observations the Kolmogorov test is able to detect any deviation from the hypothesis: any difference between the sample distribution and the theoretical one will be detected if the number of observations is sufficiently large. The practical significance of this property is small, since in most cases it is difficult to count on obtaining a large number of observations under constant conditions, the theoretical idea of the distribution law that the sample should obey is always approximate, and the accuracy of statistical tests should not exceed the accuracy of the chosen model.
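For grouped data the computation of λ reduces to a couple of lines; the sketch below (names illustrative) takes the two series of cumulative relative frequencies and the population size N:

import math

def kolmogorov_lambda(cum_emp, cum_theor, N):
    """lambda = d * sqrt(N), where d is the largest gap between the cumulative
    relative frequencies of the empirical and theoretical series; for cumulative
    absolute frequencies the equivalent form lambda = D / sqrt(N) is used."""
    d = max(abs(a - b) for a, b in zip(cum_emp, cum_theor))
    return d * math.sqrt(N)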

Romanovsky's goodness-of-fit criterion is based on the Pearson criterion, i.e. on the already found value of χ², and on the number of degrees of freedom:

C = (χ² - ν) / √(2ν),

where ν is the number of degrees of freedom of variation.

The Romanovsky criterion is convenient in the absence of tables for χ². If C < 3, the discrepancies between the distributions are random; if C > 3, they are non-random, and the theoretical distribution cannot serve as a model for the empirical distribution being studied.

B. S. Yastremsky used in his goodness-of-fit criterion not the number of degrees of freedom but the number of groups k, a special quantity q depending on the number of groups, and the chi-square value. Yastremsky's goodness-of-fit criterion has the same meaning as the Romanovsky criterion and is expressed by the formula

L = |χ² - k| / √(2k + 4q),

where χ² is Pearson's goodness-of-fit statistic, k is the number of groups, and q is a coefficient equal to 0.6 when the number of groups is less than 20.

If L_fact > 3, the discrepancies between the theoretical and empirical distributions are non-random, i.e. the empirical distribution does not meet the requirements of a normal distribution. If L_fact < 3, the discrepancies between the empirical and theoretical distributions are considered random.
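Both criteria are immediate to compute once χ² is known; the following Python sketch follows the forms reconstructed above (the function names are illustrative, and the formulas are the reconstructions given in the text rather than an independently verified source):

import math

def romanovsky(chi2_value, dof):
    """C = (chi2 - nu) / sqrt(2 * nu); |C| < 3 is read as 'discrepancies are random'."""
    return (chi2_value - dof) / math.sqrt(2 * dof)

def yastremsky(chi2_value, k, q=0.6):
    """L = |chi2 - k| / sqrt(2k + 4q), with q = 0.6 for fewer than 20 groups;
    L > 3 points to non-random discrepancies."""
    return abs(chi2_value - k) / math.sqrt(2 * k + 4 * q)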


