Using StatCalc

Run StatCalc from the main EPI menu. The menu has the following choices:

StatcalcTablesStart.jpg (40149 bytes)

To make a selection, move the highlight to the selection with the arrow keys and press Enter, or press the letter that is highlighted, such as T for Tables. Press F1 to display help screens with information about the program and the statistical methods.

Analyzing Using the Tables (2-by-2 to 2-by-10)

This choice presents a 2-by-2 table on the screen. To do statistics for a single 2-by-2 table, you simply enter four numbers in the table and then press F4 to perform the calculations. StatCalc calculates the odds ratio and relative risk with confidence limits, three types of chi-square tests, and, if appropriate, Fisher's exact probability. You may also choose to calculate exact confidence limits, which is most useful when the numbers in the table are small.

This allows you to analyze the association between an exposure and a disease. For example, let's take a look at lung cancer and smoking. Suppose we had a sample in which 5 smokers had lung cancer and 30 did not, while 2 nonsmokers had lung cancer and 57 did not. We can use the 2-by-2 table to assess whether smoking is associated with contracting the disease.

To do this, you enter the data into a table. The left column holds the number of people who contracted the disease, and the right column holds the number who did not. In the upper row, you put the number of people who were exposed to the risk factor, and in the lower row, you put the number of people who weren't exposed.

So, for our example, you enter 5 for the number of people that have the disease and were exposed in the upper-left box. In the upper-right box, you enter 30 for the number of people that were exposed but don't have the disease. In the lower-left box, you put 2 for the number of people that have the disease but were not exposed and in the lower-right box, you put 57 for the number that were not exposed and didn't get the disease:

StatcalcTablesData.jpg (36531 bytes)

Now press F4 to do the calculation:

StatcalcTablesResults.jpg (127184 bytes)

As you can see, you get quite a bit of statistical analysis, with even more available by pressing F2.
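The main figures on the results screen can be approximated by hand. The following is a minimal Python sketch, not StatCalc's own code: the function name `two_by_two` is hypothetical, and the choice of uncorrected, Mantel-Haenszel, and Yates-corrected chi-squares as the "three types" is an assumption based on common practice. Confidence limits are omitted.

```python
from math import comb


def two_by_two(a, b, c, d):
    """Summarize a 2-by-2 table (hypothetical helper, not part of StatCalc).

    a = exposed, diseased        b = exposed, not diseased
    c = unexposed, diseased      d = unexposed, not diseased
    """
    n = a + b + c + d
    exposed, unexposed = a + b, c + d      # row totals
    diseased, healthy = a + c, b + d       # column totals
    cross = a * d - b * c
    denom = exposed * unexposed * diseased * healthy

    # One-tailed Fisher exact probability: the chance of seeing a or more
    # diseased persons among the exposed if exposure and disease are unrelated.
    fisher = sum(
        comb(diseased, k) * comb(healthy, exposed - k)
        for k in range(a, min(exposed, diseased) + 1)
    ) / comb(n, exposed)

    return {
        "odds_ratio": (a * d) / (b * c),
        "relative_risk": (a / exposed) / (c / unexposed),
        "chi2_uncorrected": n * cross ** 2 / denom,
        "chi2_mantel_haenszel": (n - 1) * cross ** 2 / denom,
        "chi2_yates": n * max(abs(cross) - n / 2, 0) ** 2 / denom,
        "fisher_one_tailed": fisher,
    }


# The smoking example: 5 and 30 in the upper row, 2 and 57 in the lower row.
result = two_by_two(5, 30, 2, 57)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

For the example table this gives an odds ratio of 4.75 and a relative risk of about 4.21.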

Sample Size and Power

If you choose "Sample Size & Power," a secondary menu is presented, with the following choices:

StatcalcCohortStart.jpg (42199 bytes)

All calculations are designed for studies in which the results are proportions expressed in percentages. In a population survey or descriptive study, finding the proportion of persons answering yes to a particular question is a typical goal. The program asks you to specify probabilities that your sample will predict the true situation in the population(s) being sampled and the amount of inaccuracy you are willing to tolerate, and then it calculates the sample size based on these assumptions.

These programs assume random sampling of the population, 100% participation, 100% accuracy in obtaining and recording answers, and other conditions that are seldom present in real life. If you expect 90% participation rather than 100%, you will need to multiply the sample size by 100/90 (an increase of about 11%) to obtain a more realistic estimate.
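The participation adjustment is simple arithmetic. As a small sketch, with a hypothetical helper name and a made-up starting sample size of 200:

```python
from math import ceil


def adjust_for_participation(sample_size, participation_pct):
    """Inflate a calculated sample size for less than 100% participation."""
    # Round up, since you cannot recruit a fraction of a person.
    return ceil(sample_size * 100 / participation_pct)


# 200 is an illustrative value only, not a StatCalc result.
adjusted = adjust_for_participation(200, 90)
print(adjusted)
```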

Calculated sample sizes are only a guide, and in many studies the number of available cases or the number of interviews that can be done within the study budget is the final deciding factor. These programs provide a useful way to obtain estimates against which other factors can be weighed. In the long run the assumptions about what result you will obtain in the study will be replaced by actual results, and the assumed confidence levels will be replaced by p values or confidence limits derived from the results. Until such hindsight is available, however, sample size calculations provide a degree of foresight.

For each of the three calculations, you fill in a confidence level, a power, and assumptions about how close an estimate of the actual proportion, relative risk, or odds ratio you would like. For the latter, choose the value furthest from the real population value that you are willing to tolerate. For example, in the Prospective or Cross-Sectional Study, suppose that you choose 10% as the hypothetical proportion of "Yes" answers to a question in the unexposed persons. You may wish to enter 20% as the closest value you would be able to distinguish in the exposed persons, meaning that any value of 20% or over would give a p value in the final study of .05 or less. This corresponds to a relative risk of 2.0 or an odds ratio of 2.25. Any of these three values may be entered initially to obtain the same results.
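The proportion in the exposed, the relative risk, and the odds ratio are interchangeable once the proportion in the unexposed is fixed. The sketch below shows the arithmetic; the function names are hypothetical and are not part of StatCalc. For 10% in the unexposed and 20% in the exposed, the relative risk is 0.20/0.10 = 2.0 and the odds ratio is (0.20/0.80)/(0.10/0.90), or about 2.25.

```python
def relative_risk(p_exposed, p_unexposed):
    """Ratio of the two proportions."""
    return p_exposed / p_unexposed


def odds_ratio(p_exposed, p_unexposed):
    """Ratio of the odds p / (1 - p) in the two groups."""
    return (p_exposed / (1 - p_exposed)) / (p_unexposed / (1 - p_unexposed))


def exposed_from_relative_risk(rr, p_unexposed):
    """Recover the proportion in the exposed from a relative risk."""
    return rr * p_unexposed


def exposed_from_odds_ratio(oratio, p_unexposed):
    """Recover the proportion in the exposed from an odds ratio."""
    odds = oratio * p_unexposed / (1 - p_unexposed)
    return odds / (1 + odds)


rr = relative_risk(0.20, 0.10)
oratio = odds_ratio(0.20, 0.10)
print(f"relative risk = {rr:.2f}, odds ratio = {oratio:.2f}")
```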

StatcalcCohortData.jpg (120390 bytes)

After assumptions have been entered, pressing F4 calculates the results:

StatcalcCohortResults.jpg (109982 bytes)

The result for the exact assumptions entered will be highlighted, and a series of other values based on varying the assumptions will also be presented. The results can be printed or sent to a file as described below.
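To get a feel for where such numbers come from, here is a minimal sketch of a common textbook formula for comparing two proportions (Fleiss, with and without continuity correction). Whether StatCalc uses exactly this formula is an assumption, and the function and parameter names are hypothetical, so treat the output as a cross-check rather than a reproduction of the screen above.

```python
from math import ceil, sqrt
from statistics import NormalDist


def cohort_sample_size(p_unexposed, p_exposed, confidence=0.95, power=0.80, ratio=1.0):
    """Return (uncorrected, continuity-corrected) number of exposed persons.

    ratio is unexposed:exposed; the unexposed group needs ratio times as many.
    """
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided
    z_beta = NormalDist().inv_cdf(power)
    # Pooled proportion, weighted by the relative group sizes.
    p_bar = (p_exposed + ratio * p_unexposed) / (1 + ratio)
    diff = abs(p_exposed - p_unexposed)

    numerator = (
        z_alpha * sqrt((ratio + 1) * p_bar * (1 - p_bar))
        + z_beta * sqrt(ratio * p_exposed * (1 - p_exposed)
                        + p_unexposed * (1 - p_unexposed))
    ) ** 2
    n_plain = numerator / (ratio * diff ** 2)

    # Continuity correction inflates the uncorrected size.
    n_corrected = n_plain / 4 * (
        1 + sqrt(1 + 2 * (ratio + 1) / (n_plain * ratio * diff))
    ) ** 2
    return ceil(n_plain), ceil(n_corrected)


n_plain, n_corrected = cohort_sample_size(0.10, 0.20)
print(n_plain, n_corrected)
```

With 10% in the unexposed, 20% in the exposed, 95% confidence, 80% power, and equal group sizes, this sketch gives 199 exposed persons without the correction and 219 with it.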

Chi Square for Trend

The test for trend is often used for dose-response studies and can also be used to test for trends with age, passage of time, or any ordered variable. The data can be stratified on other variables like age and sex to eliminate confounding from these variables.

The Extended Mantel-Haenszel chi square that is calculated reflects the departure of a linear trend from horizontal (i.e., no trend). If the associated p value is less than .05, the trend is statistically significant at the 5% level: a departure this large would be expected less than 5% of the time if there were no trend in the underlying population.

Data for analysis of trend by this method must be grouped according to an ordered numerical sequence. The simplest groups are arbitrary ones of 0, 1, 2, 3, etc. These are the "observations" or "scores." Other scores might be the mid-points of groups, such as 0 for 0 glasses of water, 2 for 1-3 glasses, 4 for 3-5, etc. These are the groups over which the presence of trend will be measured. Strata are determined by the confounding variables; common strata would be age or age-sex groups. The following table may be used as an example.

Frequency of Cigarette Smoking Among Women with Myocardial Infarction
(MI) and Controls (C), Stratified by Age

Stratum               1           2           3           4           5
Age group           25-29       30-34       35-39       40-44       45-49
Cig./day  Score    MI     C    MI     C    MI     C    MI     C    MI     C
None      0         1   131     0   188     3   161    11   169    23   157
1-24      1         1   101     6   152    12   130    21   134    42    97
>24       2         4    51    15    83    22    65    39    68    34    52

From Schlesselman, JJ. Case-Control Studies. Oxford Univ. Press, NY, 1982, p. 205.

In this example you would enter Stratum 1 on the first screen:

StatcalcChiEntry.jpg (50022 bytes)

Then press F2 and enter Stratum 2 on the second screen. After entering all five strata in this way, you press F4 to see the results:

StatcalcChiResults.jpg (54688 bytes)
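The calculation behind the results screen can be sketched in Python. This is an illustration of Mantel's extension of the Mantel-Haenszel test for a linear trend across strata, written from the standard formula; it is an assumption that StatCalc follows it term for term, and the function name is hypothetical. The data are the five strata from the table above.

```python
from math import erfc, sqrt

SCORES = [0, 1, 2]  # None, 1-24, >24 cigarettes per day

# One (cases, controls) pair per age stratum, in score order.
STRATA = [
    ([1, 1, 4], [131, 101, 51]),      # 25-29
    ([0, 6, 15], [188, 152, 83]),     # 30-34
    ([3, 12, 22], [161, 130, 65]),    # 35-39
    ([11, 21, 39], [169, 134, 68]),   # 40-44
    ([23, 42, 34], [157, 97, 52]),    # 45-49
]


def mantel_trend_chi2(scores, strata):
    """Return (chi square with 1 df, p value) for a stratified linear trend."""
    observed = expected = variance = 0.0
    for cases, controls in strata:
        totals = [a + c for a, c in zip(cases, controls)]
        n, m1, m0 = sum(totals), sum(cases), sum(controls)
        sum_x = sum(t * x for t, x in zip(totals, scores))
        sum_xx = sum(t * x * x for t, x in zip(totals, scores))
        # Score total among cases, with its mean and variance under no trend.
        observed += sum(a * x for a, x in zip(cases, scores))
        expected += m1 * sum_x / n
        variance += m1 * m0 * (n * sum_xx - sum_x ** 2) / (n ** 2 * (n - 1))
    chi2 = (observed - expected) ** 2 / variance
    p_value = erfc(sqrt(chi2 / 2))  # upper tail of chi square with 1 df
    return chi2, p_value


chi2, p_value = mantel_trend_chi2(SCORES, STRATA)
print(f"chi square for trend = {chi2:.2f}, p = {p_value:.3g}")
```

Summing the observed, expected, and variance terms over strata before forming the ratio is what removes confounding by the stratifying variable, here age.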

Printing or Sending Results to a File

To send calculations for the current screen to the printer, press F5. If you prefer to have output go to a file rather than to the printer, press F6 instead. The first time you press F6 a prompt will request a file name. Once the file is open, subsequent F5 commands will add material to the file rather than sending it to the printer. Pressing F6 again closes the file. If you give the name of an existing file for the F6 command, new material will be added to the existing file without harming its original contents.



Copyright 1999-2004 by Henry Ford Health System
Last modified: 05/09/11