Analyzing Data, Part 2
This section shows how to use many of the Analysis commands.
Selecting Data
When you start the Analysis module, you need to read a database to work on before you can perform any analysis. The file may be either an Epi Info file, produced by entering data with the Enter program, or a dBase file from another source. Epi Info files can be produced from many other formats using the Import module.
The command that tells Analysis which file to use is Read, and it is usually the first command given in Analysis. If you know the name of the file you want to use, type it after Read. Otherwise, type Read and press Enter to display a list of files.
Use the arrow keys to highlight the file you want, and then press Enter to load it. To see all the dBase files, type Read *.DBF. Folder (directory) names end with a \ in the listing, and selecting one shows the files in that folder. To move up to the parent folder, choose ..\.
You can narrow the set of records used from a database with the Select command. You enter an expression that must be true for a record to be included. To process only the records of patients who are over 64, you would enter Select AGE > 64. If the field is a text field, the value must be enclosed in quotes, e.g., Select SEX = "F". You can combine conditions with "and" or "or", e.g., Select AGE > 64 and SEX = "F".
To clear a previous Select statement, just enter Select and press Enter.
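The effect of Select can be pictured as a filter applied to every record. The Python sketch below illustrates only that idea, not how Epi Info evaluates expressions; the records and the `select` helper are hypothetical.

```python
# Hypothetical records; each dictionary stands for one record in the file
records = [
    {"AGE": 70, "SEX": "F"},
    {"AGE": 45, "SEX": "M"},
    {"AGE": 81, "SEX": "M"},
    {"AGE": 66, "SEX": "F"},
]

def select(rows, condition):
    """Keep only the rows for which condition(row) is true (hypothetical helper)."""
    return [row for row in rows if condition(row)]

# Roughly equivalent to: Select AGE > 64 and SEX = "F"
older_women = select(records, lambda r: r["AGE"] > 64 and r["SEX"] == "F")
print(older_women)

# Roughly equivalent to a bare Select: every record is used again
all_rows = select(records, lambda r: True)
print(len(all_rows))
```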
Listing Data
The first step in data analysis is to scan the data visually to gain an overall impression and see what further analysis might be appropriate. A "line listing" is helpful for this purpose. To produce a listing of the records in the file, type List and press Enter.
List only displays as many variables as will fit across the screen. If you want to see all the variables, use List *. List followed by one or more variable names lists only those variables. You can exclude fields by including Not before their name: List * Not Name Address.
Looking At Frequencies
The frequencies command, Freq, counts the records in each category of a specified variable and gives the absolute and relative frequency of each category. For example, typing Freq Sex produces a frequency table for the variable Sex.
The number in each category is given first, followed by its percentage of the total and the cumulative percentage. If Statistics is set to On and the field is numeric, the sum, mean, and standard deviation are also printed.
The command Freq * produces frequencies for all the variables in your questionnaire, a convenient way to begin the analysis of a new data set. Freq followed by a series of variable names, separated by spaces, produces a separate frequency table for each variable listed.
Placing /C after the Freq command adds confidence limits to the output. Thus Freq Sex /C produces the frequency with confidence limits for each value.
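The counts and percentages that Freq reports can be reproduced in a few lines of Python. This is a sketch on hypothetical data: the `freq` function is illustrative, categories are simply listed in sorted order, and the standard deviation is assumed to be the sample (n - 1) version, which the text does not specify. The confidence limits added by /C are not reproduced.

```python
from collections import Counter
import statistics

def freq(values):
    """Return (value, count, percent, cumulative percent) for each category."""
    counts = Counter(values)
    total = len(values)
    rows, running = [], 0
    for value in sorted(counts):
        running += counts[value]
        rows.append((value, counts[value],
                     100 * counts[value] / total,   # percent of total
                     100 * running / total))        # cumulative percent
    return rows

sex = ["F", "M", "F", "F", "M", "F", "M", "F"]  # hypothetical values of SEX
for value, n, pct, cum in freq(sex):
    print(f"{value:<3}{n:>4}{pct:>8.1f}%{cum:>8.1f}%")

# Extra statistics for a numeric field (sample standard deviation assumed)
ages = [10, 20, 30]  # hypothetical values
print(sum(ages), statistics.mean(ages), statistics.stdev(ages))
```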
Creating Cross Tabulations (Tables command)
The Tables command cross-tabulates two fields: it counts the records that fall into each combination of their values. It is intended for data items that are arranged in categories and counted. Entering Tables Sex Ill produces a table of sex by illness, followed by a number of additional statistics, including the relative risk.
Note that the interpretation of the relative risk (risk ratio) depends on the orientation of the table, and that not every relative risk is meaningful. An interpretation is printed with the relative risk value. If risk factors are on the left side of the table and disease across the top, with presence listed first in each case, the relative risk is the risk of disease for persons with the first factor relative to those with the second. As the printed note indicates, the relative risk should be ignored in a case-control study, since it cannot be interpreted meaningfully there; the odds ratio is the appropriate measure in that design.
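Both measures can be computed directly from the four cells of a 2 x 2 table. This Python sketch assumes the orientation described above (risk factor in rows, disease in columns, presence first) and labels the cells a, b, c, d, a common textbook convention rather than Epi Info's own notation. The counts are hypothetical, and zero cells, which would cause a division by zero, are not handled.

```python
def risk_ratio(a, b, c, d):
    """Risk of disease in the first row divided by risk in the second row."""
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """Cross-product ratio (a * d) / (b * c)."""
    return (a * d) / (b * c)

# Hypothetical counts:   Ill   Not ill
a, b = 30, 10          # factor present
c, d = 5, 15           # factor absent

print(risk_ratio(a, b, c, d))  # 3.0
print(odds_ratio(a, b, c, d))  # 9.0

# Putting the rows in the opposite order gives the reciprocal risk ratio
print(risk_ratio(c, d, a, b))
```

Swapping the rows turns a risk ratio of 3 into one of about 0.33, which is why the orientation of the table matters when reading the printed value.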
A series of stratified tables can be produced by listing more than two variables in the Tables command. The variables after the first two divide the tables into levels, or strata, one for each combination of values of those variables. To try it, use Tables Vanilla Ill Sex (tables of vanilla consumption by illness, stratified by sex).
This results in separate tables for male and female cases. A Mantel-Haenszel stratified analysis follows the two tables. The very large odds ratio indicates a strong association between eating vanilla ice cream and the occurrence of illness. The Mantel-Haenszel weighted odds ratio, summarizing the results of stratification by sex, is even larger. Both the confidence limits and the extremely small p value indicate that the Mantel-Haenszel weighted odds ratio is significantly different from 1.0, the value indicating no association.
Both the crude and stratified relative risks are also significantly greater than 1.0. In this outbreak of staphylococcal food poisoning in Oswego, New York, the vanilla ice cream was indeed identified as the cause, through both statistical and microbiologic evidence.
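The Mantel-Haenszel weighted odds ratio combines the stratum-specific tables into a single summary. The sketch below uses the standard Mantel-Haenszel estimator (the sum of ad/n over strata divided by the sum of bc/n) on two hypothetical strata; it does not reproduce the confidence limits or p value that Epi Info prints, and Epi Info's computations may differ in detail.

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel summary odds ratio for a list of 2 x 2 strata (a, b, c, d)."""
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator

# Hypothetical strata, e.g. one table per sex, each as (a, b, c, d)
strata = [(10, 5, 5, 10), (6, 2, 3, 9)]
for a, b, c, d in strata:
    print("stratum odds ratio:", a * d / (b * c))
print("Mantel-Haenszel odds ratio:", round(mantel_haenszel_or(strata), 2))
```

Here the stratum odds ratios are 4.0 and 9.0, and the summary value, 181/34 (about 5.32), falls between them.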
An asterisk (*) may be used in the Tables command to indicate all variables. Thus, Tables Ill * produces tables of illness status against each variable in the questionnaire. This is a quick way to obtain a preliminary analysis of a questionnaire in which comparing two groups is the key feature.
Rothman's test, when available, indicates whether the odds ratios differ between strata: a small p value suggests that they do. Such differences mean that the Mantel-Haenszel weighted odds ratio, while still valid as a summary of all strata, does not necessarily reflect the value for any individual stratum.
To display percentages in tables along with the numbers, you can enter the command Set percents = on.
Calculating Means
There is another whole world of statistics for numbers that are continuous, such as height, weight, and age. The Means command produces a table that displays continuous or ordinal data and then performs the appropriate statistical analysis. It requires two items of information: the variable containing the data to be analyzed, and the variable that defines the groups to be compared. The command is:
Means [Numeric variable to be analyzed] [Variable for Grouping]
If you prefer not to display the table of values, append "/N" to the command to indicate "No tables."
Continuing with the OSWEGO.REC file, enter Means Age Ill to compare the ages of people who were ill with those of people who were not.
The output runs to several screens listing the ages in each group, followed by the statistical summary.
At first this may be more numbers than you ever wanted, but the overall results can be understood quickly. The means of the two sets of ages are 39 and 33. Now look at the p values for the ANOVA and Kruskal-Wallis tests. Since both are greater than 0.05, we conclude that the difference in mean age between the Ill and Not Ill groups is not statistically significant. Bartlett's test, which checks whether the groups have equal variances, helps you decide which of the two tests to rely on: ANOVA assumes equal variances, so if Bartlett's p value is small, the Kruskal-Wallis test is the safer choice. In this case, both tests lead to the same conclusion.
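The core of the ANOVA comparison is the F statistic: the variation between group means divided by the variation within groups, each scaled by its degrees of freedom. The Python sketch below computes group means and F for hypothetical ages; the helper names are illustrative, and this is not a reproduction of Epi Info's output. Turning F into a p value requires the F distribution, which the Python standard library does not provide; if SciPy is available, `scipy.stats.f_oneway`, `scipy.stats.kruskal`, and `scipy.stats.bartlett` perform the ANOVA, Kruskal-Wallis, and Bartlett tests.

```python
from collections import defaultdict

def split_by_group(values, labels):
    """Collect values into lists keyed by group label."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    return dict(groups)

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way analysis of variance."""
    means = {label: sum(g) / len(g) for label, g in groups.items()}
    all_values = [v for g in groups.values() for v in g]
    grand_mean = sum(all_values) / len(all_values)
    # Between-group sum of squares: distance of each group mean from the grand mean
    ss_between = sum(len(g) * (means[label] - grand_mean) ** 2
                     for label, g in groups.items())
    # Within-group sum of squares: spread of values around their own group mean
    ss_within = sum((v - means[label]) ** 2
                    for label, g in groups.items() for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within

ages = [30, 40, 50, 20, 30, 40]                # hypothetical ages
ill = ["Yes", "Yes", "Yes", "No", "No", "No"]  # hypothetical illness status

groups = split_by_group(ages, ill)
for label, values in groups.items():
    print(label, "mean age:", sum(values) / len(values))
f, df1, df2 = one_way_anova_f(groups)
print(f"F = {f:.2f} with {df1} and {df2} degrees of freedom")
```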
Missing Values
Missing values in Epi Info are entered as blanks in the actual records. During data entry, pressing Enter in a field rather than entering data will result in a missing value. In the Means, Tables, and Freq procedures, missing values will be ignored if SET IGNORE is ON (which is the default).
If, however, you have used another code, such as 99, for missing values, be sure to select only the non-missing values before using the means procedure. This can be done by using SELECT AGE <> 99, for example (<> means "not equal to"). Be particularly aware of this point if the data have been imported from another system in which missing values may be coded differently.
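The effect of an unselected missing-value code on a mean is easy to demonstrate. In this Python sketch, `None` stands in for a blank entry and 99 for a missing-value code; the data and the `non_missing` helper are hypothetical.

```python
# Hypothetical AGE values: None represents a blank entry, 99 a missing-value code
ages = [34, None, 52, 99, 27, 99, 61]
MISSING_CODES = {99}

def non_missing(values, codes=MISSING_CODES):
    """Drop blanks and coded missing values (similar in spirit to SELECT AGE <> 99)."""
    return [v for v in values if v is not None and v not in codes]

# Ignoring only the blanks leaves the 99s in the data and inflates the mean
blanks_dropped = [v for v in ages if v is not None]
print(sum(blanks_dropped) / len(blanks_dropped))  # 62.0

clean = non_missing(ages)
print(sum(clean) / len(clean))  # 43.5
```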
Displaying Charts
Analysis produces histograms, scatter plots, pie charts, and bar and line graphs directly from data files. The commands to create a chart look like:
- Bar Timesupper
- Line Age
- Pie Sex
- Scatter Sysbp Bodymass
- Histogram Onsetdate
If /R is placed after the SCATTER command, a least squares regression line is drawn through the data points.
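A least squares line minimizes the sum of squared vertical distances from the points to the line. Its slope is the sum of (x - mean x)(y - mean y) divided by the sum of (x - mean x) squared, and the line passes through the point of means. The Python sketch below assumes body mass on the horizontal axis and systolic blood pressure on the vertical axis; the values and the `least_squares` helper are hypothetical.

```python
def least_squares(x, y):
    """Return (slope, intercept) of the ordinary least squares line y = slope * x + intercept."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean  # line passes through (x_mean, y_mean)
    return slope, intercept

bodymass = [20, 22, 24, 26]   # hypothetical x values
sysbp = [112, 113, 119, 120]  # hypothetical y values
slope, intercept = least_squares(bodymass, sysbp)
print(f"Sysbp = {slope:.2f} * Bodymass + {intercept:.2f}")
```

On Python 3.10 or later, `statistics.linear_regression(bodymass, sysbp)` returns the same slope and intercept.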