**Chapter 7: Data Preparation, Analyses, and Interpretation**

**Steps of Data Analysis Process:**

- prepare data for analysis
- analyze the data
- interpret the data (i.e., test the research hypotheses and draw valid inferences)

**Data Preparation**

### Logging and Tracking Data

*Justification for use:* Without a well-established procedure, data can easily become disorganized, uninterpretable, and ultimately unusable.

There is no definitive method for logging and tracking data collection and entry; however, many computer programs exist that can assist in the process (e.g., SPSS, Microsoft Excel, SAS).

**Recruitment Log:** a comprehensive record of all individuals approached about participation in a study

- record dates and times potential participants were approached
- whether they met eligibility criteria
- whether they agreed and provided informed consent to participate in the study
- Important: no identifying information should be recorded for individuals who do not consent to participate!

*Primary purpose:* to keep track of participant enrollment and to determine how representative the resulting cohort of study participants is of the population that the researcher was attempting to examine

- can also provide the researcher with up-to-date information on the general status of the study, including client participation, data collection, and data entry
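A recruitment log can be sketched as a simple list of records, one per person approached. This is a minimal illustration, not a prescribed format: the `RecruitmentEntry` fields and the helper names are hypothetical. It enforces the rule above by discarding the name of anyone who does not consent, and it produces the counts used to judge how representative the enrolled cohort is.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class RecruitmentEntry:
    """One row of a (hypothetical) recruitment log: every person approached gets a row."""
    approached_at: datetime
    eligible: bool
    consented: bool
    # Identifying details are stored ONLY for people who consented.
    name: Optional[str] = None


def log_approach(log, approached_at, eligible, consented, name=None):
    """Append an entry, dropping identifying information for non-consenters."""
    entry = RecruitmentEntry(
        approached_at=approached_at,
        eligible=eligible,
        consented=consented,
        name=name if consented else None,
    )
    log.append(entry)
    return entry


def enrollment_summary(log):
    """Counts that help judge how representative the enrolled cohort is."""
    return {
        "approached": len(log),
        "eligible": sum(e.eligible for e in log),
        "enrolled": sum(e.eligible and e.consented for e in log),
    }


recruitment_log = []
log_approach(recruitment_log, datetime(2024, 3, 1, 10, 15), True, True, "Participant A")
log_approach(recruitment_log, datetime(2024, 3, 1, 11, 0), True, False, "Not stored")
log_approach(recruitment_log, datetime(2024, 3, 2, 9, 30), False, False)
print(enrollment_summary(recruitment_log))
```

Comparing the eligible and enrolled counts over time gives the up-to-date picture of study status described above.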

### Data Screening

- Takes place immediately following data collection but prior to data entry
- Is an essential process to ensure that data are accurate and complete
- Computerized assessment instruments can simplify this process and make it more time-efficient

The researcher should screen data to make certain that:

- responses are legible and understandable
- responses are within an acceptable range
- responses are complete
- all of the necessary information has been included
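Two of these checks (acceptable range and completeness) are easy to automate; legibility still requires a human reader. The sketch below assumes hypothetical variables `q1`, `q2` (a 1-5 rating scale) and `age`, and returns a list of problems for one participant's record.

```python
# Hypothetical acceptable range for each variable (e.g., a 1-5 rating scale).
VALID_RANGES = {"q1": (1, 5), "q2": (1, 5), "age": (18, 99)}


def screen_record(record, valid_ranges=VALID_RANGES):
    """Return a list of problems (missing or out-of-range values) in one record."""
    problems = []
    for field, (low, high) in valid_ranges.items():
        value = record.get(field)
        if value is None:
            problems.append(f"{field}: missing")
        elif not low <= value <= high:
            problems.append(f"{field}: {value} outside {low}-{high}")
    return problems


print(screen_record({"q1": 3, "q2": 7, "age": None}))
# ['q2: 7 outside 1-5', 'age: missing']
```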

### Constructing a Database

Once data are screened and all corrections are made, the data should be entered into a well-structured database.

- The statistical analysis may dictate what type of program should be chosen for the database (e.g., certain advanced statistical analyses may require the use of specific statistical programs)

### The Data Codebook

- a written or computerized list that provides a clear and comprehensive description of the variables that will be included in the database.

At the bare minimum, a data codebook should include the following for *each* *variable*:

- variable name
- variable description
- variable format (number, date, text)
- instrument or method of collection
- date collected
- respondent or group
- variable location (in database)
- notes
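Kept in electronic form, a codebook is just one structured record per variable. The sketch below mirrors the minimum fields listed above; the field names and the example variable `dep_total` are hypothetical, and the format check is one possible way to keep entries consistent.

```python
from dataclasses import dataclass

ALLOWED_FORMATS = {"number", "date", "text"}


@dataclass
class CodebookEntry:
    """One codebook row; field names are illustrative, not a standard."""
    name: str
    description: str
    fmt: str            # variable format: "number", "date", or "text"
    instrument: str     # instrument or method of collection
    date_collected: str
    respondent: str     # respondent or group
    location: str       # variable location in the database (e.g., a column)
    notes: str = ""

    def __post_init__(self):
        # Reject formats outside the agreed set so the codebook stays consistent.
        if self.fmt not in ALLOWED_FORMATS:
            raise ValueError(f"{self.name}: unknown format {self.fmt!r}")


codebook = [
    CodebookEntry(
        name="dep_total",
        description="Total depression score",
        fmt="number",
        instrument="Hypothetical 10-item depression questionnaire",
        date_collected="2024-03-01",
        respondent="Participant (self-report)",
        location="Column F",
        notes="Sum of items 1-10 after reverse-scoring",
    ),
]

for entry in codebook:
    print(f"{entry.name}: {entry.description} [{entry.fmt}] at {entry.location}")
```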

### Data Entry

Double-Entry Procedure: a way of ensuring accuracy of data entry in which data are entered into the database twice and then compared to determine whether there are any discrepancies

- Pros: very effective way to identify entry errors
- Cons: may be difficult to manage and may not be time or cost effective
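The comparison step of the double-entry procedure can be sketched as follows, assuming each entry pass is stored as a dictionary keyed by a hypothetical participant ID. Every mismatch is reported so it can be checked against the original paper form.

```python
def compare_entries(first, second):
    """Return (participant_id, field, first_value, second_value) for each mismatch."""
    discrepancies = []
    for pid in sorted(set(first) | set(second)):
        rec1 = first.get(pid, {})
        rec2 = second.get(pid, {})
        for field in sorted(set(rec1) | set(rec2)):
            v1, v2 = rec1.get(field), rec2.get(field)
            if v1 != v2:
                discrepancies.append((pid, field, v1, v2))
    return discrepancies


entry_1 = {101: {"q1": 4, "q2": 2}, 102: {"q1": 5, "q2": 1}}
entry_2 = {101: {"q1": 4, "q2": 3}, 102: {"q1": 5, "q2": 1}}
print(compare_entries(entry_1, entry_2))  # [(101, 'q2', 2, 3)]
```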

Alternatives to double-entry procedure:

- run descriptive analyses and frequencies on each variable
- use a database program that allows the researcher to define the ranges, formats, and types of data that will be accepted into certain data fields, making it impossible to enter data that do not meet preset criteria (e.g., SPSS, Microsoft Excel)
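The first alternative, running descriptives and frequencies, can be sketched with Python's standard library. The hypothetical item `q1` below is on a 1-5 scale, so the value 55 in the frequency table (and the inflated maximum) flags a probable keying error.

```python
from collections import Counter
from statistics import mean


def describe(values):
    """Basic descriptives and a frequency table, ignoring missing (None) values."""
    present = [v for v in values if v is not None]
    return {
        "n": len(present),
        "missing": len(values) - len(present),
        "min": min(present),
        "max": max(present),
        "mean": mean(present),
        "frequencies": dict(sorted(Counter(present).items())),
    }


q1 = [1, 2, 2, 3, 5, 5, 55, None]  # 55 is probably a mistyped 5
summary = describe(q1)
print(summary["max"], summary["frequencies"])
```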

### Transforming Data

Typically involves the following:

- identifying and coding missing values
- computing totals and new variables
- reversing scale items
- recoding and categorization

#### Identifying and Coding Missing Values

*Purpose:* to allow the researcher to designate specific values to represent missing data

Missing Value Imputation:

- **Hot deck imputation:** the researcher matches participants on certain variables to identify potential donors. Missing values are then replaced with values taken from the matching respondents (i.e., respondents who are matched on a set of relevant factors)
- **Predicted mean imputation:** imputed values are predicted using certain statistical procedures (e.g., linear regression for continuous data and discriminant function analysis for dichotomous or categorical data)
- **Last value carried forward:** imputed values are based on previously observed values. This method can be used *only for longitudinal variables*, for which participants have values from previous data collection points
- **Group means:** imputed values are determined by calculating the variable's group mean (or mode, in the case of categorical data)
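The sketch below first codes a designated missing-value code as truly missing, then applies the two simplest imputation methods above (group means and last value carried forward). The code `-99` and the group labels are hypothetical choices; hot deck and predicted mean imputation need more machinery and are not shown.

```python
from statistics import mean

MISSING_CODE = -99  # hypothetical code chosen to mark missing responses


def code_missing(values, missing_code=MISSING_CODE):
    """Replace the missing-value code with None so it is never averaged in."""
    return [None if v == missing_code else v for v in values]


def impute_group_mean(values, groups):
    """Replace each None with the mean of the observed values in the same group."""
    group_means = {}
    for g in set(groups):
        observed = [v for v, gg in zip(values, groups) if gg == g and v is not None]
        group_means[g] = mean(observed) if observed else None
    return [group_means[g] if v is None else v for v, g in zip(values, groups)]


def last_value_carried_forward(waves):
    """For one participant's repeated measures, fill each gap with the last observed value."""
    filled, last = [], None
    for v in waves:
        if v is not None:
            last = v
        filled.append(last)  # stays None if nothing has been observed yet
    return filled


scores = code_missing([10, -99, 14, 20, -99, 30])
groups = ["A", "A", "A", "B", "B", "B"]
print(impute_group_mean(scores, groups))            # [10, 12, 14, 20, 25, 30]
print(last_value_carried_forward([5, None, 7, None]))  # [5, 5, 7, 7]
```

Coding the missing value first matters: averaging the raw `-99` into a group mean would badly distort it.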

#### Computing Totals and New Variables

*Purpose:* The researcher may want to create new variables based on values from other variables.

- can serve to normalize the distribution and improve accuracy of outcomes (if the variable is not normally distributed)
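Two common new variables are a scale total and a change score. In this sketch the rule "no total if any item is missing" is an assumption; other rules, such as prorating from the answered items, are also used. The participant data are hypothetical.

```python
def total_score(items):
    """Sum one participant's item responses; None if any item is missing."""
    if any(v is None for v in items):
        return None
    return sum(items)


def change_score(pre, post):
    """New variable: change from pre-test to post-test (negative = decrease)."""
    if pre is None or post is None:
        return None
    return post - pre


participant = {"items": [3, 4, 2, 5], "pre": 18, "post": 12}
participant["total"] = total_score(participant["items"])                 # 14
participant["change"] = change_score(participant["pre"], participant["post"])  # -6
```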

#### Reversing Scale Items

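*Purpose:* some items on a measure are deliberately worded in the opposite direction to decrease the likelihood of participants' falling into a "response set." Before totals are computed, these items must be reverse-scored so that all items are scored in the same direction.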

**Response Set:** occurs when a participant begins to respond in a patterned manner to items on an assessment measure, regardless of the content.
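Reverse-scoring a response on a scale from `scale_min` to `scale_max` means replacing it with `scale_min + scale_max - value`, so on a 1-5 scale 1 becomes 5, 2 becomes 4, and 3 stays 3. The item names and which items are reversed are hypothetical here.

```python
def reverse_score(value, scale_min=1, scale_max=5):
    """Reverse-score one response (1<->5, 2<->4 on a 1-5 scale); None stays None."""
    if value is None:
        return None
    return scale_min + scale_max - value


# Hypothetical measure: items q2 and q4 were worded in the opposite direction.
REVERSED_ITEMS = {"q2", "q4"}
responses = {"q1": 4, "q2": 2, "q3": 5, "q4": 1}
scored = {k: reverse_score(v) if k in REVERSED_ITEMS else v for k, v in responses.items()}
print(scored)  # {'q1': 4, 'q2': 4, 'q3': 5, 'q4': 5}
```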

#### Recoding Variables

*Purpose:* to recode variables into categories for ease of analysis

- simplifies data analysis and interpretation
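As an illustration, a continuous age variable can be recoded into categories. The cut points below are hypothetical; in practice they should be recorded in the codebook.

```python
def recode_age(age):
    """Collapse a continuous age into hypothetical analysis categories."""
    if age is None:
        return None
    if age < 30:
        return "18-29"
    if age < 50:
        return "30-49"
    return "50+"


ages = [22, 30, 49, 67, None]
age_groups = [recode_age(a) for a in ages]
print(age_groups)  # ['18-29', '30-49', '30-49', '50+', None]
```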

#### Data Transformations

*Purpose:* to correct variables that are *not* normally distributed. Analyzing non-normal variables with procedures that assume normality increases the risk of a Type I or Type II error, so the researcher first uses one of several objective methods to determine whether each variable is normally distributed.

- Typically involves measuring each variable's **skewness** and **kurtosis**:
  - *Skewness*: measures the overall lack of symmetry of the distribution, and whether it looks the same to the left and right of the center point
  - *Kurtosis*: measures whether the data are peaked or flat relative to a normal distribution
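A minimal sketch using the moment-based formulas: skewness is the third central moment divided by the second raised to the power 1.5, and excess kurtosis is the fourth central moment divided by the squared second moment, minus 3. A normal distribution has values near 0 for both. These are the simple population formulas; statistical packages such as SPSS typically report small-sample-adjusted versions, so their values will differ somewhat for small samples.

```python
def skewness_and_kurtosis(values):
    """Moment-based skewness and excess kurtosis (both about 0 for normal data)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    m4 = sum((x - m) ** 4 for x in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3
    return skew, excess_kurtosis


print(skewness_and_kurtosis([1, 2, 3, 4, 5]))    # symmetric: skewness 0, flat: negative kurtosis
print(skewness_and_kurtosis([1, 1, 1, 2, 10]))   # long right tail: positive skewness
```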

*Most frequently used transformations:*

**Square root transformation:** involves taking the square root of each value within a certain variable

*Caveat:* cannot take the square root of a negative number

*Solution:* add a constant, such as 1, to each value before computing the square root, choosing a constant large enough that no value remains negative

**Log transformation:** a logarithm is the power (aka exponent) to which a base number has to be raised to get the original number

*Caveat:* as with the square root transformation, if a variable contains values less than 1, a constant must be added to move the minimum value of the distribution to at least 1 (the logarithm of zero or a negative number is undefined)

**Inverse transformation:** involves taking the inverse of each value by dividing 1 by that value

*Example:* the inverse of 3 would be computed as 1/3

*Caveat:* researchers should be careful not to misinterpret the scores following their analysis because this transformation procedure essentially makes very small values very large, and very large values very small
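The three transformations, including the constant-shifting caveats, can be sketched as below. The helper `shift_to_minimum` is a hypothetical name; it adds only the constant needed to bring the smallest value up to the required minimum (0 for square roots, 1 for logs). Base-10 logs are assumed. The inverse assumes all values are nonzero.

```python
import math


def shift_to_minimum(values, target_min):
    """Add the constant needed to move the smallest value up to target_min (if needed)."""
    constant = max(0, target_min - min(values))
    return [v + constant for v in values]


def sqrt_transform(values):
    """Square root after making every value non-negative."""
    return [math.sqrt(v) for v in shift_to_minimum(values, 0)]


def log_transform(values):
    """Base-10 logarithm after moving the minimum to at least 1."""
    return [math.log10(v) for v in shift_to_minimum(values, 1)]


def inverse_transform(values):
    """1 / x; values must be nonzero. For positive values the order of scores is reversed."""
    return [1 / v for v in values]


print(sqrt_transform([-1, 0, 3]))    # shifted to [0, 1, 4] -> [0.0, 1.0, 2.0]
print(log_transform([0, 9, 99]))     # shifted to [1, 10, 100] -> [0.0, 1.0, 2.0]
print(inverse_transform([1, 2, 4]))  # [1.0, 0.5, 0.25]
```

The last line shows the caveat above in action: the largest original value (4) becomes the smallest transformed value (0.25).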

**Data Analysis**

A variety of statistical procedures exist that allow researchers to:

- describe groups of individuals and events
- examine the relationships between variables
- measure differences between groups and conditions
- examine and generalize results obtained from a sample back to the population from which the sample was drawn

*Purpose:* to help the researcher interpret data in order to provide meaningful insights about the problem being examined

*Two major categories of statistical procedures:*

**Descriptive statistics**

- allow the researcher to describe the data and examine relationships between variables

**Inferential statistics**

- allow the researcher to examine causal relationships
- allow the researcher to go beyond the parameters of the study sample and draw conclusions about the population from which the sample was drawn

**T-Test**

*Used to test mean differences between groups*

*Can be used when a researcher wishes to compare the average (mean) performance between two groups on a continuous variable*
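The independent-samples t statistic is the difference between the two group means divided by its standard error. This sketch assumes equal group variances (the pooled-variance form) and uses made-up scores; it returns the t value and degrees of freedom only. Getting a p-value requires the t distribution, which a statistics package provides (for example, SciPy's `scipy.stats.ttest_ind`).

```python
import math
from statistics import mean, variance


def independent_t(group_a, group_b):
    """Pooled-variance t statistic for two independent groups, with its df."""
    n1, n2 = len(group_a), len(group_b)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n1 - 1) * variance(group_a) + (n2 - 1) * variance(group_b)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean(group_a) - mean(group_b)) / se
    return t, n1 + n2 - 2


t, df = independent_t([5, 6, 7], [1, 2, 3])
print(round(t, 3), df)  # 4.899 4
```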

**Analysis of Variance (ANOVA)**

*Also a test of mean comparisons*

*Unlike a t-test, an ANOVA can compare means across more than two groups or conditions*

*Despite what its name may suggest, the ANOVA tests differences between group means rather than differences between group variances; it does so by comparing the variance between groups with the variance within groups*
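A one-way ANOVA sketch makes the point concrete: the F ratio divides the between-group mean square (how far group means sit from the grand mean) by the within-group mean square (spread inside each group). Large F means the group means differ more than chance variation would suggest. The data are made up; SciPy's `scipy.stats.f_oneway` would also give the p-value.

```python
from statistics import mean


def one_way_anova(*groups):
    """Return the F ratio and its (between, within) degrees of freedom."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within, (k - 1, n - k)


print(one_way_anova([1, 2, 3], [4, 5, 6], [7, 8, 9]))  # (27.0, (2, 6))
```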

**Chi-Square**

*Allows us to test hypotheses using nominal or ordinal data*

*It summarizes the discrepancy between observed and expected frequencies*

*Because it compares categorical responses between two or more groups, the chi-square test can be conducted only on actual frequency counts (rather than on precalculated percentages or proportions)*
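The Pearson chi-square statistic sums (observed − expected)² / expected over every cell, where each expected count is row total × column total / grand total. The sketch below uses made-up tables; the second example has the same proportions but one-tenth the counts and gives a much smaller statistic, which is why raw counts rather than percentages must be used.

```python
def chi_square(observed):
    """Pearson chi-square and df for a contingency table of raw counts (list of rows)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            chi2 += (obs - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(col_totals) - 1)
    return chi2, df


print(chi_square([[10, 20], [20, 10]]))  # about 6.67 with df = 1
print(chi_square([[1, 2], [2, 1]]))      # same proportions, about 0.67
```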

**Regression**

*Linear regression: Method of estimating or predicting a value on some dependent variable given the values of one or more independent variables (IVs)*

*Simple regression: Predicts the dependent variable with a single independent variable*

*Multiple regression: Uses any number of independent variables to predict the dependent variable*

*Logistic regression: Unique in its ability to predict dichotomous outcomes (such as the presence or absence of a specific outcome) based on a specific set of independent or predictor variables*
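Simple regression can be sketched as ordinary least squares with one predictor: the slope is the sum of cross-products of deviations divided by the sum of squared x deviations, and the line passes through the two means. The second function shows how a logistic model turns a predictor into a probability; its coefficients are hypothetical, not estimated from data (fitting them requires an iterative method that a statistics package provides).

```python
import math
from statistics import mean


def simple_regression(x, y):
    """Ordinary least squares with one predictor: returns (intercept, slope)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def predicted_probability(intercept, slope, x):
    """Logistic model: probability of the outcome given predictor value x."""
    return 1 / (1 + math.exp(-(intercept + slope * x)))


print(simple_regression([1, 2, 3, 4], [3, 5, 7, 9]))  # (1.0, 2.0)
print(predicted_probability(-3.0, 0.5, 6))            # 0.5
```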

**Test Yourself**

1. A written or computerized record that provides a clear and comprehensive description of all variables entered into a database is known as a ___.

2. ___ statistics are generally used to accurately characterize the data collected from a study sample.

3. A graph that illustrates the frequency of observations by groups is known as a ____.

4. A measure of the spread of values around the mean of a distribution is known as the ___ ___.

5. Analysis of variance (ANOVA) is used to measure differences in group ___.

Answers: 1. data codebook; 2. Descriptive; 3. histogram; 4. standard deviation; 5. means.