QA Exercise

Transcript: Research Methods Module 4

Slide 1: Research Methods Module 4
Welcome. This is the PowerPoint for Module 4. Here we will review material on summarizing and analyzing quantitative data.

Slide 2: Why Statistics?
It is very important that graduate students understand quantitative methods in criminal justice and criminology. We live at a time with an overwhelming amount of information available to us, and one needs to learn which data are accurate and which are not. Further, too many individuals and groups use data in a misleading way, invoking the guise of science in an attempt to persuade. In order to become an educated consumer (and creator) of data, one needs to understand the use (and misuse) of statistics. Statistics are a tool; the ASA (American Statistical Association) defines statistics as “the science of learning from data.” According to Merriam-Webster (http://www.merriam-webster.com/dictionary/statistics/), statistics is a branch of mathematics dealing with the collection, analysis, interpretation, and presentation of masses of numerical data. This section provides a general introduction to quantitative approaches for analyzing data.

Slide 3: Quantitative Approaches
The goal of all empirical research is to summarize the data we collect into a meaningful and useful form. This task falls into two general categories. First, we want to describe variables, that is, understand the ‘typical’ case and how cases (values) differ from each other. Second, researchers wish to describe relationships between variables and test hypotheses about them. Note that these PowerPoints provide a broad overview of quantitative data analysis; the reading and the videos are essential to gaining an understanding of the topic.

Slide 4: Univariate Data Analysis
A variable’s level of measurement (nominal, ordinal, interval, ratio) is the most important determinant of the appropriateness of particular statistics.
In other words, the way the variable is measured affects what statistical techniques can be used. For example, we can calculate the average age of a sample, but not the average gender. How do we describe one variable (by itself)? Frequency distributions and graphs (e.g., a bar chart of a frequency distribution) are the two most popular approaches for displaying variation. This is displayed in the table below. If percentages are presented rather than frequencies (sometimes both are included), the total number of cases in the distribution (the Base N) should be indicated. Typically, if there are 20 or more cases, percentages should be provided.

Slide 5: Summary (Descriptive) Statistics
Statistics are used to describe the distribution of, and relationships among, variables; for example, what do the values of the variables look like? In the univariate case (one variable by itself), we first want to figure out the typical case, using measures of central tendency. For example, what is the average age of arrest for homicide? Second, we need to know how dispersed (or different) the values of the variable are, using measures of dispersion. For example, for those arrested for DUI, what does time of day (hour) look like? Do arrests cluster around the same hour, or do they occur at all hours of the day and night?

Slide 6: Measures of Central Tendency
There are three measures of central tendency: the mean, the median, and the mode. The Mean is the arithmetic average, and it is used for interval and ratio level variables. The Median is the midpoint of values in a distribution; thus, it is the 50th percentile of a rank-ordered distribution. The median can be used for ordinal, interval, or ratio variables. It is particularly useful for variables with extreme high or low scores (outliers), that is, a skewed distribution; this is because it is less subject to distortion than the mean in these cases, giving us a more accurate picture of the typical case.
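To make the contrast between the mean and the median concrete, here is a minimal sketch using Python's standard `statistics` module. The ages are hypothetical values invented for illustration (they are not course data); the last one is a deliberate outlier.

```python
import statistics

# Hypothetical ages at arrest (illustrative only); 67 is an extreme outlier.
ages = [19, 21, 22, 23, 24, 25, 67]

mean_age = statistics.mean(ages)      # arithmetic average, pulled upward by the outlier
median_age = statistics.median(ages)  # middle value of the rank-ordered list

# The same calculation with the outlier removed, for comparison.
ages_no_outlier = ages[:-1]
mean_no_outlier = statistics.mean(ages_no_outlier)
median_no_outlier = statistics.median(ages_no_outlier)

print(f"With outlier:    mean = {mean_age:.1f}, median = {median_age}")
print(f"Without outlier: mean = {mean_no_outlier:.1f}, median = {median_no_outlier}")
```

In this made-up sample, one extreme case moves the mean by several years while the median barely changes, which is why the median is often preferred for skewed distributions.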
Last is the Mode, which is the most frequently occurring value for a variable. Note that there can be more than one mode for a variable. It is the only measure of central tendency appropriate for nominal level variables (for example, gender).

Slide 7: Measures of Dispersion (Variation)
These measures capture how widely or densely spread the values are for the variable of interest. First is the Range, which is the simplest measure of variation. It reveals the entire span of values (that is, (highest value – lowest value) + 1). The range is appropriate for ordinal, interval, and ratio levels of measurement (but not nominal, since the categories have no numerical meaning). A second measure is the Variance (note that the Standard Deviation is its square root, and is the preferred measure of variability). The variance is the average squared deviation of each case from the mean. A smaller value implies that the values of the variable are more tightly clustered around the mean. Published papers will typically report the mean and standard deviation for each interval or ratio level (continuous) variable in the study.

Slide 8: Levels of Measurement and Descriptive Statistics
This table summarizes the various measures of central tendency and dispersion and when they are appropriate to calculate given the level of measurement of the variable.

Slide 9: Bivariate Analysis
As we move from the simpler to the more complex, let us discuss how researchers describe the connection between two variables. Multiple techniques exist to explore the relationship between two variables (e.g., gender and delinquency, number of convictions and employability, treatment and recidivism). We will focus on the two most common: Correlation and Cross-tabulation (also known as contingency table analysis).

Slide 10: Correlation
Correlation is a statistic that measures the strength of association between two continuous (interval or ratio) variables. The most common is Pearson’s.
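As a rough illustration of how Pearson's coefficient can be computed from paired values, the sketch below implements the textbook formula directly. The function name `pearson_r` and all data values are hypothetical, chosen only to show a perfectly positive, a perfectly negative, and a U-shaped pattern.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: co-variation of x and y divided by the product of their spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from each mean.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)

# Hypothetical paired observations (illustrative only).
x = [1, 2, 3, 4, 5]
y_positive = [2, 4, 6, 8, 10]        # rises in lockstep with x
y_negative = [10, 8, 6, 4, 2]        # falls in lockstep with x
y_u_shaped = [(v - 3) ** 2 for v in x]  # high at both ends, low in the middle

r_positive = pearson_r(x, y_positive)
r_negative = pearson_r(x, y_negative)
r_u_shaped = pearson_r(x, y_u_shaped)

print(f"positive: {r_positive:+.2f}")
print(f"negative: {r_negative:+.2f}")
print(f"U-shaped: {r_u_shaped:+.2f}")
```

The U-shaped series is clearly related to x, yet its linear coefficient comes out at zero, so a coefficient near zero should be read as "no linear relationship" and not as "no relationship".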
Pearson’s correlation (sometimes referred to as “Pearson’s r”) reflects the degree of linear relationship between two variables. It ranges in value from +1 to -1. A correlation of +1 means that there is a perfect positive linear relationship between variables, while an ‘r’ of -1 means that two variables have a perfect inverse, or negative, linear relationship. The scatterplots shown on the next page depict a variety of such relationships.

Slide 11: Correlation (continued)
Below are examples (scattergrams) displaying the plot of the values of two variables (x and y) for each case/subject. The closer the ‘r’ is to zero, the weaker the (linear) relationship between the two variables.

Slide 12: Correlation (continued)
Remember, however, that correlation does not mean causation (i.e., that X causes Y). For example, there is a strong correlation between the number of police cars responding to a call and the severity of the crime, but the police cars did not cause the crime. Another reminder: for X to cause Y, X has to happen first, and we must rule out all rival causal factors. Analytically, we can control for the influence of a third variable and re-assess the correlation between X and Y (to see if it changes). Note that the correlation technique only measures linear (not curvilinear) associations between variables. That is, variables may be related but may not appear to be so (they may have an r close to zero) because they are related in a curvilinear way. This is one reason we need to be sure to ‘eyeball’ the data, that is, to look at the values in a scattergram and not just focus on the ‘r’ statistic. One example of a curvilinear relationship would be that between age and health care.

Slide 13: Cross-tabulation
A cross-tab, or contingency table, represents the joint frequency distribution of two variables. That is, it displays the distribution of one variable for each category of another variable.
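A cross-tab of this kind can be sketched in a few lines with `collections.Counter`. The ten cases below are hypothetical and far too few for real analysis; the helper names `crosstab` and `column_percents` are likewise made up for this illustration. The sketch counts the joint frequencies and then expresses each cell as a percentage of its column.

```python
from collections import Counter

# Hypothetical case-level data (illustrative only):
# each tuple is (gender, self-reported delinquency level).
cases = [
    ("male", "high"), ("male", "high"), ("male", "low"),
    ("male", "low"), ("male", "low"),
    ("female", "high"), ("female", "low"), ("female", "low"),
    ("female", "low"), ("female", "low"),
]

def crosstab(pairs):
    """Joint frequency distribution: how many cases fall in each (column, row) cell."""
    return Counter(pairs)

def column_percents(table):
    """Express each cell as a percentage of its column total."""
    col_totals = Counter()
    for (col, _row), n in table.items():
        col_totals[col] += n
    return {(col, row): 100 * n / col_totals[col] for (col, row), n in table.items()}

table = crosstab(cases)
percents = column_percents(table)

for (col, row), pct in sorted(percents.items()):
    print(f"{col:<6} {row:<4} n={table[(col, row)]}  {pct:.0f}% of the {col} column")
```

Reading across the output, the share of "high" cases can be compared between the two columns, which is the comparison a cross-tab is built to support.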
A cross-tab is also a tool for statistically controlling one or more variables while examining the associations among others. It is usually used when both variables are measured at either the nominal or the ordinal level, but it can also be used for interval or ratio measures that have been reduced into a smaller number of categories (or groupings).

Slide 14: Cross-tabulation (continued)
Consider the following cross-tab example, where gender is considered the independent variable and delinquency is the dependent variable. Gender is displayed in the columns and self-reported delinquency is shown in the rows of the table. The table should include (and we should focus on interpreting) the percentages. The rule is to percentage in the direction of the independent variable. Thus, in this case, the researcher calculated column percentages, and we want to compare the percentages across the columns. That is, if we compare females to males, are the percentages similar to each other, or different? If they are different, that tells us that gender is related to delinquency. For example, a greater proportion of males than females report high levels of delinquency (42% vs. 33%).

Slide 15: The χ² (Chi-Squared) Test
Chi-square is the inferential statistic used in most cross-tabular analyses (to test whether the variables are related to each other, and whether the association is likely to exist in the larger population from which the sample was drawn). An important concept to mention is that of statistical significance. When we test for statistical significance, we are trying to rule out chance (or random error) as the explanation for the association between two variables, according to some criterion (probability level) set by the analyst. Convention dictates that the criterion be a probability less than 5% (p < .05). That is, when we see a p value of this size, we can be reasonably confident that the relationship we observe between variables (in a table, or a correlation) is not simply the product of chance.
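The test described above can be sketched for a simple two-by-two table. The counts are hypothetical (300 males and 300 females, invented so that the column percentages match the 42% vs. 33% comparison mentioned earlier); they are not the course's actual data. The p-value shortcut using `math.erfc` is valid only for a table with one degree of freedom, which is an assumption of this sketch; larger tables would need a general chi-square distribution function.

```python
import math

# Hypothetical counts (illustrative only): columns are males, females.
observed = [
    [126, 99],   # high delinquency
    [174, 201],  # not high delinquency
]

def chi_square(table):
    """Sum of (observed - expected)^2 / expected over every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    return stat

def p_value_df1(stat):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))

chi2 = chi_square(observed)
p = p_value_df1(chi2)

print(f"chi-square = {chi2:.3f}, p = {p:.3f}")
print("significant at .05" if p < 0.05 else "not significant at .05")
```

With these invented counts the difference clears the conventional .05 threshold. Rerunning the sketch with the same percentages but fewer cases per group is a quick way to see how sample size affects the result.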
Note, however, that in a large sample, an association may be statistically significant but still be too weak to be substantively significant or important (because sampling error decreases as sample size increases).

Slide 16: Controlling for the Effects of a Third Variable
Relationships between social science variables can be complex. In practice, then, after examining bivariate relationships, researchers will conduct elaboration analyses and also use multivariate techniques (the explanation of which is beyond the scope of this course). Multivariate techniques like regression allow the researcher to study how, for example, the victimization rate is affected by geography, population size, gender, race, and age simultaneously. We want to develop our explanations/models to best reflect the complexity of the social world. The process of introducing control variables into a bivariate relationship is called elaboration analysis.

Slide 17: How and Why to Use the Elaboration Model?
The elaboration approach involves examining how the relationship between X and Y changes when a third factor is introduced. This can be done with contingency tables and with correlations. There are three different uses for three-variable cross-tabulation. The first is to identify an intervening variable; the second is to test a relationship for spuriousness; and the third is to specify the conditions under which a relationship exists. Each of these uses of the elaboration model helps determine the validity of our findings, either by evaluating the criteria for causality or by indicating the cross-population generalizability of the findings.

Slide 18: Two Examples
The two figures below show hypothetical relationships between three variables. An intervening variable is one that occurs after X but before Y. An antecedent variable comes before both X and Y, and can be introduced to test whether or not the relationship between X and Y is spurious.
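A test for spuriousness can be sketched with a small set of invented counts. Everything here is hypothetical: X, Y, and Z are placeholder variables, the counts were constructed so that the pattern is easy to see, and the helper names are made up for this illustration. The sketch compares the percentage-point gap in Y between the two X groups, first ignoring Z and then within each category of Z.

```python
# Hypothetical counts (illustrative only).
# counts[z_category][x_category] = (cases with Y present, total cases)
counts = {
    "Z = a": {"X present": (48, 80), "X absent": (12, 20)},
    "Z = b": {"X present": (4, 20), "X absent": (16, 80)},
}

def percent_y(y_count, total):
    """Percentage of cases in a group that have Y present."""
    return 100 * y_count / total

def percentage_gap(group):
    """Percentage-point difference in Y between the X-present and X-absent groups."""
    return percent_y(*group["X present"]) - percent_y(*group["X absent"])

# Bivariate table: collapse over Z by summing counts within each X category.
combined = {}
for x_level in ("X present", "X absent"):
    y_sum = sum(counts[z][x_level][0] for z in counts)
    n_sum = sum(counts[z][x_level][1] for z in counts)
    combined[x_level] = (y_sum, n_sum)

bivariate_gap = percentage_gap(combined)
partial_gaps = {z: percentage_gap(group) for z, group in counts.items()}

print(f"Ignoring Z: gap = {bivariate_gap:.0f} percentage points")
for z, gap in partial_gaps.items():
    print(f"Within {z}: gap = {gap:.0f} percentage points")
```

In these invented data the X groups differ on Y when Z is ignored, but the difference vanishes inside each category of Z. That pattern is what one would expect if Z were driving both X and Y; whether Z is better read as antecedent or intervening is still a matter of theory, not of the arithmetic.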
The process is as follows: first examine the bivariate relationship, and then examine if and how the relationship between X and Y changes after the third variable is added to the analysis. If the relationship between X and Y is spurious, it will disappear (or be substantially weaker) after the third variable is introduced. Question: How do you know if the change caused by the third variable is because it is extraneous or intervening? Your answer is based on logic and theoretical grounds (not on the outcome of the elaboration).

Slide 19: Concluding Thoughts
Reporting statistical results involves finding a balance between manageability and detail: conveying the right amount of information for the reader to understand what was done and what was found in the analysis, without overwhelming the reader with data.

Slide 20: Additional Resources
Fortunately, we live in a time where there are a number of helpful (free, public domain) statistics books and web sites. These include Online Statistics Education: A Multimedia Course of Study (http://onlinestatbook.com/), Project Leader: David M. Lane, Rice University. Another free and open access statistics book (Probability and Statistics EBook) is available from the UCLA Statistics Online Computational Resource (SOCR): http://wiki.stat.ucla.edu/socr/index.php/EBook. More generally, there are free resources in a variety of academic subject areas (including statistics) at http://www.hippocampus.org.
