Statistical Data Analysis to be done in MS Excel.
Statistical Data Analysis to be done in MS Excel.
CP2403 / CP3413 Assignment 1 1 Note: This is an individual assignment. While it is expected that students will discuss their ideas with one another, students need to be aware of their responsibilities in ensuring that they do not deliberately or inadvertently plagiarize the work of others. Assignment 1 – Practice on Exploring/Analysing Data Due date: Friday, 29th April 2016 5pm Assessment Weight: 15% Rationale This assignment has been designed as the first part of CP2403/CP3403 Assignments to assess students’ ability to explore/model/analyse data, by using analytic tools (Excel, StatTools etc.). This assignment addresses the following learning objectives for this subject: • appraise data and information analytics concepts and procedures • analyse data and interpret results to generate information for decision making Submission • You need to submit a zipped folder which contains one Word (or PDF) document file and relevant Excel files to LearnJCU. The Word (or PDF) document should include all the answers and information about which Excel file contains related results of each task. Please name the zipped file as LastnameFirstnameA1.zip (or other zipped file format) • Timestamp shown on LearnJCU assignment submission will be used to determine if the assignment is late or not. Refer to the subject guide for the policy for late submission. Requirements (Tasks) This assignment will give you practice in using the techniques you are learning to analyze data, and to interpret the analyses (in particular techniques you learned in CP2403/C3413 through Week 1~5). This assignment consists of 7 separate analysis tasks and each analysis task requires you to apply various data analysis/interpretation techniques to reveal implicit information from the given data and answer for the given specific questions. CP2403 / CP3413 Assignment 1 2 Task 1 – [10 Marks] The file Task1-Data.xlsx contains (fictional) data from a survey of 500 randomly selected households. a. Indicate the type of data for each of the variables included in the survey. b. For each of the categorical variables in the survey, indicate whether the variable is nominal or ordinal and why. c. Create a histogram for each of the numerical variables in this data set. Indicate whether each of these distributions is approximately symmetric or skewed. Which, if any, of these distributions are skewed to the right? Which, if any, are skewed to the left? d. Find the maximum and minimum debt levels for the households in this sample. e. Find the indebtedness levels at each of the 25th, 50th, and 75th percentiles. f. Find and interpret the interquartile range for the indebtedness levels of these households. Task 2 – [15 Marks] The file Task2-Catalog Marketing.xlsx contains recent data on 1000 customers of a company which is a direct marketer of technical products. a. Identify all customers in the data set who are 55 years of age or younger, female, single, and who have had at least some dealings with this company before this year. Find the average number of catalogs sent to these customers and the average amount spent by these customers. b. Do any of the customers who satisfy the conditions stated in part a (previous subquestion) have salaries that fall in the bottom 10% of all 1000 combined salaries in the data set? If so, how many? c. Identify all customers in the sample who are more than 30 years of age, male, homeowners, married. And who have had little if any dealings with this company before this year. Find the average combined household salary and the average amount spent by these customers this year. d. Do any of the customers who satisfy the conditions stated in part c (previous subquestion) have salaries that fall in the top 10% of all 1000 combined salaries in the data set? If so, how many? e. Identify all customers who are either (1) home owners between the ages of 31 and 55 who live reasonably close to a shopping area that sells similar merchandise, and have a combined salary between $40,000 and $90,000 (inclusive) and a history of being a medium or high spender at this company; or (2) homeowners greater than the age of 55 who live reasonably close to a shopping area that sells similar merchandise and have a combined salary between $40,000 and $90,000 (inclusive) and a history of being a medium or high spender at this company. f. Characterize the subset of customers who satisfy the conditions specified in part e (previous sub-question). In particular, what proportion of these customers are women? What proportion of these customers are married? On average, how many children do CP2403 / CP3413 Assignment 1 3 these customers have? Finally, how many catalogs do these customers typically receive, and how much do they typically spend each year at this company? g. In what ways are the customers who satisfy condition 1 in part e (previous subquestion) different from those who satisfy condition 2 in part e (previous subquestion)? Be specific. Task 3 – [15 Marks] The file Task3-LiquorSales.xlsx contains monthly sales (in millions of dollars) of beer, wine, and liquor. The data have not been seasonally adjusted, so there might be seasonal patterns that can be discovered. For any month in any year, define that month’s seasonal index as the ratio of its sales value to the average sales value over all months of that year. a. Calculate these seasonal indexes, one for each month in the series. Do you see a consistent pattern from year to year? If so, what is it? b. To “deseasonalize” the data and get the seasonally adjusted series often reported, divide each monthly sales value by the corresponding seasonal index from part a (previous sub-question). Then create a time series graph of both series, the actual sales and the seasonally adjusted sales. Explain how they are different and why the seasonally adjusted series might be of interest. Task 4 – [15 Marks] The file Task4-Houses.xlsx contains data on 148 houses that were recently sold in a (fictional) suburban community. The data set includes the selling price of each house, along with its appraised value, square meters, number of bedrooms, and number of bathrooms. a. Create two new variables, Ratio 1 and Ratio 2. Ratio 1 is the ratio of Appraised Value to Selling Price, and Ratio 2 is the ratio of Selling Price to Square meter. Identify any obvious outliers in these two Ratio variables. b. Find the mean, median, and standard deviation of each Ratio variable, broken down by Bedrooms. Also, create side-by-side box plots of each Ratio variable, again broken down by Bedrooms. Comment on the results. c. Repeat part b (previous sub-question) with Bedrooms replaced by Bathrooms. d. If you repeat part b and c (previous two sub-questions) with any obvious outlier(s) form part a removed, do the conclusions change in any substantial way? Task 5 – [10 Marks] The file Task5-Top50Courses.xlsx includes data on the 50 top graduate programs in the U.S. a. Create a table of correlations between all of the numerical variables. Discuss which variables are highly correlated with which others. b. The overall score is the score schools agonize about. Create a scatterplot and corresponding correlation of each of the other variables versus Overall, with Overall always on the Y axis. What do you learn from these scatterplots? CP2403 / CP3413 Assignment 1 4 Task 6 – [15 Marks] Bank98 operates a main location and three branch locations in a medium-size city. All four locations perform similar services, and customers typically do business at the location nearest them. The bank has recently had more congestion – longer waiting lines – than it (or its customers) would like. As part of a study to learn the causes of these long lines and to suggest possible solutions, all locations have kept track of customer arrivals during one-hour intervals for the past 10 weeks. All branches are open Monday through Friday from 9 A.M. until 5 P.M. and on Saturday from 9 A.M. until noon. For each location, the file Task6-Bank.xlsx contains the number of customer arrivals during each hour of a 10-week period. Imagine that the manager of Bank98 has hired you to make some sense of these data. Specifically, your task is to present charts and/or tables that indicate how customer traffic into the bank locations varies by day of week and hour of day. There is also interest in whether any daily or hourly patterns you observe are stable across weeks. Although you don’t have full information about the way the bank currently runs its operations – you know only its customer arrival pattern and the fact that it is currently experiencing long lines – you are encouraged to append any suggestions for improving operations, based on your analysis of the data. Task 7 – [20 Marks] The best-selling book “The Millionaire Next Door” by Thomas J. Stanley and William D. Danko (Longstreet Press, 1996) presents some very interesting data on the characteristics of millionaires. We tend to believe that people with expensive houses, expensive cars, expensive clothes, country club memberships, and other outward indications of wealth are the millionaires. The authors define wealth, however, in terms of savings and investments, not consumer items. In this sense, they argue that people with a lot of expensive things and even large incomes often have surprisingly little wealth. These people tend to spend much of what they make on consumer items, often trying to keep up with, or impress, their peers. In contrast, the real millionaires, in terms of savings and investments, frequently come from “unglamorous” Professions (particularly teaching), own unpretentious homes and cars, dress in inexpensive clothes, and otherwise lead rather ordinary lives. Consider the (fictional) data in the file Task7-Wealth.xlsx. For several hundred couples, it lists their education level, their annual combined salary, the market value of their home and cars, the amount of savings they have accumulated (in savings accounts, stocks, retirement accounts, and so on), and a self-reported “social climber index” on a scale of 1 to 10 (with 1 being very unconcerned about social status and material items and 10 being very concerned about these). Prepare a report based on these data, supported by relevant charts and/or tables that could be used in a book such as “The Millionaire Next Door”. Your conclusions can either support or contradict those of Stanley and Danko. CP2403 / CP3413 Assignment 1 5 Assignment 1 – Practice on Exploring/Analysing Data: Marking Criteria Exemplary Competent Marginal Unacceptable For Each Task 10 4-9 1-3 0 15 6-14 1-5 0 20 8-19 1-7 0 Use appropriate analysis tools/functions to generate correct relevant analysis results All questions are correctly answered through the analysis result The solution is accurate, logical and desirable. The discussion (if required) of solution is appropriate to elicit correct conclusions (answers). Use appropriate analysis tools/functions to generate relevant analysis results but not completely desirable. Most questions are correctly answered through the analysis result but some answers are wrongly leaded by some erroneous analysis The solution is mostly accurate, but not fully desirable. The discussion (if required) of solution is elicit but not fully correct. Attempted to use analysis tools/functions but mostly incorrect and poorly applied Solutions are provided but mostly not correct. No discussions (if required) are made enough to support solutions. Not attempted
Is this the question you were looking for? If so, place your order here to get started!