Introduction
In the Artsy Corporation case, a lawsuit has been filed by all female employees of Artsy Corporation, alleging gender discrimination in the workplace. The lawsuit claims that gender has influenced pay, hiring, promotions, and other career-related decisions. In this paper, we analyze payroll data for 256 employees at one of Artsy's facilities. Our analysis aims to give Artsy's lawyers a comprehensive understanding of the company's situation so that they can construct an effective defense.
The data was selected using simple random sampling to represent the population, which includes every employee’s pay rate and working conditions.
The data includes:
- an ID number (IDNUMBER) that can identify the person by name or social security number,
- the person's sex (SEX), with 0 denoting female and 1 denoting male,
- the person's job grade in 1986 (GRADE), which is the hierarchy level at the company,
- the length of time (in years) the person had been in that job grade as of 12/31/86 (TING),
- the weekly pay rate as of 12/31/86 (RATE), which is the most important point of concern.
To analyze this data statistically, we will create multiple regression models.
The purpose of this analysis is to investigate the relationship between Pay Rate (the dependent variable) and Gender, Job Grade, and Time in Grade (the independent variables). Our objective is to determine whether gender influences salaries within the company. If gender is found not to be a significant factor, we will explore other independent variables that may impact salaries. This section focuses on Part 1: Descriptive Statistics.
Definition of Important Terms: The paper uses descriptive statistics to gather, summarize, and analyze data in order to draw conclusions. Descriptive statistics are essential for analyzing large datasets like the one considered here. Each variable will be examined with regard to both its central tendency and its variation.
The first term to define is the "mean", which is the arithmetic average and serves as a measure of central tendency for our data. To illustrate the concept, we provide an example:
Using our data and available software, we have determined that the mean pay rate for females in Artsy Corporation is $833 per week, while males in Artsy Corporation have a mean pay rate of $1128 per week. This information suggests that males receive a higher salary (Please note that this example oversimplifies matters and does not consider other variables). Additionally, it is important to clarify another crucial term for better understanding: standard deviation.
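As a minimal sketch of how such group means are computed, the following Python snippet averages weekly pay rates by the SEX code. The five records are invented for illustration and are not taken from Artsy's payroll; only the meaning of the two fields (SEX and RATE) follows the data description above.

```python
from statistics import mean

# Hypothetical (SEX, RATE) records; 0 = female, 1 = male.
# These values are illustrative only and are NOT Artsy payroll data.
records = [(0, 700), (0, 800), (0, 900), (1, 1000), (1, 1200)]

def mean_rate_by_sex(rows):
    """Return {sex_code: arithmetic mean of RATE} for the given rows."""
    groups = {}
    for sex, rate in rows:
        groups.setdefault(sex, []).append(rate)
    return {sex: mean(rates) for sex, rates in groups.items()}

group_means = mean_rate_by_sex(records)
print(group_means)
```

Applied to the real payroll file, the same grouping would be expected to reproduce the $833 and $1128 figures quoted above.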
The standard deviation quantifies the extent to which data deviate from their average. It describes the variability in the dataset, indicating how close to or far from the mean the values tend to lie. Considering our data table, we will concentrate on pay rate, the variable of greatest concern. The lowest pay rate is $579 per week and the highest is $1552 per week; the average pay rate is $931 per week.
The standard deviation is $229 per week. If pay rates were approximately normally distributed, about 68% of our data would fall within one standard deviation ($229) above or below our mean of $931. Below is a box-whisker plot illustrating the pay rate distribution at this corporation: Figure 1. The plot reveals several insights. Firstly, let's examine the quartiles:
- 25% of the employees have a weekly pay rate below $762.
- 50% of the employees have a weekly pay rate ranging from $762 to $1073.
- 25% of the employees have a weekly pay rate above $1073.
- Our median is $865, indicating that 50% of our employees earn more and 50% earn less than this amount.
- The little red box on the right represents an extreme outlier, meaning it stands out significantly compared to other values.
We believe this outlier corresponds to the pay rate of an executive or manager at the branch, which explains its much higher value than that of regular employees. Therefore, it shouldn't be used for comparison purposes. From the graph, we can observe that the lower 25% of salaries are closely clustered at one end, suggesting they fall within a similar range. Conversely, the salaries in the highest 25% exhibit greater variability, as indicated by the longer line at that end.
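The summary figures quoted above can be checked with a few lines of arithmetic. The sketch below computes the band one standard deviation either side of the mean and tests the highest pay rate against Tukey's 1.5 x IQR rule. That rule is an assumption made here for illustration; the text does not say which rule the plotting software applied when it flagged the outlier.

```python
# Summary figures quoted in the text (weekly pay rate, in dollars).
mean_rate, std_dev = 931, 229
q1, q3, max_rate = 762, 1073, 1552

# Band of one standard deviation around the mean.
one_sd_band = (mean_rate - std_dev, mean_rate + std_dev)

# Tukey's 1.5 * IQR rule (an assumed convention, see the note above).
iqr = q3 - q1
upper_fence = q3 + 1.5 * iqr
max_is_outlier = max_rate > upper_fence

print(one_sd_band, iqr, upper_fence, max_is_outlier)
```

Under this assumed rule, any weekly pay rate above the upper fence is drawn as a separate point rather than as part of the whisker, which is consistent with the highest pay rate appearing as an isolated marker.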
The lines at the end of the box show how information is distributed visually. In this case, higher salaries are spread out more. We analyzed pay rates in relation to gender and found significant differences in average salaries between men and women, as mentioned earlier. However, we didn't account for other variables in the dataset yet. To better understand the disparity, we made a side-by-side box and whisker plot that shows the pay rates of male and female employees.
The graphs illustrate a substantial disparity in salaries between males and females. The earnings of the bottom 25% of male employees are approximately equivalent to the middle 50% range for women (25%-75%). Given that this sample represents our company's overall population, it can be taken as representative of our company's salary distribution in general. Moreover, there are red squares positioned on the right side of the graphs. Specifically, there are four red boxes (two overlap to create a darker shade). These red boxes signify that the higher-earning females within the company deviate from the typical pattern at Artsy and therefore qualify as statistical outliers compared to the remaining workforce.
This is only an initial observation that does not yet consider other variables, so no conclusion can be drawn from it alone. It is noticeable, however, that men tend to have higher salaries, indicating a possible impact of gender on pay rates.
Next, we can examine the effect of employee grade on salary. The grade represents the hierarchical position within the company, ranging from 1 (lowest) to 8 (highest). The following table displays each employee's grade.
When examining the graph, it is evident that there is a comparable distribution of employees in both higher and lower positions within management. This distribution implies that the number of employees remains consistent regardless of grade level. Figure 4 exhibits a scatter plot showcasing how employees' pay rates correspond to their grade levels. Each circle on the plot represents an employee, displaying their level and pay rate. Furthermore, we have incorporated a gender code to emphasize disparities between males and females.
The graph displays the distribution of male and female employees by grade level and salary. Male employees are represented as a red box (1), while female employees are represented as a blue circle (0). The graph shows a clear trend: salaries increase with grade level. It can also be inferred that, on average, men occupy higher grade levels with higher salaries, whereas women typically hold lower grade levels with lower pay rates.
The graph suggests gender discrimination within Artsy Corporation, with men appearing to receive higher pay rates than women at the same level. Women are predominantly found in lower grade levels: all employees in grade level 2 are female, while only 34% of employees in grade level 7 are female. To substantiate this claim, we will continue gathering statistical evidence.
The time-in-grade variable records how long each employee has spent in their current grade level, measured in years. On the previous page, we observed a graph that displays the relationship between pay rate and time in grade, differentiated by gender to identify disparities. Each point on the graph represents an employee, with male employees represented by a red square (1) and female employees represented by a blue circle (0). The graph indicates that women generally have shorter tenures in their grade than men. Furthermore, among women and men who have spent equal time in grade, women tend to earn lower pay rates than men.
For instance, a male with 0.5 years in grade earns $1413 per week, whereas a female with the same time in grade earns only $605 per week. Nevertheless, this notable disparity is probably due to differences in grade level. In the upcoming sections of this report, we will investigate whether there is a substantial correlation between time spent in grade and pay rate. Overall, the descriptive statistics point to several factors that suggest wage discrimination based on gender.
The current evidence is insufficient to prove job discrimination. Therefore, our paper will proceed to the next stage by analyzing our data through regression. In Part 2: Regression Analysis, we will further explore how independent variables such as gender, grade, and time in grade influence pay rate. By utilizing regression analysis, we can determine if there is any gender discrimination affecting the pay rate.
To begin the regression section, we must select a significance level, which is the probability we are willing to accept of rejecting a null hypothesis that is actually true. In this paper, we have opted for a 1% significance level, so the risk of such a false finding is small. We chose this level to ensure that we provide our lawyers with the most reliable information possible to defend Artsy Corporation against the employee lawsuit. In regression, it is also crucial to state a hypothesis, which will be the focus of our testing.
Hypothesis testing involves establishing two opposing statements known as the null hypothesis and the alternative hypothesis. These statements are mutually exclusive and only one can be true. Here, we present both the null and alternative hypotheses:
H0 (null hypothesis): There is no linear relationship between the Pay Rate of our employees and the independent variables we have defined (gender, grade, time in grade). We hold this belief until proven otherwise.
H1 (alternative hypothesis): There is a linear relationship between our employees' Pay Rate and the independent variables we have defined (gender, grade, time in grade). We reject H0 (the null) if the p-value of a variable is smaller than our significance level (1%). The p-value is the probability of obtaining sample data at least as extreme as ours if the null hypothesis (H0) were in fact true in the population. We compare each individual p-value to our significance level (the 1% chosen above).
If the p-value is less than 1%, we reject the null hypothesis (H0); otherwise, we fail to reject it, because the result is not statistically significant. Statistically significant means that the result is unlikely to have happened by chance alone.
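The decision rule can be written as a one-line comparison. The sketch below is only an illustration: the function name and the two p-values are hypothetical, and only the 1% threshold comes from the paper.

```python
ALPHA = 0.01  # the 1% significance level chosen in this paper

def reject_null(p_value, alpha=ALPHA):
    """Return True when the p-value is below the significance level."""
    return p_value < alpha

# Hypothetical p-values, for illustration only.
decisions = {p: reject_null(p) for p in (0.001, 0.03)}
print(decisions)
```

A p-value of 0.03 would count as significant at the more common 5% level but not at the stricter 1% level used here.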
An equation predicting pay rate was derived using software. The equation includes gender, grade, and time in grade as independent variables. It is displayed below:
Pay Rate = 527 + 59.6 (Gender Coded) + 30.8 (Time in Grade) + 75.0 (Grade)
R2 = 82.3%
S.E. (Standard Error of the Estimate) = $97.0601 per week
Now it is crucial to understand the above equation and its structure.
The constant (527) serves as the baseline in our equation: it is the predicted pay rate when all independent variables are 0. The gender variable is coded to allow for regression analysis, with 0 representing females and 1 representing males. The second independent variable is time in grade (years), which is already quantitative and requires no further adjustment.
To treat grade level as a qualitative variable, we will create eight new variables (e.g., grade 1), one for each grade level. If an employee belongs to a specific grade level, the corresponding variable is set to 1; otherwise, it is set to 0. With each variable understood, we can now explain how the equation works. The constant of 527 applies to an employee who has a value of 0 for every independent variable (i.e., female, 0 years in grade, and grade level 0).
Because grade level 0 does not exist, this first number cannot be interpreted literally. The next number, 59.6 from the gender variable, indicates that on average, male employees (1) earn $59.60 more per week than their female counterparts, assuming all other factors remain the same.
The information provided suggests gender discrimination, as it reveals that male employees earn $59.6 more per week than their female counterparts, all other factors remaining equal. Another important figure to consider is 30.8, which represents the impact of Time in Grade on pay. This means that, on average, for each additional year an employee spends at their current grade, their weekly pay rate should increase by $30.8, assuming all other variables remain consistent.
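To make the interpretation of the coefficients concrete, the sketch below evaluates the first regression equation for two employees. The coefficients are those of the equation given earlier; the function name and the two employees (both grade 3 with two years in grade) are hypothetical.

```python
def predict_rate(gender, time_in_grade, grade):
    """Weekly pay rate predicted by the first regression equation.

    gender: 0 = female, 1 = male; time_in_grade in years; grade as coded.
    """
    return 527 + 59.6 * gender + 30.8 * time_in_grade + 75.0 * grade

# Hypothetical employees: grade 3, two years in grade.
female = predict_rate(0, 2, 3)
male = predict_rate(1, 2, 3)
gap = male - female

print(round(female, 1), round(male, 1), round(gap, 1))
```

Whatever values are chosen for grade and time in grade, the predicted gap between a man and a woman with identical values stays at the gender coefficient, which is what "all other factors remaining equal" means in this model.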
This demonstrates the typical progression in companies, where employees who remain in their current grade level for longer see their pay rate increase with experience. The Grade level variable is the final number to be examined, with a value of 75. This implies that, on average, each advancement in grade level corresponds to a $75 per week increase in pay rate. Once again, this reflects the traditional organizational structure in companies, where higher-ranking employees earn higher salaries.
The R2 of this equation is 82.3%. The R2 value indicates the percentage of the variation in pay rate that can be attributed to the variables in the model. In this case, 82.3% of the differences in pay rate within our dataset can be explained by our three independent variables (gender, grade, time in grade). This indicates a strong relationship between the pay rate and our independent variables.
The S.E. (Standard Error of the Estimate) is $97.06 per week. The S.E. represents the typical size of our prediction errors, implying that an employee's actual pay rate may differ by about ±$97 from the rate predicted using this equation.
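For readers who want to see how these two summary measures are obtained, the sketch below computes R2 as one minus the ratio of unexplained to total variation, and the standard error of the estimate as the square root of the unexplained variation divided by n - k - 1 (n observations, k predictors). The observed and predicted values are hypothetical and are not taken from the Artsy data.

```python
from math import sqrt

def r_squared_and_se(observed, predicted, n_predictors):
    """Return (R2, standard error of the estimate) for a fitted model."""
    n = len(observed)
    mean_obs = sum(observed) / n
    # Unexplained variation: squared gaps between actual and predicted.
    sse = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    # Total variation: squared gaps between actual values and their mean.
    sst = sum((y - mean_obs) ** 2 for y in observed)
    r2 = 1 - sse / sst
    se = sqrt(sse / (n - n_predictors - 1))
    return r2, se

# Hypothetical weekly pay rates and model predictions.
observed = [600, 800, 1000, 1200]
predicted = [650, 750, 1050, 1150]
r2, se = r_squared_and_se(observed, predicted, n_predictors=1)

print(round(r2, 2), round(se, 2))
```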
This figure is crucial because it quantifies our margin of error: a prediction may be higher or lower than the actual pay rate by about $97.06 per week. Previously, our pay rate had a standard deviation of $229 per week. With the regression model, we have reduced the error to $97.06 per week, a reduction of approximately 58%. We need to create separate regression models for each variable to determine how much of the variation each independent variable explains. Our initial individual regression model focuses on the gender variable. The model to predict the pay rate solely based on gender is: Pay Rate = 833 + 295 (Gender Coded), R2 = 36.9%.
S.E. = $182.554 per week. As discussed in the initial model, gender is a qualitative variable coded as either 1 or 0. On average, the pay rate of males is $295 per week higher than that of females. In the previous regression model, this difference was only $59.60. By isolating variables individually, we aim to determine the specific impact of each variable on the pay rate.
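When the only predictor is a 0/1 variable, ordinary least squares returns the mean of the group coded 0 as the constant and the difference between the two group means as the slope. The sketch below shows this on a small hypothetical sample (the five records are invented, and the helper name is an assumption). It also explains why the constant of 833 and the coefficient of 295 add up to the male mean of $1128 reported in Part 1.

```python
def fit_simple(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical data: 0 = female, 1 = male; weekly pay rate in dollars.
sex = [0, 0, 0, 1, 1]
rate = [700, 800, 900, 1000, 1200]
intercept, slope = fit_simple(sex, rate)

# Female mean is 800 and male mean is 1100 in this toy sample.
print(round(intercept, 2), round(slope, 2))

# Consistency check on the figures reported in the paper.
reported_male_mean = 833 + 295
```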
Our R2 value is 36.9%, which means that 36.9% of the variation in pay rate can be accounted for by knowing the gender. Another regression model we will create considers time within a specific grade. The model to predict pay rate using just 'time in grade' as a predictor is: Pay rate = 788+82.3 (Time in Grade). The R2 value for this model is 29%.
S.E. = $193.734 per week. According to this model, the pay rate increases by an average of $82 per week for each additional year within a grade level. However, with a low R2 of only 29%, time within a grade level by itself explains relatively little of the variation in pay rate.
We believe that the reason for this occurrence is the lack of consideration for the overall experience of an employee within the company. For instance, a person who has been with the company for 10 years might only have recently been promoted to grade 8, resulting in a low time within grade. Conversely, there could be someone who has been in grade 1 for the past 5 years. Clearly, their pay rate cannot solely be determined by time within grade, although it does impact the overall model to some extent.
However, 29% is not enough to account for most of the variation in pay rate. Our last individual regression model focuses on the grade level. As before, we have modified this variable by creating a separate variable for each grade level. We have established 8 variables, each taking a value of 1 or 0. A value of 1 indicates that the individual belongs to that grade level, while a value of 0 means they do not. The following is our individual grade regression model:
Rate = 671 + 694 Grade_8 + 501 Grade_7 + 385 Grade_6 + 226 Grade_5 + 161 Grade_4 + 161 Grade_3 + 54.7 Grade_2
R2 = 81.8%
S.E. = $99.3584 per week
The coefficients above show the pay rate increase in each grade relative to grade level 1, the starting grade level at the company. For example, grade level 8 employees earn an average of $694 more per week than grade level 1 employees.
Compared to grade level 1 employees, grade level 4 employees receive an additional $161 per week. The regression model compares all grades to grade level 1. The R2 value is considerably higher than in the other individual models: 81.8% of the variation in pay rate can be explained by the employee's grade level. Artsy's lawyers can use this information to argue that an employee's level within the company accounts for about 82% of the variation in pay rate.
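The indicator coding can be sketched in a few lines. In the snippet below the constant and coefficients are those of the grade-only model reported in this paper; the function names are hypothetical. Grade level 1 has no indicator of its own, so an employee in grade 1 has every indicator equal to 0 and is predicted to earn the constant.

```python
BASELINE = 671  # predicted weekly rate for grade 1 (all indicators are 0)
GRADE_COEF = {2: 54.7, 3: 161, 4: 161, 5: 226, 6: 385, 7: 501, 8: 694}

def grade_indicators(grade):
    """Code a grade as 0/1 indicators Grade_2..Grade_8 (grade 1 = baseline)."""
    return {g: int(grade == g) for g in GRADE_COEF}

def predict_by_grade(grade):
    """Weekly pay rate predicted by the grade-only model."""
    dummies = grade_indicators(grade)
    return BASELINE + sum(GRADE_COEF[g] * dummies[g] for g in GRADE_COEF)

for grade in (1, 2, 8):
    print(grade, round(predict_by_grade(grade), 1))
```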
By comparison, the individual gender model explains only 36.9% of the variation, far less than the 82% explained by grade level. After reviewing all the individual regression models, we created a new regression model that enters the grade levels as separate variables. This allows us to determine whether this approach increases our R2 and decreases our Standard Error. The updated regression model is shown below:
Rate = 632 + 46.9 (Gender Coded) + 26. ...
The remaining terms assign a separate coefficient to each of Grade_2 through Grade_8.
R2 = 85.4%
S.E. = $89.29 per week
This model demonstrates how the pay rate varies with each specific variable.
The gender variable indicates that men at Artsy Corporation make an average of $46.9 more per week than women. Additionally, the coefficients for grade levels are shown relative to the lowest grade level, grade level 1. The interpretations of this model remain consistent with those previously shown.
Since this model gives a larger R2 of 85.4%, compared with 82.3% from our initial model, we will use the new model going forward because it accounts for more of the variation. Another indication that the new regression model is better than the old one is the S.E. Our initial regression model had a standard error of $97.06 per week, whereas the new model has an S.E. of $89.29 per week. This represents a reduction in error of $7.77, or 8%. This is a positive development, as it provides the lawyers with more accurate information and strengthens their defense against the gender discrimination claims.
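The percentage reductions quoted in this paper follow from simple arithmetic on the reported figures. The short check below uses a hypothetical helper name; the inputs are the standard deviation of pay rate and the two standard errors given in the text.

```python
def pct_reduction(before, after):
    """Percentage by which `after` is smaller than `before`."""
    return 100 * (before - after) / before

# Standard deviation of pay rate vs. S.E. of the first regression model.
sd_to_first_model = pct_reduction(229, 97.06)

# S.E. of the first model vs. S.E. of the model with separate grade variables.
first_to_final_model = pct_reduction(97.06, 89.29)

print(round(sd_to_first_model, 1), round(first_to_final_model, 1))
```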
Before we can use our regression model, it is necessary to verify that our variables satisfy two specific conditions: linearity and equal variance. To assess these conditions, we will examine a normal plot of residuals and a residuals-versus-fits plot. First, let's understand what linearity and equal variance mean. Equal variance means that the variability in pay rates remains consistent regardless of the values of the independent variables (gender, time within grade, and grade level). Linearity means that pay rate changes in a straight-line fashion as each independent variable changes.
The residual is simply the difference between an employee's observed pay rate and the pay rate predicted by the model. Now, we will demonstrate both tests (linearity and equal variance) for each of the variables under consideration in order to support their inclusion in our final regression model. On the following page, we present the initial scatter plot showing the relationship between pay rate and gender: Correlation: 0.608. In this scatter plot, each circle represents an employee at the company. The circles above the number 0 represent female employees, while those above the number 1 represent male employees. Because there are only two possible values, it is challenging to observe linearity. However, we can discern a pattern in which male employees tend to have higher pay rates than female employees.
The scatter plot above shows a slight increase. We will now assess whether this variable passes our second test, the equal variance test. Each point on the graph represents a residual. To determine if there is equal variance within the variable, we must compare the sizes of both stacks and see if they are similar. Equal variance is present when the data stacks are approximately the same size. Unequal variance occurs when one data stack is more than twice the size of the other.
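The rule of thumb described above can be expressed as a simple check. In the sketch below, the "size" of a stack is taken to be the range of its residuals, which is an assumption about how the stacks were compared visually; the residual values and function names are hypothetical.

```python
def spread(values):
    """Range of a stack of residuals (largest minus smallest)."""
    return max(values) - min(values)

def roughly_equal_variance(stack_a, stack_b, max_ratio=2.0):
    """True when neither stack is more than max_ratio times the other."""
    a, b = spread(stack_a), spread(stack_b)
    return max(a, b) <= max_ratio * min(a, b)

# Hypothetical residuals (dollars per week) for the two gender codes.
female_resid = [-150, -40, 10, 60, 120]
male_resid = [-200, -80, 0, 90, 210]

print(roughly_equal_variance(female_resid, male_resid))
```

More formal tests of equal variance exist, but the visual comparison is the approach this paper relies on.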
Our next variable to examine is time within grade. We will assess its linearity and equal variance. The scatterplot displayed above shows that although the points are spread out, there is a clear trend indicating that employees who spend more time within the grade have a higher pay rate. The correlation for this variable is 0.538. Therefore, we can conclude that this variable satisfies the linearity assumption. Additionally, we have created a scatterplot below to assess the equal variance assumption. Once again, since the data points in the scatterplot appear to have similar spreads, we can assume that the equal variance assumption is not violated.
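In a regression with a single predictor, R2 equals the square of the correlation between the predictor and the dependent variable. The quick check below squares the two correlations reported in this section and compares them with the R2 values of the individual gender and time-in-grade models (36.9% and 29%); small differences are due to rounding of the reported correlations.

```python
# Correlations with pay rate reported in this section.
r_gender, r_time = 0.608, 0.538

# Squared correlation = R2 of the corresponding one-predictor model.
r2_gender = r_gender ** 2
r2_time = r_time ** 2

print(round(100 * r2_gender, 1), round(100 * r2_time, 1))
```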
This variable is accepted for the regression model, just like the prior variable. We also need to test the grade variable to determine its suitability for the regression model. As with the previous variables, the linearity assumption and equal variance test are used to assess its acceptability. The linearity assumption test result is as follows: Correlation: 0.876. The test above confirms the linear relationship between rate and grade, so this variable also satisfies the linearity assumption. The following equal variance graph assesses whether it also meets that assumption: in the graph, the stacks appear to have similar sizes. Despite some minor variations in size, there is no significant difference worth mentioning. This variable therefore satisfies both linearity and equal variance.
In conclusion, after careful analysis of the data and the given variables, numerous factors indicate gender discrimination in the workplace, as detailed in the first section of our paper.
When analyzing the regression models in the second part, we found that grade level had the greatest impact on pay rate, as indicated by its high R2 of 81.8%. For lawyers needing a regression model to support their argument, we recommend the fully constructed final model. This model has the lowest standard error and the highest R2 between pay rate and the provided variables.
Glossary:
- Regression Model: A statistical technique that uses two or more numerical independent variables to predict the value of a numerical dependent variable. We use several factors, such as gender, grade level, and time in grade, to predict an employee's pay rate. These factors are the independent variables and the pay rate is the dependent variable.
- Central Tendency: The concentration of values around a central value.
- Variation: The dispersion of values from a central value.
- Arithmetic Average: The sum of a series of numbers divided by the count of those numbers.
- Linear Hypothesis: Examines whether the rate of change in pay rates is consistent. Prior to creating a regression model, it is necessary to establish whether there is a meaningful association between pay rate and other variables such as gender, grade, and time in grade. If no significant relationship exists, constructing a regression model would not produce useful or accurate results.