Explore

Analyzing Experiments Solutions

An experiment is a type of study in which a researcher:
- Establishes control and treatment (experimental) groups
- Designs a plan to manipulate one variable in the treatment group that intentionally changes a behavior
The treatment group receives the treatment or is asked to change their behavior.
The control group does not receive the treatment, or may receive a placebo if participants are not told which group they are in (a blinded study).
In inferential statistics, the results of an experiment must be analyzed for statistical significance to determine whether the treatment had a real effect.
Through analysis, researchers:
- Compare the treatment and control groups
- Draw conclusions and make recommendations about the group(s)
- Apply conclusions and inferences to the population

Simulation

Because it is not practical to repeat live experiments over and over, simulations are used to shuffle, reorganize, and reassign the data, showing what results random chance alone would produce.
A randomized simulation asks this question (about your live experiment results): “Is this a real result, or was it a coincidence?”
The question is answered using complex simulation technology to:
- Recombine all of the data together
- Randomly re-assign (sort, shuffle) the individuals into the same-sized treatment and control groups as the original experiment
- Calculate the difference between the reshuffled groups
- Run the simulation 10,000 times to get a reliable standard deviation of the simulated differences
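The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `simulate_differences`, the fixed seed, and the sample data are all made up for this sketch and do not come from the lesson or from any particular simulation tool.

```python
import random
import statistics


def simulate_differences(treatment, control, trials=10_000, seed=0):
    """Shuffle the pooled data and record the difference in group means each time."""
    rng = random.Random(seed)                  # fixed seed so the sketch is repeatable
    pooled = list(treatment) + list(control)   # recombine all of the data
    n_treat = len(treatment)                   # keep the original group sizes
    diffs = []
    for _ in range(trials):
        rng.shuffle(pooled)                    # randomly re-assign the individuals
        new_treatment = pooled[:n_treat]
        new_control = pooled[n_treat:]
        diffs.append(statistics.mean(new_treatment) - statistics.mean(new_control))
    return diffs


# Made-up illustrative data (hours of sleep), not taken from any study in this lesson.
treatment = [7.1, 6.8, 7.9, 7.4, 6.9, 7.6]
control = [6.0, 6.4, 5.8, 6.9, 6.1, 5.7]

diffs = simulate_differences(treatment, control)
simulated_sd = statistics.stdev(diffs)         # spread of the chance-only differences
observed = abs(statistics.mean(treatment) - statistics.mean(control))

print(f"observed difference: {observed:.2f}")
print(f"simulated standard deviation: {simulated_sd:.2f}")
```

Because every reshuffle ignores the original group labels, the simulated differences center near zero. Their standard deviation gives a sense of how large a difference chance alone tends to produce.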

Scenario-Based Tasks

A scenario-based task is a real-life situation or story requiring further analysis.
You will analyze the scenario to solve a problem and/or make recommendations.
Because the tasks often have more than one answer, you must explain your reasoning and justify your decisions.
For scenario-based tasks:

1. Calculate

  - The observed difference:
    $\text{observed difference} = |\text{treatment mean} - \text{control mean}|$
  - The z-score for the observed difference:
    $z = \frac{\text{observed difference}}{\text{simulated standard deviation}}$

2. Interpret the z-score by answering, “What does this mean?”

| z-Value | Significance Level | Meaning |
| --- | --- | --- |
| $\lvert z \rvert < 1.96$ | Not significant | Could occur by chance |
| $\lvert z \rvert > 1.96$ | 5% (a.k.a. 95% certainty) | Statistically significant |

Remember, the percentages are not used in the calculations, but they help you see if the results are unusual (significant).

3. Decide and justify (so now what?)

  - Statistical significance asks: “Is this real or due to chance?”
  - Practical significance asks: “Does this matter?”
  - Cost asks: “Is the cost worth it?”
  - Context asks: “What else, if anything, matters?”
  - Justify your thinking using data in a written summary.
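The Calculate and Interpret steps can be written as two small Python functions. This is a minimal sketch: the function names `z_score` and `interpret` and the numbers in the usage line are hypothetical, chosen only to show the arithmetic.

```python
def z_score(treatment_mean, control_mean, simulated_sd):
    """Observed difference divided by the simulated standard deviation."""
    observed_difference = abs(treatment_mean - control_mean)
    return observed_difference / simulated_sd


def interpret(z, cutoff=1.96):
    """Compare |z| with the 5% cutoff used in this lesson."""
    if abs(z) > cutoff:
        return "statistically significant at the 5% level"
    return "not significant; could occur by chance"


# Hypothetical numbers, not from any example in this lesson.
z = z_score(treatment_mean=12.0, control_mean=10.0, simulated_sd=0.8)
print(round(z, 3), "->", interpret(z))
```

The Decide and justify step is not coded here, because practical significance, cost, and context call for judgment that a formula cannot supply.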

Example 1

Medication Trial

A pharmaceutical lab tested a new sleep medication against a placebo. Forty volunteers with insomnia (trouble sleeping) were randomly assigned to either the medication or the placebo group. Each participant recorded the number of hours they slept each night for a week. The observed difference between the two groups was 1.40 hours.

After the experiment, technology simulated 10,000 trials, resulting in a standard deviation of 0.37 hours.

| Group (n = 20 each) | Average Hours of Sleep |
| --- | --- |
| Medication | 7.33 |
| Placebo | 5.93 |

Calculate the z-score for the observed difference. Explain if you think the medication is effective.
If the medication was not effective, what would you expect the results to be?
Headline: “Amazing Sleeping Pill Guarantees 8 Hours of Rest! Study Confirms medication increases sleep 140%!” Is the percentage in the headline accurate or misleading? Explain.
What is not mentioned in the headline that should be reported?

$z = \frac{1.40}{0.37} \approx 3.784$. Yes, the evidence suggests the medication is effective: $3.784 > 1.96$, so the result is statistically significant at the 5% level.
If the medication had no effect, the observed difference would fall near the center of the distribution (be approximately zero).
The percentage is misleading. The headline does not state what baseline was used to calculate a 140% increase, and it is mathematically wrong to multiply the 1.40-hour difference by 100 to get a percent. Relative to the placebo mean, the increase is $\frac{1.40}{5.93} \approx 23.6\%$.
The study only used 40 volunteers, and they all reported having insomnia (so it might not improve sleep quality for people without insomnia). The study did not define the number of days in a week (5 or 7).
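The arithmetic for this example can be checked with a short Python sketch. The variable names are made up for illustration. Treating the placebo mean as the baseline for the percent increase is an assumption of this sketch, since the headline does not say what baseline it used.

```python
# Values from the Medication Trial example.
medication_mean = 7.33   # average hours of sleep, medication group
placebo_mean = 5.93      # average hours of sleep, placebo group
simulated_sd = 0.37      # standard deviation from the 10,000 simulated trials

observed_difference = abs(medication_mean - placebo_mean)
z = observed_difference / simulated_sd

# Assumption: the percent increase is measured against the placebo mean.
percent_increase = observed_difference / placebo_mean * 100

print(f"observed difference = {observed_difference:.2f} hours")
print(f"z = {z:.3f}")
print(f"increase over placebo = {percent_increase:.1f}%")
```

Under that assumption the increase is roughly 24%, far from the 140% claimed in the headline.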

Example 2

Grocery Store Checkout

The Food-Mart grocery store chain wanted to know if it should add a new self-checkout system to its stores. During peak hours, 600 customers were randomly assigned to self-checkout or a cashier. Then, simulation technology was used to complete 10,000 randomized tests based on the results of the experiment. The simulated standard deviation for checkout time was 0.34 minutes, and the simulated standard deviation for the error rate was 2.78 percentage points.

| Group (n = 300 each) | Checkout Time (minutes) | Error Rate |
| --- | --- | --- |
| Treatment: self-checkout | 3.2 | 12% |
| Control: cashier | 4.1 | 3% |

Business considerations:

One self-checkout system costs $45,000.
Cashiers earn $16 per hour and handle an average of 15 customers per hour.
Each error at self-checkout requires 5 minutes of staff time to resolve at a rate of $20 per hour.
The stores average 2,000 customers daily.

Calculate the checkout time z-score for the observed difference. Explain the significance at a 5% level.
Calculate the error z-score for the observed difference. Explain the significance.
What other factors beyond statistics and cost should be considered?

Checkout time: $z = \frac{0.9}{0.34} \approx 2.647$
The checkout time z-score is greater than 1.96, so the difference is statistically significant at the 5% level. This means self-checkout is significantly faster than a cashier.
Error rate: $z = \frac{9}{2.78} \approx 3.237$
The error rate z-score is greater than 1.96, so the difference is statistically significant at the 5% level. This means significantly more errors are made at self-checkout than by a cashier.
Other factors to consider: customer satisfaction/preference, employee breaks and time off, store hours (and many more)
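The two z-scores, along with one possible way to put the business figures on a per-customer basis, can be sketched in Python. The per-customer comparison is an assumption of this sketch and not part of the original solution. It ignores the $45,000 purchase price and does not assume how many of the 2,000 daily customers would use self-checkout.

```python
# Values from the Grocery Store Checkout example.
time_difference = abs(3.2 - 4.1)     # minutes
error_difference = abs(12 - 3)       # percentage points

time_z = time_difference / 0.34      # simulated SD for checkout time
error_z = error_difference / 2.78    # simulated SD for error rate

# Assumption of this sketch: express labor costs per customer.
cashier_cost_per_customer = 16 / 15               # $16 per hour, 15 customers per hour
error_cost_per_customer = 0.12 * (5 / 60) * 20    # 12% errors, 5 minutes each at $20 per hour

print(f"checkout time z = {time_z:.3f}")
print(f"error rate z = {error_z:.3f}")
print(f"cashier labor per customer = ${cashier_cost_per_customer:.2f}")
print(f"self-checkout error handling per customer = ${error_cost_per_customer:.2f}")
```

Both differences are statistically significant, so the decision rests on weighing faster checkout against more errors, together with the cost and context questions.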

Example 3

Farm Fertilizer

A farming cooperative selected 80 plots of the same size with the same conditions to test a new organic fertilizer. The results were run through a randomized simulator 10,000 times, resulting in a standard deviation of the simulated differences of 4.15 bushels. Cost considerations: Organic fertilizer costs an additional $45 per acre compared to standard fertilizer. The current market price is $6.50 per bushel.

| Group (n = 40 each) | Mean Yield in Bushels per Acre |
| --- | --- |
| Treatment: new organic fertilizer | 168.3 |
| Control: standard fertilizer | 162.7 |
| Observed difference | 5.6 |

Calculate the z-score for the observed difference. Explain the statistical significance at the 5% level.
Calculate the expected financial benefit per acre for a yield increase. Would switching to the organic fertilizer be profitable?
As a decision maker for the farm co-op, what recommendations would you make? (Consider statistical and economic factors.)

$z = \frac{5.6}{4.15} \approx 1.349$. Because $1.349 < 1.96$, the difference is not statistically significant at the 5% level. This means the experiment does not show that the organic fertilizer significantly changed the mean yield.

Treatment: $168.3 \times \$6.50 = \$1{,}093.95$ per acre
Control: $162.7 \times \$6.50 = \$1{,}057.55$ per acre
Difference: $+\$36.40$ per acre

The organic fertilizer would earn $36.40 more per acre, but it costs an additional $45 per acre. This means there is a loss of $8.60 per acre, so switching would not be profitable.

It is recommended that the farm co-op continue to use the standard fertilizer. The new fertilizer did not produce a statistically significant increase in yield, and it costs $45 more per acre, resulting in an $8.60 loss per acre.
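The statistical and financial calculations behind this recommendation can be reproduced with a short Python sketch. The variable names are chosen for illustration only.

```python
# Values from the Farm Fertilizer example.
treatment_yield = 168.3      # bushels per acre, new organic fertilizer
control_yield = 162.7        # bushels per acre, standard fertilizer
simulated_sd = 4.15          # bushels, from the 10,000 simulated differences
price_per_bushel = 6.50      # dollars
extra_cost_per_acre = 45.00  # dollars, organic versus standard fertilizer

observed_difference = treatment_yield - control_yield
z = observed_difference / simulated_sd

extra_revenue = observed_difference * price_per_bushel
net_change = extra_revenue - extra_cost_per_acre  # negative means a loss

print(f"z = {z:.3f}")
print(f"extra revenue per acre = ${extra_revenue:.2f}")
print(f"net change per acre = ${net_change:.2f}")
```

A z-score below 1.96 and a negative net change point the same way, which is why the statistical and economic factors agree in this example.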