Situation
As an analyst at a big online store, I worked with the marketing department to compile a list of hypotheses that may help boost revenue. These hypotheses need to be prioritized, an A/B test needs to be launched, and the results need to be analyzed.
Task
Download the data and prepare it for analysis
Load the data on ‘visits’, ‘orders’, and ‘expenses’ into variables.
Optimize the data for analysis. Make sure each column contains the correct data type.
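The loading and type-optimization step above could look roughly like the following sketch. The column names (`transactionId`, `visitorId`, `date`, `revenue`, `group`) and the inline sample are hypothetical stand-ins for the real ‘orders’ file, which is not shown here; the same pattern would apply to ‘visits’ and ‘expenses’.

```python
import io
import pandas as pd

# Hypothetical sample standing in for the 'orders' CSV; real columns may differ.
SAMPLE_ORDERS_CSV = """transactionId,visitorId,date,revenue,group
1,101,2019-08-01,30.5,A
2,102,2019-08-01,120.0,B
3,103,2019-08-02,15.2,B
"""

def load_orders(source):
    """Read an orders CSV and give each column a suitable dtype."""
    # Parse the date column while reading instead of leaving it as text.
    df = pd.read_csv(source, parse_dates=["date"])
    # A column with a handful of repeated labels is cheaper as a category.
    df["group"] = df["group"].astype("category")
    # Make sure revenue is numeric; invalid entries would raise here.
    df["revenue"] = pd.to_numeric(df["revenue"])
    return df

orders = load_orders(io.StringIO(SAMPLE_ORDERS_CSV))
print(orders.dtypes)
```

With a real file, `load_orders` would be called with the file path instead of the in-memory sample.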
Action
Raw data was stored in CSV format: ‘visits’, ‘orders’, and ‘expenses’
Using Jupyter Notebook, I loaded the data, then cleaned and optimized it for analysis
Used the numpy, pandas, and scipy libraries to analyze all three datasets
Applied A/B testing theory to test the statistical significance of the differences in conversion and average order size between the groups
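As an illustration of the significance testing described above, here is a minimal sketch for conversion. It assumes a Mann-Whitney U test from scipy on per-visitor order counts (non-buyers counted as zero); the source does not say which test was used, and the helper `conversion_samples` and all numbers are made up for the example.

```python
import numpy as np
from scipy import stats

def conversion_samples(orders_per_buyer, n_visitors):
    """Per-visitor order counts: buyers' counts padded with zeros for non-buyers."""
    zeros = np.zeros(n_visitors - len(orders_per_buyer), dtype=int)
    return np.concatenate([np.asarray(orders_per_buyer), zeros])

# Made-up data: 4 buyers out of 200 visitors in A, 6 buyers out of 200 in B.
sample_a = conversion_samples([1, 1, 2, 1], n_visitors=200)
sample_b = conversion_samples([1, 1, 1, 2, 1, 1], n_visitors=200)

# Two-sided test of the null hypothesis that both groups share one distribution.
result = stats.mannwhitneyu(sample_a, sample_b, alternative="two-sided")

# Relative gain of B over A in mean orders per visitor.
relative_gain = sample_b.mean() / sample_a.mean() - 1

print(f"p-value: {result.pvalue:.3f}")
print(f"relative gain of B over A: {relative_gain:.1%}")
```

The p-value answers whether the difference is statistically significant, while the relative gain describes its size; the two are reported separately in the results.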
A/B Test
Result
Group B has higher cumulative revenue, a higher cumulative average order size, and a much better conversion rate
While looking for anomalies, I found that users with more than 2 orders with a price of $435.54 may be considered outliers
The relative gain in conversion rate for group B over group A is 16%
For the difference in average order size, the p-value is 0.431, so we can't reject the null hypothesis and therefore can't conclude that order size differs between A and B
The relative gain in average order size for group B over group A is 27%
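A large relative gain in average order size can coexist with a non-significant p-value, as in the results above, because a few very large orders can pull the mean while a rank-based test barely moves. The sketch below shows this on made-up revenue figures; the Mann-Whitney U test and the 0.05 significance level are assumptions, since the source names neither.

```python
import numpy as np
from scipy import stats

ALPHA = 0.05  # assumed significance level; not stated in the source

# Made-up order revenues: the groups look alike except for one huge order in B.
revenue_a = np.array([20.0, 35.0, 50.0, 42.0, 28.0, 61.0, 33.0, 47.0])
revenue_b = np.array([22.0, 31.0, 55.0, 40.0, 26.0, 58.0, 36.0, 900.0])

# Rank-based test: the single large order only counts as "the highest rank".
result = stats.mannwhitneyu(revenue_a, revenue_b, alternative="two-sided")

# The mean-based relative gain is dominated by that same order.
relative_gain = revenue_b.mean() / revenue_a.mean() - 1

reject_null = result.pvalue < ALPHA
print(f"relative gain of B over A: {relative_gain:.1%}")
print(f"p-value: {result.pvalue:.3f}, reject null: {reject_null}")
```

Here the mean order size of B is far higher than A's, yet the test does not reject the null hypothesis, which is the same pattern as a 27% gain alongside a p-value of 0.431.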
Reflection
After removing the outliers, group A performs better than group B by relative gain in average order size.
But we still can't conclude that A and B differ in average order size (per the p-values)
The relative gain in conversion rate is much better for group B in both the raw and the filtered data analysis.
If we consider the conversion rate to be important, then we can consider this test a success, since group B performs much better
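The outlier filtering that separates the raw analysis from the filtered one could be sketched as follows. This is one possible reading, not necessarily the original method: it treats the two thresholds as independent cut-offs, dropping any user who placed more than 2 orders or who has an order priced above $435.54. The column names and sample rows are hypothetical.

```python
import pandas as pd

MAX_ORDERS = 2        # from the text: more than 2 orders is treated as anomalous
MAX_REVENUE = 435.54  # from the text; read here as an upper bound on order price (assumption)

# Made-up orders table; real column names may differ.
orders = pd.DataFrame({
    "visitorId": [1, 1, 1, 2, 3, 4, 4],
    "revenue": [20.0, 30.0, 25.0, 500.0, 40.0, 60.0, 15.0],
    "group": ["A", "A", "A", "B", "A", "B", "B"],
})

# Users with too many orders.
orders_per_user = orders.groupby("visitorId")["revenue"].count()
heavy_buyers = orders_per_user[orders_per_user > MAX_ORDERS].index

# Users with at least one unusually expensive order.
big_spenders = orders.loc[orders["revenue"] > MAX_REVENUE, "visitorId"]

abnormal_users = set(heavy_buyers) | set(big_spenders)

# Keep only orders from users not flagged as abnormal.
filtered = orders[~orders["visitorId"].isin(list(abnormal_users))]

print(sorted(abnormal_users))
print(filtered)
```

The significance tests and relative gains would then be recomputed on `filtered` and compared with the raw results.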