Situation

As an analyst at a big online store, I worked with the marketing department to compile a list of hypotheses that may help boost revenue. These hypotheses need to be prioritized, and then an A/B test must be launched and its results analyzed.

Task

  • Download the data and prepare it for analysis
  • Load the data on ‘visits’, ‘orders’, and ‘expenses’ into variables.

  • Optimize the data for analysis. Make sure each column contains the correct data type.
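A minimal sketch of the loading and type-optimization step, assuming pandas. The inline CSV snippets and their column names (transactionId, visitorId, date, revenue, group, visits) are hypothetical stand-ins for the real files, and load_and_optimize is an illustrative helper, not the notebook's own code. Dates are parsed to datetime, the A/B group label becomes a category, and integer columns are downcast to the smallest type that fits.

```python
import io

import pandas as pd

# Hypothetical samples standing in for the real CSV files;
# the column names are assumptions, not the project's actual schema.
ORDERS_CSV = """transactionId,visitorId,date,revenue,group
1,101,2019-08-15,30.4,B
2,102,2019-08-15,15.2,B
3,103,2019-08-16,125.0,A
"""

VISITS_CSV = """date,group,visits
2019-08-15,A,719
2019-08-15,B,713
2019-08-16,A,619
2019-08-16,B,581
"""


def load_and_optimize(csv_text: str, date_cols: list[str], category_cols: list[str]) -> pd.DataFrame:
    """Read CSV text, parse date columns, and store low-cardinality strings as categories."""
    df = pd.read_csv(io.StringIO(csv_text))
    for col in date_cols:
        df[col] = pd.to_datetime(df[col])
    for col in category_cols:
        df[col] = df[col].astype("category")
    # Downcast int64 columns to the smallest integer type that holds their values
    for col in df.select_dtypes(include="int64").columns:
        df[col] = pd.to_numeric(df[col], downcast="integer")
    return df


orders = load_and_optimize(ORDERS_CSV, date_cols=["date"], category_cols=["group"])
visits = load_and_optimize(VISITS_CSV, date_cols=["date"], category_cols=["group"])
print(orders.dtypes)
print(visits.dtypes)
```

With real files, io.StringIO(csv_text) would simply be replaced by the file path passed to pd.read_csv.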

Action

  • The raw data was stored in CSV format: ‘visits’, ‘orders’, and ‘expenses’.

  • Using a Jupyter notebook, I loaded the data, cleaned it, and optimized it for analysis.

  • Used the NumPy, pandas, and SciPy libraries to analyze all datasets.

  • Applied A/B testing theory to assess the statistical significance of the differences in conversion rate and average order size between groups A and B.
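One common way to test the conversion difference, assumed here rather than taken from the original notebook, is a nonparametric Mann-Whitney U test on "orders per visitor" samples: each buyer contributes their order count and every non-converting visit contributes a zero. The counts below are hypothetical, conversion_sample is an illustrative helper, and treating each visit as one visitor is a simplifying assumption.

```python
import numpy as np
from scipy import stats


def conversion_sample(orders_per_user: list[int], total_visits: int) -> np.ndarray:
    """Orders per visitor: buyers' order counts plus a zero for every visit without an order."""
    zeros = np.zeros(total_visits - len(orders_per_user), dtype=int)
    return np.concatenate([np.asarray(orders_per_user, dtype=int), zeros])


# Hypothetical figures for illustration only, not the project's data
sample_a = conversion_sample([1] * 50 + [2] * 5, total_visits=2000)
sample_b = conversion_sample([1] * 65 + [2] * 6, total_visits=2000)

# Two-sided Mann-Whitney U test: H0 says both groups come from the same distribution
result = stats.mannwhitneyu(sample_a, sample_b, alternative="two-sided")
alpha = 0.05
print(f"p-value: {result.pvalue:.3f}")
print("reject H0" if result.pvalue < alpha else "cannot reject H0")
```

A rank-based test is used here because order counts and revenues in e-commerce data are usually heavily skewed, which weakens the normality assumption behind a t-test.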

A/B Test

Result

  • Group B has higher cumulative revenue, a higher cumulative average order size, and a much better conversion rate.

  • While looking for anomalies, I found that users with more than 2 orders, or with orders priced above $435.54, may be considered outliers.

  • By relative gain in conversion rate, group B is 16% higher than group A.

  • For the significance of the difference in average order size, the p-value is 0.431, so we can't reject the null hypothesis; therefore, we can't conclude that average order size differs between groups A and B.

  • By relative gain in average order size, group B is 27% higher than group A.
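The relative gains quoted above follow the usual definition, metric of B divided by metric of A minus 1. The sketch below applies it to hypothetical conversion rates and per-order revenues (not the project's data), and pairs it with a two-sided Mann-Whitney U test on order size; that choice of test is an assumption about how the p-value was obtained.

```python
import numpy as np
from scipy import stats


def relative_gain(metric_a: float, metric_b: float) -> float:
    """Relative difference of group B versus group A: B / A - 1."""
    return metric_b / metric_a - 1


# Hypothetical conversion rates (orders / visits) for illustration only
conv_gain = relative_gain(55 / 2000, 71 / 2000)
print(f"Conversion rate, B vs A: {conv_gain:+.1%}")

# Hypothetical per-order revenue for each group (illustrative values only)
revenue_a = np.array([20.0, 35.5, 50.0, 12.9, 80.1, 44.0])
revenue_b = np.array([25.0, 30.0, 60.5, 15.0, 95.0, 55.0])

aos_gain = relative_gain(revenue_a.mean(), revenue_b.mean())
print(f"Average order size, B vs A: {aos_gain:+.1%}")

# A p-value above alpha means we cannot reject H0 (no detectable difference in order size)
p_value = stats.mannwhitneyu(revenue_a, revenue_b, alternative="two-sided").pvalue
print(f"p-value: {p_value:.3f}")
```

This is why a large relative gain and a non-significant p-value can coexist: the gain compares means, which a few large orders can inflate, while the rank-based test looks at the whole distribution.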

Reflection

  • After removing the outliers, the relative gain in average order size favors group A over group B.

  • However, per the p-values, we still can't conclude that A and B differ in average order size.

  • The relative gain in conversion rate strongly favors group B in both the raw and the filtered data.

  • If conversion rate is the key metric, the test can be considered a success, since group B performs much better.
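The outlier filtering behind the "filtered" results can be sketched as follows, assuming pandas. The two cut-offs are the ones reported in the results; deriving them from the 95th/99th percentiles is an assumption, and the small orders table and its column names are hypothetical. Users who exceed either cut-off are dropped before the metrics are recomputed.

```python
import pandas as pd

# Cut-offs quoted in the results; deriving them from percentiles is an assumption
MAX_ORDERS_PER_USER = 2
MAX_ORDER_PRICE = 435.54

# Hypothetical orders table; column names are illustrative
orders = pd.DataFrame({
    "visitorId": [1, 1, 1, 2, 3, 4, 5],
    "revenue": [20.0, 30.0, 25.0, 500.0, 40.0, 60.0, 35.0],
    "group": ["A", "A", "A", "B", "B", "A", "B"],
})

# Percentiles of orders per user and of order price are one common way to choose cut-offs
print(orders.groupby("visitorId").size().quantile([0.95, 0.99]))
print(orders["revenue"].quantile([0.95, 0.99]))

# Users with too many orders, or with at least one unusually expensive order
orders_per_user = orders.groupby("visitorId")["revenue"].count()
heavy_buyers = orders_per_user[orders_per_user > MAX_ORDERS_PER_USER].index
big_spenders = orders.loc[orders["revenue"] > MAX_ORDER_PRICE, "visitorId"].unique()
abnormal_users = set(heavy_buyers) | set(big_spenders)

# Recompute average order size per group on the filtered data
filtered = orders[~orders["visitorId"].isin(abnormal_users)]
aos = filtered.groupby("group")["revenue"].mean()
print(f"Filtered average order size, B vs A: {aos['B'] / aos['A'] - 1:+.1%}")
```

Removing whole users rather than single orders keeps each user's behaviour consistent across both the conversion and the order-size metrics.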