Introduction:

Olist is a Brazilian e-marketplace with an impressive growth rate. I was tasked with performing an exploratory data analysis (EDA) of its data and distilling 3 insights from it.

Drawing on my own experience in companies that manage their own marketplaces, I decided to look at the data set from 2 angles: the seller side and the customer (buyer) side. I used Python in a Jupyter Notebook and imported Pandas, NumPy and other libraries to help with formatting and visualization.

The data set covers a period of 773 days, or almost 26 months.
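As a rough sketch of how such a span can be measured, the snippet below counts the calendar days between the first and last purchase. The timestamps are made-up toy values, the column name order_purchase_timestamp is assumed to follow the public Olist schema, and counting both end days is my own assumption.

```python
import pandas as pd

# Toy timestamps (made up); the column name is an assumption.
orders = pd.DataFrame({
    "order_purchase_timestamp": pd.to_datetime(
        ["2017-01-01 10:00", "2017-03-15 12:30", "2017-12-31 23:00"]
    )
})

first = orders["order_purchase_timestamp"].min()
last = orders["order_purchase_timestamp"].max()

# Drop the time of day, then count calendar days including both end days.
span_days = (last.normalize() - first.normalize()).days + 1
print(span_days)
```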

Olist - Digital Marketplace Analysis

Task

Exploratory Data Analysis: I identified 3095 sellers and decided to look at their distributions by sales, category, and state (geography).

Seller's Side

By sales: I joined the orders table with the order_items table. My main idea was to see whether the sellers fall into clear segments or tiers. The Average Revenue per Seller graph summarizes the sellers' distribution: very few sellers have a very high revenue. The same skewness can be seen in the Seller's Order Count and Seller's Overall Revenue graphs. No tier segmentation could be easily identified.
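The join and the per-seller aggregation can be sketched roughly as follows. This is a minimal illustration on made-up toy rows, not the notebook's actual code; the column names (order_id, seller_id, price) are assumed to follow the public Olist schema, and reading "average revenue" as revenue per order is my own assumption.

```python
import pandas as pd

# Toy data (made up) standing in for the orders and order_items tables.
orders = pd.DataFrame({
    "order_id": ["o1", "o2", "o3", "o4"],
    "order_status": ["delivered"] * 4,
})
order_items = pd.DataFrame({
    "order_id": ["o1", "o1", "o2", "o3", "o4"],
    "seller_id": ["s1", "s2", "s1", "s1", "s3"],
    "price": [100.0, 20.0, 300.0, 50.0, 10.0],
})

# One row per sold item, enriched with the order-level columns.
items = orders.merge(order_items, on="order_id", how="inner")

# Revenue and number of distinct orders per seller.
per_seller = items.groupby("seller_id").agg(
    revenue=("price", "sum"),
    order_count=("order_id", "nunique"),
)
per_seller["avg_revenue_per_order"] = per_seller["revenue"] / per_seller["order_count"]
per_seller = per_seller.sort_values("revenue", ascending=False)

# Two simple indicators of how lopsided the distribution is.
top_share = per_seller["revenue"].iloc[0] / per_seller["revenue"].sum()
skewness = per_seller["revenue"].skew()
print(per_seller)
```

A strongly positive skewness, or a large revenue share held by the top few sellers, is the numeric counterpart of the long right tail seen in the graphs.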

By category: Here I joined the orders, order_items, products and category_translations tables. By calculating the Average Revenue per Category I found that Computers has an average revenue more than 5 times that of most other categories. Looking at the overall revenue by category, however, the top 5 categories dwarf the rest, with Health and Beauty at the very top, and the Computers category is not even near the top. To be certain about this pattern I wanted to see how the sellers are spread over the categories, which can be seen in the Sellers per Category graph. Most sellers are found in the Health and Beauty, Sports and Leisure, and Housewares categories. Furthermore, I found that a seller is more likely to focus on a single category than to be spread over several.
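The difference between average and total revenue per category, and the check on how many categories each seller covers, can be sketched like this. The rows are made-up toy data, the column names are assumed to follow the public Olist schema, and "average revenue" is taken here as the mean item price, which may differ from the definition used in the notebook.

```python
import pandas as pd

# Toy data (made up) standing in for order_items, products and the translation table.
order_items = pd.DataFrame({
    "order_id": ["o1", "o2", "o3", "o4", "o5", "o6", "o7"],
    "product_id": ["p1", "p2", "p2", "p3", "p3", "p2", "p4"],
    "seller_id": ["s1", "s2", "s2", "s3", "s3", "s2", "s3"],
    "price": [500.0, 100.0, 100.0, 120.0, 120.0, 100.0, 80.0],
})
products = pd.DataFrame({
    "product_id": ["p1", "p2", "p3", "p4"],
    "product_category_name": ["pcs", "beleza_saude", "beleza_saude", "esporte_lazer"],
})
translation = pd.DataFrame({
    "product_category_name": ["pcs", "beleza_saude", "esporte_lazer"],
    "product_category_name_english": ["computers", "health_beauty", "sports_leisure"],
})

# Attach the English category name to every sold item.
df = (
    order_items
    .merge(products, on="product_id", how="left")
    .merge(translation, on="product_category_name", how="left")
)

by_category = df.groupby("product_category_name_english").agg(
    total_revenue=("price", "sum"),
    avg_revenue=("price", "mean"),
    sellers=("seller_id", "nunique"),
)

# How many distinct categories does each seller sell in?
categories_per_seller = df.groupby("seller_id")["product_category_name_english"].nunique()
single_category_share = (categories_per_seller == 1).mean()
print(by_category.sort_values("total_revenue", ascending=False))
```

In the toy data, as in the analysis, the category with the highest average is not the one with the highest total, because a few expensive items are outweighed by many cheaper ones.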

By state: I found the top-selling categories per state; Health and Beauty is still the most popular.
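One way to pick the top category per state is sketched below on made-up toy rows. It assumes the items have already been joined to a seller_state column and an English category column, and that "top-selling" means highest revenue; both are my assumptions, and ranking by item count would work the same way.

```python
import pandas as pd

# Toy, already-joined sales rows (made up).
sales = pd.DataFrame({
    "seller_state": ["SP", "SP", "SP", "RJ", "RJ", "MG"],
    "category": ["health_beauty", "health_beauty", "computers",
                 "sports_leisure", "health_beauty", "housewares"],
    "price": [100.0, 150.0, 200.0, 80.0, 90.0, 60.0],
})

# Revenue per (state, category) pair.
revenue = sales.groupby(["seller_state", "category"], as_index=False)["price"].sum()

# For each state, keep the row with the highest revenue.
top_idx = revenue.groupby("seller_state")["price"].idxmax()
top_per_state = revenue.loc[top_idx].set_index("seller_state")
print(top_per_state)
```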

Customer (buyer) side

My idea was to understand the quantity and location of the demand and see how it behaved within the period of the dataset.

From the Customers table I found that São Paulo and Rio de Janeiro, together with their corresponding states, have the bulk of the buyers.
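A geographic breakdown of this kind can be sketched as a count of distinct buyers per state. The rows are made-up toy data, and the column names (customer_unique_id, customer_city, customer_state) are assumed to follow the public Olist schema.

```python
import pandas as pd

# Toy customers table (made up); u1 appears twice because it placed two orders.
customers = pd.DataFrame({
    "customer_unique_id": ["u1", "u2", "u3", "u4", "u5", "u1"],
    "customer_city": ["sao paulo", "sao paulo", "rio de janeiro",
                      "campinas", "belo horizonte", "sao paulo"],
    "customer_state": ["SP", "SP", "RJ", "SP", "MG", "SP"],
})

# Distinct buyers per state, largest first.
buyers_by_state = (
    customers.groupby("customer_state")["customer_unique_id"]
    .nunique()
    .sort_values(ascending=False)
)
state_share = buyers_by_state / buyers_by_state.sum()
print(state_share)
```

Grouping by customer_city instead of customer_state gives the same view at city level.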

Then I calculated the Monthly Active Users (MAU) and Daily Active Users (DAU).

From the MAU graph we can see how the number of buyers rises and then stabilizes somewhat. Since few buyers return (see the retention analysis below), this points to a steady stream of new buyers.

From the DAU graph we see a slow but significant rise in daily buyers.
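The two metrics can be sketched as counts of distinct buyers per day and per calendar month. The rows are made-up toy data, and the snippet assumes the orders have already been joined to the customers table so that each row carries a customer_unique_id and an order_purchase_timestamp.

```python
import pandas as pd

# Toy orders (made up), one row per order.
orders = pd.DataFrame({
    "customer_unique_id": ["u1", "u2", "u1", "u3", "u3", "u4"],
    "order_purchase_timestamp": pd.to_datetime([
        "2017-01-05", "2017-01-05", "2017-01-20",
        "2017-02-01", "2017-02-01", "2017-02-10",
    ]),
})
ts = orders["order_purchase_timestamp"]

# Distinct buyers per calendar day and per calendar month.
dau = orders.groupby(ts.dt.normalize())["customer_unique_id"].nunique()
mau = orders.groupby(ts.dt.to_period("M"))["customer_unique_id"].nunique()
print(mau)
```

Counting distinct buyers rather than orders matters here: u3 orders twice on the same day but is counted once.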

Next, I wanted to see whether the same customers return to the site and order more. I separated the customers into monthly cohorts and then calculated the retention rate. The Retention Rate Heatmap reveals a troubling fact: the site has a very low retention rate, which means that the marketplace is mostly reliant on new buyers.
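A cohort retention table of this kind can be sketched as follows. This is one common way to build it, not necessarily the one used in the notebook; the rows are made-up toy data, and the snippet assumes each order row carries a customer_unique_id and an order_purchase_timestamp.

```python
import pandas as pd

# Toy orders (made up), one row per order.
orders = pd.DataFrame({
    "customer_unique_id": ["u1", "u2", "u3", "u1", "u4", "u2"],
    "order_purchase_timestamp": pd.to_datetime([
        "2017-01-03", "2017-01-15", "2017-01-20",
        "2017-02-10", "2017-02-12", "2017-03-01",
    ]),
})
ts = orders["order_purchase_timestamp"]

# A customer's cohort is the month of their first purchase.
first = orders.groupby("customer_unique_id")["order_purchase_timestamp"].transform("min")
orders["cohort"] = first.dt.to_period("M")

# Whole calendar months elapsed between the first purchase and this order.
orders["months_since"] = (ts.dt.year - first.dt.year) * 12 + (ts.dt.month - first.dt.month)

# Distinct active customers per cohort and month offset.
active = (
    orders.groupby(["cohort", "months_since"])["customer_unique_id"]
    .nunique()
    .unstack(fill_value=0)
)

# Divide each row by the cohort size (month offset 0) to get retention rates.
retention = active.div(active[0], axis=0)
print(retention.round(2))
```

The resulting table can be passed to a heatmap function. Note that a zero in a late column of a recent cohort may simply mean that the month lies beyond the end of the data.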

Retention Rate by Cohort:

Conclusions

  1. Sellers are not tiered, but the marketplace is skewed toward big sellers, and sellers focus on specific categories.

  2. Buyers are concentrated geographically in São Paulo and Rio de Janeiro (cities and states). The marketplace has a healthy stream of new buyers, but it does not succeed in retaining them.

  3. Considering that the marketplace relies on new buyers, who most likely spend reasonable amounts and most likely pay in a single installment, we can conclude that buyers are reliable.

There is a lot of potential in this marketplace: it grows at a steady rate (MAU), buyers are reliable clients who pay mostly by credit card, and sellers know how to align themselves with the cultural nuances of the Brazilian market, making Health and Beauty the category with the most revenue.

The data is very well curated, and it was impressive to find a very low relative quantity of products without a category. I would recommend further analysis into creating tiers of sellers, which would convey reliability and experience to new and returning buyers.

As a growth strategy based on this dataset, I would recommend expanding the seller base while at the same time helping the bottom sellers increase their market share. This may seem contradictory, as competition would rise, but in a digital marketplace a recommendation engine might help reduce the friction between sellers.

The low retention rate can be considered a symptom of a young marketplace that may rely on word of mouth to some extent; with time, marketing in general might help improve it.