<aside> 🎯 Exploring and analysing the dataset with Pandas and Plotly.

</aside>

Dataset

Kaggle Dataset, created by IBM

The data consists of coffee shop transactions from April 2019.

There are ~50,000 transactions in the dataset, all from shops in New York City.

Notebook

https://drive.google.com/file/d/1W9WTlLz2F2C-O17L71MpSZduWd0LF0UD/view?usp=drive_web

Deliverable

  1. Which products offer the best margins?
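import plotly.express as px

# coffee_df is the transactions DataFrame loaded in the notebook.
# Mean of total_profit2 per product type, top 10 in descending order.
product_margin = coffee_df.groupby('product_type', as_index=False)['total_profit2'].agg('mean')
product_margin = product_margin.sort_values(by='total_profit2', ascending=False).head(10)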

bar = px.bar(product_margin, x='product_type', y='total_profit2'
                ,title = 'Clothing Product offers the best margin'
                ,opacity=0.4
                ,color_discrete_sequence = ['red']
                ,labels={'product_type': 'Product', 'total_profit2': 'Margin'}
             )
bar.show()

Screenshot 2024-07-02 at 19.44.12.png
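
A mean can rest on very few rows, so it may help to read the margin ranking next to the number of transactions behind each bar. The sketch below is only an illustration on invented toy data: the column names mirror the notebook, while `toy_df`, `margin_summary`, `mean_profit`, `n_rows` and all values are made up for the example.

```python
import pandas as pd

# Toy data: column names mirror the notebook, values are invented.
toy_df = pd.DataFrame({
    'product_type': ['Clothing', 'Brewed Chai tea', 'Brewed Chai tea',
                     'Barista Espresso', 'Barista Espresso', 'Barista Espresso'],
    'total_profit2': [20.0, 2.0, 3.0, 3.0, 4.0, 5.0],
})

# Mean profit per product type, plus the number of rows behind each mean.
margin_summary = (
    toy_df.groupby('product_type', as_index=False)
          .agg(mean_profit=('total_profit2', 'mean'),
               n_rows=('total_profit2', 'count'))
          .sort_values(by='mean_profit', ascending=False)
)
print(margin_summary)
```

In this toy frame the top category leads on a single row, which is the kind of case where the ranking deserves a second look.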

  2. Which products generate the most sales?
Product_sales = coffee_df.groupby('product_type', as_index=False)['line_item2'].count()
Product_sales = Product_sales.sort_values(by='line_item2', ascending=False).head(10)

bar = px.bar(Product_sales, x='product_type', y='line_item2'
                ,title = 'The Brewed Chai Tea has been the most sold Item'
                ,opacity=0.5
                ,color_discrete_sequence = ['orange']
                ,labels={'product_type': 'Product', 'line_item2': 'Count of Sales'}
             )
bar.show()

Screenshot 2024-07-02 at 19.30.34.png

Product_sales2 = coffee_df.groupby('product_type', as_index=False)['line_item2'].sum()
Product_sales2 = Product_sales2.sort_values(by='line_item2', ascending=False).head(10)

bar = px.bar(Product_sales2, x='product_type', y='line_item2'
                ,title = 'But the Barista Espresso generates the most Revenue'
                ,opacity=0.4
                ,color_discrete_sequence = ['green']
                ,labels={'product_type': 'Product', 'line_item2': 'Revenue'}
             )
bar.show()

Screenshot 2024-07-02 at 19.34.30.png
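
The two charts above rank products differently because `count()` counts rows while `sum()` adds up the values in `line_item2`. A minimal sketch on invented toy data, assuming `line_item2` holds the amount of each line item (the notebook does not state this explicitly); `toy_df`, `sales_summary`, `n_sales` and `revenue` are names made up for the example:

```python
import pandas as pd

# Toy data: three cheap chai sales versus two pricier espresso sales (invented values).
toy_df = pd.DataFrame({
    'product_type': ['Brewed Chai tea'] * 3 + ['Barista Espresso'] * 2,
    'line_item2': [2.5, 2.5, 3.0, 4.5, 4.5],
})

# Compute both aggregates in one pass so they can be compared side by side.
sales_summary = (
    toy_df.groupby('product_type', as_index=False)
          .agg(n_sales=('line_item2', 'count'),
               revenue=('line_item2', 'sum'))
)

top_by_count = sales_summary.sort_values('n_sales', ascending=False).iloc[0]['product_type']
top_by_revenue = sales_summary.sort_values('revenue', ascending=False).iloc[0]['product_type']
print(sales_summary)
print('Most sold:', top_by_count, '| Most revenue:', top_by_revenue)
```

A product sold often at a low price can top the count while a less frequent, pricier product tops the sum.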

  3. Are particular customer groups spending more?
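# Part 1 - How much customer information do we have?
coffee_df['customer_generation'].info()  # info() prints its own summary and returns None
print('------------------------')
known_pct = round(24852 / 49894 * 100, 2)
print(known_pct, '% of rows have a generation,', round(100 - known_pct, 2), '% of data missing')
# Out of 49894 rows, only 24852 have the customer generation specified (49.81 %), so 50.19 % is missing.
## This limits what the data can say about customer groups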
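
One possible way to handle the gap, sketched here on invented toy data, is to measure the missing share directly from the column and to keep the missing rows as their own 'Unknown' group when comparing spend. This is an assumption about how the analysis could proceed, not the notebook's method; `toy_df`, `missing_pct`, `spend_by_generation`, the generation labels and all values are made up, and `line_item2` is assumed to hold the amount spent.

```python
import pandas as pd

# Toy data: half of the rows have no generation recorded (invented values).
toy_df = pd.DataFrame({
    'customer_generation': ['Gen X', None, 'Baby Boomers', None, 'Gen X', None],
    'line_item2': [4.0, 3.0, 5.0, 2.5, 6.0, 3.5],
})

# Share of rows without a generation, computed rather than hard-coded.
missing_pct = round(toy_df['customer_generation'].isna().mean() * 100, 2)

# Keep missing rows visible as 'Unknown' instead of silently dropping them.
spend_by_generation = (
    toy_df.assign(customer_generation=toy_df['customer_generation'].fillna('Unknown'))
          .groupby('customer_generation', as_index=False)
          .agg(total_spend=('line_item2', 'sum'),
               mean_spend=('line_item2', 'mean'))
)
print(missing_pct, '% of rows have no generation')
print(spend_by_generation)
```

Showing the 'Unknown' group next to the named ones makes it clear how much spend cannot be attributed to any customer group.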