<aside> 🎯 Exploring and analysing the dataset with Pandas and Plotly.
</aside>
Kaggle dataset, created by IBM.
The data consists of coffee shop transactions from April 2019.
There are ~50,000 transactions in total, all from shops in New York City.
https://drive.google.com/file/d/1W9WTlLz2F2C-O17L71MpSZduWd0LF0UD/view?usp=drive_web
import plotly.express as px

# coffee_df is assumed to have been loaded from the dataset above.
# Mean profit per product type, top 10 by margin.
product_margin = coffee_df.groupby('product_type', as_index=False)['total_profit2'].agg('mean')
product_margin = product_margin.sort_values(by='total_profit2', ascending=False).head(10)
bar = px.bar(
    product_margin, x='product_type', y='total_profit2',
    title='Clothing offers the best margin',
    opacity=0.4,
    color_discrete_sequence=['red'],
    labels={'product_type': 'Product', 'total_profit2': 'Margin'},
)
bar.show()
# Number of line items sold per product type, top 10.
product_sales = coffee_df.groupby('product_type', as_index=False)['line_item2'].count()
product_sales = product_sales.sort_values(by='line_item2', ascending=False).head(10)
bar = px.bar(
    product_sales, x='product_type', y='line_item2',
    title='Brewed Chai Tea is the most-sold item',
    opacity=0.5,
    color_discrete_sequence=['orange'],
    labels={'product_type': 'Product', 'line_item2': 'Count of Sales'},
)
bar.show()
# Sum of line_item2 per product type (treated as revenue, per the chart title), top 10.
product_revenue = coffee_df.groupby('product_type', as_index=False)['line_item2'].sum()
product_revenue = product_revenue.sort_values(by='line_item2', ascending=False).head(10)
bar = px.bar(
    product_revenue, x='product_type', y='line_item2',
    title='But Barista Espresso generates the most revenue',
    opacity=0.4,
    color_discrete_sequence=['green'],
    labels={'product_type': 'Product', 'line_item2': 'Revenue'},
)
bar.show()
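The three charts above each run their own `groupby` on `product_type`. As a rough sketch, the count, sum, and mean could also be computed in one pass with pandas named aggregation. Only the column names `product_type`, `line_item2`, and `total_profit2` come from the notebook. The `sample_df` frame, its values, and the output names `sales_count`, `line_item_total`, and `mean_profit` are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for coffee_df; the values are invented for illustration.
sample_df = pd.DataFrame({
    "product_type": [
        "Brewed Chai tea", "Brewed Chai tea",
        "Barista Espresso", "Barista Espresso",
        "Clothing",
    ],
    "line_item2": [3.0, 3.5, 4.5, 5.0, 28.0],
    "total_profit2": [0.5, 0.6, 0.9, 1.0, 7.0],
})

# One groupby, three named aggregations (output names are hypothetical).
summary = (
    sample_df.groupby("product_type", as_index=False)
    .agg(
        sales_count=("line_item2", "count"),
        line_item_total=("line_item2", "sum"),
        mean_profit=("total_profit2", "mean"),
    )
    .sort_values(by="line_item_total", ascending=False)
)
print(summary)
```

Each chart can then read its own column from `summary`, which keeps the three rankings side by side and makes it easier to see where "most sold" and "most revenue" disagree.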
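# Part 1 - How much customer information do we have?
# info() prints its summary directly and returns None, so it is not wrapped in print().
coffee_df['customer_generation'].info()
print('------------------------')
# 24852 is the non-null count, so this ratio is the share of rows present, not missing.
print(round(24852/49894*100, 2), '% of rows have a generation specified')
# Out of 49894 rows, only 24852 have the customer's generation specified (49.81 %).
## The data has limitations here: roughly half of the rows carry no customer generation.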
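The counts above are typed in by hand from the `info()` output. A minimal sketch of deriving the same figures directly from the column, so they stay correct if the data changes, might look like this. `sample_df` and its generation labels are invented for illustration. Only the column name `customer_generation` comes from the notebook.

```python
import pandas as pd

# Hypothetical stand-in for coffee_df; the generation labels are invented.
sample_df = pd.DataFrame({
    "customer_generation": [
        "Gen X", None, "Baby Boomers", None,
        "Younger Millennials", None, None, "Gen Z",
    ],
})

total_rows = len(sample_df)
# notna() marks rows where the generation is present.
known_rows = int(sample_df["customer_generation"].notna().sum())
# The mean of a boolean mask is the fraction of True values.
missing_share = sample_df["customer_generation"].isna().mean() * 100

print(f"{known_rows} of {total_rows} rows have a generation specified")
print(f"{missing_share:.2f} % of values missing")
```

Applied to `coffee_df`, the same two expressions would replace the hard-coded 24852 and 49894.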