Week 2 Project
1. Cleaning Dataset:
1.1. Initial Inspection and Data Cleanup:
- Duplicate records: The dataset contains two rows with highly similar fields, warranting further analysis for potential duplication.
- Inconsistent date formats: Date entries exhibit variations in formatting.
- Missing fields: Several customer email and phone number fields are absent.
1.2. Data Type Formatting:
- The “Order_Date” column was formatted with consistency.
- The column “Email” has the missing value -> manually use function VLookup -> find and replace if the similar value exists in the same dataset. -> result: the dataset is so small that does not have the reference.
- The column “Phone” has the missing value -> formatting the whole column’s data type into the numeric type. -> VLookup -> find and replace if the similar value exists in the same dataset. -> result: the dataset is so small that does not have the reference.
- Still have a few leftover “Null” cells in the dataset -> analyzing -> review -> decide to let it be because it is limited information.
1.3. Export: The cleaned dataset was exported as a CSV file.
1.4 Cleaned dataset sample:
Order_ID | Customer_Name | Phone | Product_Category | Order_Date | Revenue | Discount (%) | |
---|---|---|---|---|---|---|---|
101 | John Doe | john@email.com | 9876543210 | Electronics | Sunday, December 31, 2023 | 1200 | 10 |
102 | Alice Smith | 9898989898 | Clothing | Friday, January 5, 2024 | 500 | 0 | |
103 | Bob Miller | bob@email.com | Electronics | Friday, January 12, 2024 | 3000 | 20 | |
104 | John Doe | john@email.com | 9876543210 | Electronics | Sunday, December 31, 2023 | 1200 | 10 |
105 | David White | david@email.com | 9123456789 | Furniture | Thursday, February 15, 2024 | 2500 | 15 |
106 | Emma Brown | emma@email.com | 9234567890 | Clothing | Friday, March 8, 2024 | 700 | 5 |
107 | Chris Green | 9345678901 | Furniture | Wednesday, April 10, 2024 | 1800 | 25 | |
108 | Alice Smith | alice@email.com | Clothing | Friday, March 8, 2024 | 500 | 0 |
2. Insights:
2.1 Customer Activity:
- The dataset comprises five unique customers, with ‘John Doe’ and ‘Alice Smith’ appearing twice, indicating repeat patronage.
- ‘John Doe’ has orders with Order_ID 101 and 104, both for ‘Electronics’ with a 10% discount and a revenue of 1200. These entries may represent duplicate records or identical orders.
- ‘Bob Miller’ has missing email and phone number information, which could impede customer communication.
2.2 Product Performance:
- ‘Electronics’ is a prominent product category, accounting for three out of eight orders.
- ‘Electronics’ generated the highest revenue, totaling $5400 across the recorded orders.
- ‘Clothing’ and ‘Furniture’ also contributed to revenue, with ‘Clothing’ generating $500 and ‘Furniture’ generating $2500.
2.3 Sales Trends and Discounts:
- The ‘Order_date’ column spans from ‘2023-12-31’ to ‘2024-02-20’, indicating sales activity across multiple months.
- The average discount applied across all orders is approximately 10%.
- The highest discount, 20%, was applied to an ‘Electronics’ order placed by ‘Bob Miller’, which also had the highest individual revenue of $3000.
- The relationship isn’t straightforward – higher discounts don’t automatically mean higher revenue. -> Because two orders (Alice Smith’s Clothing purchases) had no discount and generated $500 each, it is proved that discounts aren’t always necessary for conversions.
3. Visualizations:
3.1 Summary Revenues:

3.2 Relationship between Revenues and Discounts:

3.3 Sales Tracking:

Recommendations
- Implement a data validation process at the point of order entry. For existing records, reach out to customers to complete their contact information. This is crucial for marketing campaigns and customer retention efforts.
- Consider focusing marketing efforts on Electronics while investigating why Furniture has such variable discounting (15-25%). -> Consider implementing tiered discount strategies by product category rather than blanket discounting. -> Electronics and Furniture categories can support 15-20% discounts while maintaining high revenue, whereas Clothing appears to sell at lower price points regardless of discount level.
- The lower sales figures observed in February and April of the current year offer valuable insights for future marketing strategy modifications by implementing the A/B test: An A/B test will be conducted: Test A will offer a bundle of Electronics and Clothing, while Test B will offer the same bundle with an additional 10% discount.
Proposal:
