When a prominent UK university sought to deepen its understanding of the global fragrance market, it turned to Statistique for comprehensive data acquisition, cutting-edge analytics, and academic research collaboration. By blending Natural Language Processing (NLP), fuzzy matching algorithms, and robust data governance, our team helped the university push the boundaries of computational social science, showcasing new insights and innovations in how fragrances are developed, marketed, and perceived.
Project Overview
- Client: Academic stakeholders and computational researchers at a UK-based institution
- Objective: Explore the fragrance industry by scraping, integrating, and analyzing multi-source data—emphasizing custom NLP models, statistical analysis, and interactive data visualization
- Scope: End-to-end support, from requirement gathering and feasibility studies to advanced modeling, fuzzy matching, and results dissemination via academic conferences and stakeholder dashboards
Key Components of the Solution
1. Data Acquisition & Governance
- Automated Scraping: Leveraged tools like requests and Selenium to pull real-time product details, consumer reviews, and social media mentions.
- Compliance & Security: Aligned data practices with the UK’s Data Protection Act and GDPR, ensuring secure handling of sensitive user-generated content.
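To make the scraping step more concrete, here is a minimal sketch of the parsing half of such a pipeline: pulling review text out of a page that has already been fetched. It uses only Python's standard-library HTML parser. The markup, the `review` class name, and the sample product are assumptions made for illustration and do not describe any real site or the project's actual scrapers.

```python
from html.parser import HTMLParser


class ReviewParser(HTMLParser):
    """Collects the text of every <p class="review"> element (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self.reviews = []
        self._buffer = None  # None means we are not inside a review element

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "p" and "review" in classes:
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._buffer is not None:
            # Collapse runs of whitespace left over from the page layout.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.reviews.append(text)
            self._buffer = None


def extract_reviews(html):
    """Return the review texts found in an HTML string."""
    parser = ReviewParser()
    parser.feed(html)
    parser.close()
    return parser.reviews


# Made-up page fragment standing in for a fetched product page.
SAMPLE_HTML = """
<div class="product"><h2>Example Eau de Parfum</h2>
<p class="review">Warm amber opening,   lasts all day.</p>
<p class="note">Not a review.</p>
<p class="review highlighted">Too sweet for me.</p></div>
"""

print(extract_reviews(SAMPLE_HTML))
```

In practice the HTML string would come from an HTTP client or a browser driver, and any personal data in the reviews would need handling in line with the compliance requirements noted above.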
2. Data Linkage & Integration
- Fuzzy Matching Algorithms: Achieved a remarkable 98.6% accuracy in merging disparate product catalogs and consumer databases, unifying thousands of records under consistent naming and classification.
- Scalable Architecture: Employed PostgreSQL and data-lake storage solutions to handle ever-growing data volumes without compromising performance.
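The case study does not say which fuzzy matching algorithm was used, so the following is only a sketch of the general idea: normalize product names, score each candidate by string similarity, and accept the best match above a threshold. It relies on `difflib.SequenceMatcher` from the Python standard library. The product names, the `best_match` helper, and the 0.85 threshold are illustrative assumptions.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(cleaned.split())


def best_match(name, candidates, threshold=0.85):
    """Return (candidate, score) for the closest candidate.

    The candidate is None when the best score falls below the threshold
    (the threshold value here is an assumption, not a tuned setting).
    """
    target = normalize(name)
    best, best_score = None, 0.0
    for candidate in candidates:
        score = SequenceMatcher(None, target, normalize(candidate)).ratio()
        if score > best_score:
            best, best_score = candidate, score
    if best_score < threshold:
        return None, best_score
    return best, best_score


# Made-up catalog entries for illustration only.
catalog = [
    "Rose Noire Eau de Parfum",
    "Citrus Bloom Eau de Toilette",
    "Amber Woods Extrait",
]

print(best_match("rose noire - eau de parfum", catalog))
print(best_match("Vanilla Sky", catalog))
```

The threshold trades recall for precision: a lower value links more records but risks merging distinct products, so it would normally be checked against a hand-labeled sample before being applied to a full catalog.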
3. Advanced NLP & Sentiment Analysis
- Transformers (BERT, DeBERTa v2): Generated contextual embeddings to classify fragrance notes and gauge consumer sentiment toward new or niche products.
- Llama 2 for JSON Extraction: Streamlined the parsing of complex text blurbs from user forums and professional reviews, converting them into structured JSON for in-depth analysis.
- Clustering & Summarization: Applied K-means to group similar feedback, followed by automated summaries using BART-CNN—translating dense text into actionable insights.
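Language models often wrap the JSON they return in extra prose, so a JSON-extraction step like the one described above usually needs a defensive parser. The sketch below shows one possible approach using only the standard library: find the first decodable JSON object in the raw model output, then check it against an expected shape. The field names (`notes`, `sentiment`) and the sample output are hypothetical; the actual schema used in the project is not stated.

```python
import json

# Hypothetical schema: field name -> expected Python type.
REQUIRED_FIELDS = {"notes": list, "sentiment": str}


def extract_json_object(text):
    """Return the first decodable JSON object found in text, or None."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            obj, _ = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict):
            return obj
        # Not valid JSON from this brace; try the next one.
        start = text.find("{", start + 1)
    return None


def validate_record(record):
    """True when record is a dict with every required field of the right type."""
    if not isinstance(record, dict):
        return False
    return all(isinstance(record.get(key), kind) for key, kind in REQUIRED_FIELDS.items())


# Made-up model response for illustration.
model_output = (
    "Sure, here is the JSON:\n"
    '{"notes": ["bergamot", "vetiver"], "sentiment": "positive"}\n'
    "Let me know if you need more."
)

record = extract_json_object(model_output)
print(record, validate_record(record))
```

Records that fail validation can be routed to a retry or a manual review queue rather than silently entering the analysis dataset.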
4. Statistical & Network Analysis
- Comprehensive Toolset: Deployed pandas, NumPy, Statsmodels, and SciPy for everything from hypothesis testing and regression to time series analysis.
- Similarity Metrics & Novelty Scores: Designed custom statistical frameworks to assess innovation in fragrance formulas, identifying top-performers based on unique scent compositions.
- Network Analysis: Leveraged NetworkX to reveal key influencers, collaborative patterns between perfumers, and the interconnected dynamics of consumer communities.
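The custom novelty framework is not described in detail, so the following is only one plausible way to score novelty, not the project's method. It treats each formula as a set of notes and defines novelty as one minus the highest Jaccard similarity to any earlier formula. The function names and the note sets are assumptions for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections of notes, in the range [0, 1]."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty formulas are treated as identical
    return len(a & b) / len(a | b)


def novelty_score(notes, earlier_formulas):
    """1.0 means no overlap with any earlier formula; 0.0 means an exact duplicate exists."""
    if not earlier_formulas:
        return 1.0
    return 1.0 - max(jaccard(notes, earlier) for earlier in earlier_formulas)


# Made-up formulas for illustration.
earlier = [
    {"bergamot", "lavender", "musk"},
    {"rose", "oud", "amber"},
]

print(novelty_score({"bergamot", "lavender", "vetiver"}, earlier))  # 0.5
print(novelty_score({"rose", "oud", "amber"}, earlier))             # 0.0
print(novelty_score({"fig", "cardamom"}, earlier))                  # 1.0
```

A set-based score ignores concentrations and note order, so a production framework would likely weight notes or compare against formulas within a time window; those refinements are left out here.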
5. Data Visualization & Reporting
- Interactive Dashboards: Built user-friendly interfaces with Plotly and Tableau for real-time exploration of trends, notes, and market behaviors.
- Academic Dissemination: Presented findings at conferences through engaging charts and summaries, while also offering non-technical reports and executive dashboards to broader audiences.
Results & Impact
- Enhanced Industry Insight: By integrating data from multiple sources—product catalogs, consumer reviews, social media chatter—researchers gained unprecedented visibility into the fragrance ecosystem.
- Academic & Commercial Value: The solution informed theoretical models on consumer behavior and innovation, while delivering practical intelligence for fragrance brands seeking a competitive edge.
- Robust & Reliable Analytics: Achieving 98.6% accuracy in data integration minimized inconsistencies and boosted trust in subsequent statistical inferences and modeling efforts.
- Effective Stakeholder Communication: Intuitive dashboards and clear, concise reports ensured both technical and non-technical stakeholders could grasp and apply the findings to real-world challenges.