A well-known UK university turned to Statistique for thorough data collection, advanced analytics, and academic research collaboration when it wanted to better understand the global fragrance business. Our team helped the institution push the limits of computational social science by combining Natural Language Processing (NLP), fuzzy matching algorithms, and strong data governance, producing fresh insights into how scents are made, marketed, and perceived.
Project Overview
- Client: Computational researchers and academic stakeholders from a UK-based university.
- Objective: Explore the fragrance industry by scraping, integrating, and analyzing multi-source data, with an emphasis on bespoke NLP models, statistical analysis, and interactive data visualization.
- Scope: End-to-end support, from requirements gathering and feasibility studies to advanced modeling, fuzzy matching, and dissemination of results via academic conferences and stakeholder dashboards.
Key Components of the Solution
1. Data Acquisition & Governance
- Automated Scraping: Leveraged tools like requests and Selenium to pull real-time product details, consumer reviews, and social media mentions.
- Compliance & Security: Aligned data practices with the UK’s Data Protection Act and GDPR, ensuring secure handling of sensitive user-generated content.
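To make the parsing half of the scraping step concrete, here is a minimal sketch. It uses only Python's standard library rather than the tools named above, it works on a static HTML string instead of a live page, and the markup (`class="product"`, `class="name"`, `class="review"`) and the `parse_products` helper are assumptions for illustration, not the project's actual pipeline.

```python
from html.parser import HTMLParser


class ProductParser(HTMLParser):
    """Collects product names and review texts from hypothetical markup."""

    def __init__(self):
        super().__init__()
        self.products = []   # list of {"name": str, "reviews": [str]}
        self._field = None   # field that the current text node belongs to

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class", "")
        if css_class == "product":
            # A new product block starts: open an empty record for it.
            self.products.append({"name": "", "reviews": []})
        elif css_class in ("name", "review") and self.products:
            self._field = css_class

    def handle_data(self, data):
        text = data.strip()
        if not text or self._field is None:
            return
        if self._field == "name":
            self.products[-1]["name"] = text
        else:
            self.products[-1]["reviews"].append(text)

    def handle_endtag(self, tag):
        # Any closing tag ends the current text field.
        self._field = None


def parse_products(html):
    """Return the product records found in an HTML string."""
    parser = ProductParser()
    parser.feed(html)
    return parser.products


# Hypothetical page fragment standing in for a fetched product page.
SAMPLE_HTML = """
<div class="product">
  <h2 class="name">Cedar Noir</h2>
  <p class="review">Warm and woody.</p>
  <p class="review">Fades too quickly.</p>
</div>
"""

products = parse_products(SAMPLE_HTML)
```

Keeping parsing separate from fetching means the same function can be exercised on saved pages, which helps when access to live sites is rate-limited or restricted by their terms of use.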
2. Data Linkage & Integration
- Fuzzy Matching Algorithms: Achieved a remarkable 98.6% accuracy in merging disparate product catalogs and consumer databases, unifying thousands of records under consistent naming and classification.
- Scalable Architecture: Employed PostgreSQL and data-lake storage solutions to handle ever-growing data volumes without compromising performance.
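The source does not say which fuzzy matching algorithm was used, so the following is only a sketch of the general idea: normalize product names, score each candidate with a string similarity ratio, and accept the best one above a cutoff. It relies on `difflib.SequenceMatcher` from the standard library as a stand-in; the `normalize` and `best_match` helpers, the sample catalog, and the 0.85 threshold are all assumptions.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(cleaned.split())


def best_match(name, candidates, threshold=0.85):
    """Return (candidate, score) for the closest candidate.

    The candidate is None when no score reaches the threshold.
    """
    if not candidates:
        return None, 0.0
    target = normalize(name)
    scored = [(SequenceMatcher(None, target, normalize(c)).ratio(), c) for c in candidates]
    score, match = max(scored)
    return (match if score >= threshold else None), score


# Hypothetical catalog entries and incoming records.
catalog = ["Cedar Noir Eau de Parfum", "Rose Lumiere", "Citrus Bloom"]

exact, exact_score = best_match("cedar-noir  eau de parfum", catalog)
accented, _ = best_match("Rose Lumière", catalog)
unknown, _ = best_match("Vetiver Storm", catalog)
```

In practice the threshold would be tuned against a hand-labelled sample, since a cutoff that is too low merges distinct products and one that is too high leaves duplicates unlinked.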
3. Advanced NLP & Sentiment Analysis
- Transformers (BERT, DeBERTa v2): Generated contextual embeddings to classify fragrance notes and gauge consumer sentiment toward new or niche products.
- Llama 2 for JSON Extraction: Streamlined the parsing of complex text blurbs from user forums and professional reviews, converting them into structured JSON for in-depth analysis.
- Clustering & Summarization: Applied K-means to group similar feedback, followed by automated summaries using BART-CNN, turning dense text into actionable insights.
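Language models asked for JSON often wrap it in extra prose, so an extraction step usually needs a guard that pulls out the JSON object and checks its shape before analysis. The sketch below shows one way to do that with the standard library. The model call itself is out of scope: `raw_reply` is a simulated response, and the field names in `REQUIRED_FIELDS` are a hypothetical schema, not the one used in the project.

```python
import json

# Hypothetical schema: field name -> expected Python type.
REQUIRED_FIELDS = {"product": str, "notes": list, "sentiment": str}


def extract_json(reply):
    """Return the first JSON object embedded in a model reply, or None."""
    decoder = json.JSONDecoder()
    start = reply.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start`
            # and ignores any text that follows it.
            obj, _ = decoder.raw_decode(reply, start)
            return obj
        except json.JSONDecodeError:
            start = reply.find("{", start + 1)
    return None


def is_valid(record):
    """Check that every required field is present with the expected type."""
    return record is not None and all(
        isinstance(record.get(field), kind) for field, kind in REQUIRED_FIELDS.items()
    )


# Simulated model output with chatter around the JSON payload.
raw_reply = (
    "Sure! Here is the JSON:\n"
    '{"product": "Cedar Noir", "notes": ["cedar", "amber"], "sentiment": "positive"}\n'
    "Let me know if you need more."
)

record = extract_json(raw_reply)
```

Records that fail `is_valid` can be logged and retried rather than silently dropped, which keeps the downstream statistics honest about how much text was actually parsed.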
4. Statistical & Network Analysis
- Comprehensive Toolset: Deployed pandas, NumPy, Statsmodels, and SciPy for everything from hypothesis testing and regression to time series analysis.
- Similarity Metrics & Novelty Scores: Designed custom statistical frameworks to assess innovation in fragrance formulas, identifying top-performers based on unique scent compositions.
- Network Analysis: Leveraged NetworkX to reveal key influencers, collaborative patterns between perfumers, and the interconnected dynamics of consumer communities.
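The custom similarity and novelty frameworks are not described in detail here. As an illustration only, one simple formulation treats each fragrance as a set of notes, measures overlap with Jaccard similarity, and defines novelty as one minus the highest similarity to any earlier formula. The `jaccard` and `novelty_score` functions and the sample note sets are assumptions, not the project's actual metric.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections of notes (0.0 to 1.0)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty formulas are treated as identical
    return len(a & b) / len(a | b)


def novelty_score(formula, earlier_formulas):
    """One minus the highest similarity to any earlier formula.

    A formula with no predecessors is scored as fully novel (1.0).
    """
    if not earlier_formulas:
        return 1.0
    return 1.0 - max(jaccard(formula, other) for other in earlier_formulas)


# Hypothetical note sets for previously released fragrances.
earlier = [
    {"rose", "musk", "amber"},
    {"citrus", "bergamot", "musk"},
]

new_formula = {"rose", "musk", "oud"}
score = novelty_score(new_formula, earlier)  # shares 2 of 4 notes with the first formula
```

Using the maximum rather than the mean similarity penalizes a formula that closely copies even a single predecessor, which is one defensible reading of "unique scent composition".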
5. Data Visualization & Reporting
- Interactive Dashboards: Built user-friendly interfaces with Plotly and Tableau for real-time exploration of trends, notes, and market behaviors.
- Academic Dissemination: Presented findings at conferences through engaging charts and summaries, while also offering non-technical reports and executive dashboards to broader audiences.
Results & Impact
- Improved Industry Insight: Researchers gained previously unavailable access to the fragrance ecosystem by combining data from many sources, including product catalogs, user reviews, and social media chatter.
- Academic & Commercial Value: The solution informed theoretical models of consumer behavior and innovation while providing practical intelligence for fragrance companies looking for a competitive edge.
- Robust & Reliable Analytics: Achieving 98.6% accuracy in data integration reduced discrepancies and increased confidence in subsequent statistical inference and modeling.
- Effective Stakeholder Communication: Clear, succinct reports and intuitive dashboards ensured that both technical and non-technical stakeholders could understand the results and apply them to practical problems.
Why Statistique?
At Statistique, we excel at combining data governance, advanced analytics, and NLP to handle challenging, multidisciplinary projects. Our track record in academic collaboration shows that we can pair rigorous methods with accessible presentations to create solutions that appeal to end users, corporate partners, and researchers alike.
Want to turn your data-driven efforts into meaningful results? Get in touch to find out how Statistique can support innovation, education, and inspiration in any field.



