Hype vs. Hyperpigmentation

Analyzed the marketing language of hyperpigmentation treatments across Asian and Western skincare markets using EDA, hypothesis testing, and a Random Forest model, revealing clear cultural differences in branding.

Python

Project Overview

Timeline

2 Months (2025)

Role

Wrote the initial research question, formed the hypothesis, collected the data, programmed interactive graphs, and trained and tested the random forest model.

Tools Used

pandas, NumPy, NLTK, Plotly, ipywidgets

Link

GitHub Link

This study looks at the differences in marketing strategies and consumer perceptions of hyperpigmentation treatments in Asian and Western skincare markets. Using exploratory data analysis (EDA), machine learning classification, and hypothesis testing, we analyzed ingredient emphasis, pricing structures, and linguistic framing in product descriptions from both regions.

Research Question

How do Asian and Western skincare brands differ in their use of ingredients, product descriptions, and pricing for hyperpigmentation treatments? Specifically, how do they market five key brightening ingredients (Vitamin C, Niacinamide, Alpha Arbutin, Kojic Acid, and Tranexamic Acid), and what patterns emerge in their pricing and language? Additionally, can we build a predictive model that classifies skincare products as either Asian or Western based on these factors?

Hypothesis

We predicted that Asian skincare markets would be more likely to market hyperpigmentation treatments with terms like “whitening” and “brightening” and to prioritize gradual improvement, whereas Western markets would use terms like “dark spot correction” and “even skin tone” while prioritizing clinical efficacy and fast results.

We expected this because of differences in culture and beauty norms between Asian and Western countries. A study by Columbia University found that skin-lightening products were used at higher rates by Asian women born outside of the United States than by any other demographic assessed.

Data Collection

Sephora Hyperpigmentation Skincare Product Dataset

Source

Webscraped from the Sephora website

Number of Observations

219

Number of Variables

6

This dataset consists of 219 skincare products from the Sephora website, listing their brand, name, price, product description, ingredients list, and review count. This data was scraped from their list of skin care products, filtered by choosing the 'dark spots' concern under the 'Shop by concern' dropdown.


YesStyle Hyperpigmentation Skincare Product Dataset

Source

Webscraped from the YesStyle website

Number of Observations

202

Number of Variables

7

This dataset consists of 202 skincare products from the YesStyle website, listing their brand, name, price, product description, ingredients list, review rating (1-5), and review count. This data was scraped from the skin care product category, filtered for products that were tagged as being for 'hyperpigmentation.'

Exploratory Data Analysis

Ingredients

Niacinamide and Vitamin C are the dominant brightening ingredients in both the Western (Sephora) and Asian (YesStyle) products, but Western brands emphasize Vitamin C more heavily. Alpha Arbutin and Kojic Acid are largely absent from the Asian products and appear almost exclusively in Western marketing, suggesting greater popularity in the Western market. Tranexamic Acid is emerging as a globally recognized ingredient. These patterns reflect clear differences in ingredient preferences and marketing strategies between the two regions.
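One way to measure ingredient emphasis like this is to count the share of product descriptions that mention each ingredient. The sketch below is a minimal illustration and not the project's actual code: the regular-expression patterns, the function name ingredient_share, and the toy descriptions are all assumptions made for the example.

```python
import re

# Hypothetical search patterns for the five ingredients in the research question.
INGREDIENT_PATTERNS = {
    "Vitamin C": r"vitamin c\b|ascorbic acid",
    "Niacinamide": r"niacinamide",
    "Alpha Arbutin": r"alpha[- ]arbutin",
    "Kojic Acid": r"kojic acid",
    "Tranexamic Acid": r"tranexamic acid",
}


def ingredient_share(descriptions):
    """Return the fraction of descriptions that mention each ingredient."""
    total = len(descriptions)
    shares = {}
    for name, pattern in INGREDIENT_PATTERNS.items():
        regex = re.compile(pattern, flags=re.IGNORECASE)
        hits = sum(1 for text in descriptions if regex.search(text))
        shares[name] = hits / total if total else 0.0
    return shares


# Toy descriptions, invented for illustration only (not rows from the datasets).
sample = [
    "Brightening serum with 10% Vitamin C and niacinamide.",
    "Gentle toner with niacinamide and tranexamic acid.",
]
shares = ingredient_share(sample)
```

Running the same function on each region's descriptions gives two sets of shares that can be compared side by side.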


Price Distribution

Asian skincare products are generally more affordable, with prices clustering in the lower range and fewer extreme outliers. In contrast, Western skincare products show a much broader price distribution, including several high-priced outliers exceeding $300 from luxury brands like La Mer, Dr. Barbara Sturm, and Westman Atelier. The wider interquartile range of Western products highlights greater price variability and diverse positioning strategies, while Asian brands maintain tighter, lower-cost distributions that emphasize accessibility and cost-effective formulations. These differences are likely influenced in part by platform bias, as Sephora primarily features premium Western brands, whereas YesStyle markets itself as a destination for inexpensive Asian cosmetics, potentially amplifying the observed pricing gap.
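The interquartile range and outliers mentioned above can be computed with a few lines of standard-library Python. This is a sketch under assumptions: the 1.5 × IQR upper fence is the usual box-plot convention rather than something the analysis states, and the prices are made-up values, not data from either site.

```python
import statistics


def price_summary(prices):
    """Summarize a price list with its median, IQR, and high-end outliers."""
    q1, median, q3 = statistics.quantiles(prices, n=4, method="inclusive")
    iqr = q3 - q1
    # Box-plot convention (an assumption here): points above Q3 + 1.5 * IQR are outliers.
    upper_fence = q3 + 1.5 * iqr
    outliers = [p for p in prices if p > upper_fence]
    return {"median": median, "iqr": iqr, "outliers": outliers}


# Toy prices in USD, invented for illustration only.
toy_prices = [12.0, 15.0, 18.0, 20.0, 22.0, 25.0, 320.0]
summary = price_summary(toy_prices)
```

Comparing the iqr and outliers entries for each platform's price list is one way to quantify the wider spread described for Western products.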


Marketing Language

The word frequency analysis revealed clear differences in how Western and Asian skincare brands market their products. Western brands tend to focus on results and transformation, frequently using words like “dark,” “spots,” “visibly,” “clinical,” “wrinkles,” and “lines.” This language emphasizes problem-solving, visible changes, and a science-backed approach designed to promise fast, noticeable improvements. In contrast, Asian skincare brands highlight gentleness, hydration, and ingredient transparency. Common words such as “extract,” “formula,” “moisture,” “sensitive,” “hyaluronic,” and “niacinamide” suggest a focus on nourishing, soothing products suitable for different skin types rather than dramatic claims. This indicates that Western marketing leans toward clinical efficacy and transformation, while Asian marketing prioritizes safety, hydration, and long-term skin health.
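A word frequency comparison of this kind can be sketched as follows. The project lists NLTK among its tools, but to stay self-contained this example uses only the standard library with a small hand-written stop-word list; that list, the tokenization rule, and the sample descriptions are all assumptions for illustration.

```python
import re
from collections import Counter

# Tiny hand-written stop-word list (an assumption; a fuller list such as
# NLTK's English stop words could be substituted).
STOP_WORDS = {"a", "an", "and", "the", "of", "for", "to", "with", "in", "that", "this", "is"}


def top_words(descriptions, n=5):
    """Return the n most frequent non-stop-words across the descriptions."""
    counts = Counter()
    for text in descriptions:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(n)


# Toy descriptions, invented for illustration only.
western = [
    "Visibly reduces dark spots and fine lines.",
    "Clinical serum that fades dark spots.",
]
asian = [
    "Gentle formula with centella extract for sensitive skin.",
    "Moisture cream with niacinamide and hyaluronic acid.",
]
western_top = top_words(western, n=2)
asian_top = top_words(asian, n=3)
```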

Hypothesis Testing

Our hypothesis test compared word usage between Western and Asian skincare product descriptions to see whether the differences were statistically significant. Using a Chi-Square Test of Independence on the word frequency data, the analysis returned a chi-square statistic of 3475.60 with a p-value reported as 0.0 (that is, too small to distinguish from zero at the precision shown), indicating a highly significant difference in vocabulary patterns. This result indicates that the differences in marketing language between Western and Asian brands are very unlikely to be due to random variation. Western descriptions more often include terms tied to clinical efficacy and visible results, while Asian descriptions emphasize gentle care and ingredient-focused messaging. These linguistic differences align with cultural preferences and marketing strategies in the two regions.
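For readers unfamiliar with the test, the chi-square statistic compares observed counts with the counts expected if region and word choice were independent. The sketch below computes the statistic by hand on a toy table; the table layout (rows as regions, columns as words) and the counts are assumptions, since the write-up does not describe how the contingency table was built. In practice, scipy.stats.chi2_contingency performs the same calculation and also returns the p-value.

```python
def chi_square_statistic(table):
    """Return the chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Toy counts, invented for illustration: rows are regions (Western, Asian),
# columns are how often two example words appear in each region's descriptions.
toy_counts = [
    [30, 10],
    [10, 30],
]
stat, dof = chi_square_statistic(toy_counts)
```

With many words and large counts, the statistic grows quickly, which is one reason a reported p-value can underflow to 0.0.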

Random Forest Classification

We built a random forest classifier to predict whether a skincare product was from an Asian or Western brand using its product description text, ingredients, and price. After transforming the text with TF-IDF vectorization and standardizing prices, we split the data 80/20 into training and test sets. The model achieved 95.31% accuracy on the test set, with high precision and recall for both categories. It performed slightly better on Western products (perfect recall) than on Asian products (recall of 0.89), meaning a few Asian products were misclassified as Western. These results suggest that differences in marketing language, ingredient emphasis, and pricing are strong enough for a machine learning model to reliably distinguish between the two markets.
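The pipeline described above can be sketched with scikit-learn roughly as follows. This is an illustrative reconstruction under assumptions, not the project's code: the column names, the choice to merge description and ingredients into one text column, the hyperparameters, and the ten toy rows are all hypothetical.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy rows, invented for illustration only. "text" stands in for the
# description and ingredients list joined together (an assumption).
products = pd.DataFrame({
    "text": [
        "clinical dark spot corrector visibly fades spots",
        "visibly reduces wrinkles and dark spots fast",
        "clinical strength vitamin c serum for dark spots",
        "targets fine lines and dark spots visibly",
        "vitamin c treatment with clinical results",
        "gentle niacinamide toner with centella extract",
        "moisture cream with hyaluronic acid for sensitive skin",
        "soothing formula with rice extract and niacinamide",
        "hydrating essence with green tea extract",
        "mild brightening formula for sensitive skin",
    ],
    "price": [68.0, 85.0, 120.0, 74.0, 95.0, 14.0, 18.0, 12.0, 16.0, 20.0],
    "region": ["Western"] * 5 + ["Asian"] * 5,
})

# TF-IDF on the text column, standardization on the price column.
features = ColumnTransformer([
    ("tfidf", TfidfVectorizer(stop_words="english"), "text"),
    ("price", StandardScaler(), ["price"]),
])
model = Pipeline([
    ("features", features),
    ("forest", RandomForestClassifier(n_estimators=100, random_state=0)),
])

# 80/20 split, stratified so both regions appear in the test set.
X_train, X_test, y_train, y_test = train_test_split(
    products[["text", "price"]],
    products["region"],
    test_size=0.2,
    stratify=products["region"],
    random_state=0,
)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
```

On the real data, sklearn.metrics.classification_report(y_test, predictions) would give per-class precision and recall figures like those quoted above.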

Conclusion

Our analysis showed clear differences between Asian and Western skincare marketing. Asian brands emphasized gentle, long-term benefits and affordable pricing, while Western brands highlighted clinical efficacy, fast results, and premium positioning. Ingredient mentions, pricing distributions, and word usage all reflected these distinct strategies, and the Random Forest model confirmed these patterns, accurately classifying products by region.

However, there are key limitations. The dataset relied on English-language descriptions from two primary platforms (Sephora and YesStyle), which may bias results toward their formatting and brand selection. Non-English markets, additional countries, and consumer sentiment analysis were not deeply explored. Future work could expand the dataset, include more regions and languages, and refine the model to address the misclassification of Asian products.

