Taylor Swift: Danceability, Valence, and Tempo by Era

This dataset contains audio features for Taylor Swift’s discography, sourced from Spotify’s API via the #TidyTuesday open data project. Spotify automatically generates scores for each track that measure how danceable it is (from 0, least danceable, to 100, most danceable), how emotionally negative or positive it sounds (valence, from 0, which sounds more negative, e.g., sad, depressed, angry, to 100, which sounds more positive, e.g., happy, cheerful, euphoric), and how fast it is (tempo, in beats per minute). We use these scores to explore how her musical style has evolved across three career eras: early (2006-2012), mid (2013-2019), and late (2020-2022). The dataset includes 263 songs.

Descriptive Statistics

  1. First, it’s important we run some preliminary analysis to better understand our data. Interpret the appropriate descriptive statistics (e.g., M, SD, skewness, kurtosis) for danceability, valence, and tempo. What do you notice about these descriptive statistics across eras?
Table 1. Descriptive Statistics: Danceability, Valence, and Tempo
variable      vars    n    mean     sd  median  trimmed    mad    min     max   range   skew  kurtosis    se
danceability     1  263   58.66  11.04   59.80    58.75   9.79  29.20   89.70   60.50  -0.11      0.04  0.68
valence          2  263   40.46  19.55   39.90    39.69  21.05   3.82   94.20   90.38   0.33     -0.49  1.21
tempo            3  263  125.15  31.23  122.88   123.96  34.08  57.96  208.92  150.95   0.37     -0.43  1.93
Table 2. Descriptive Statistics by Era
variable      era      n    mean     sd  median   skew  kurtosis    se
danceability  early   87   58.16   9.59   59.20   0.10     -0.16  1.03
danceability  mid     58   63.57  12.65   64.65  -0.48     -0.11  1.66
danceability  late   118   56.60  10.54   58.20  -0.29      0.15  0.97
valence       early   87   41.57  20.81   37.90   0.40     -0.82  2.23
valence       mid     58   41.63  19.31   42.15   0.30     -0.22  2.54
valence       late   118   39.06  18.76   39.75   0.24     -0.52  1.73
tempo         early   87  123.33  29.85  124.91   0.36      0.01  3.20
tempo         mid     58  129.94  35.77  123.15   0.19     -1.07  4.70
tempo         late   118  124.14  29.86  121.54   0.41     -0.42  2.75

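Across all songs, danceability was relatively consistent (M = 58.66, SD = 11.04) and approximately normally distributed (skew = -0.11, kurtosis = 0.04). Valence was more variable (M = 40.46, SD = 19.55), and its mean below the scale midpoint of 50 suggests that Taylor Swift’s catalog leans emotionally negative. Tempo averaged about 125 BPM (M = 125.15, SD = 31.23). By era, mid-era songs were notably more danceable (M = 63.57) than early-era (M = 58.16) and late-era (M = 56.60) songs, while mean valence (39.06 to 41.63) and mean tempo (123.33 to 129.94) changed little.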
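The statistics interpreted above can be computed directly from raw scores. The following is a minimal Python sketch rather than the code used for this report: `describe` is a hypothetical helper and the `scores` list is made-up illustrative data. It uses moment-based skewness and excess kurtosis, so its values may differ slightly from software that applies small-sample corrections.

```python
import math
import statistics

def describe(values):
    """Return n, mean, sample SD, skewness, excess kurtosis, and standard error."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator)
    # Central moments in population form (n denominator).
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return {
        "n": n,
        "mean": mean,
        "sd": sd,
        "skew": m3 / m2 ** 1.5,        # 0 for a symmetric distribution
        "kurtosis": m4 / m2 ** 2 - 3,  # excess kurtosis; 0 for a normal curve
        "se": sd / math.sqrt(n),       # standard error of the mean
    }

# Hypothetical danceability scores, for illustration only.
scores = [45.0, 52.5, 58.0, 59.8, 61.2, 64.0, 70.5]
summary = describe(scores)
print({name: round(value, 2) for name, value in summary.items()})
```

As a check against Table 1, the standard error is the SD divided by the square root of n: for danceability, 11.04 / sqrt(263) is approximately 0.68, which matches the reported value.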

Histograms

  1. Let’s now look at the distribution of our continuous variables and see if they’re normally distributed.

All three variables are approximately normally distributed. Danceability is nearly symmetric (skew = -0.11), while valence (skew = 0.33) and tempo (skew = 0.37) are slightly right-skewed. Tempo also shows the widest spread, reflecting the diversity of musical styles across Taylor Swift’s catalog.

Box Plots

  1. Create box plots of our continuous variables (danceability, valence, tempo) to visually summarize the distribution of data, showing its center (median), spread (variability), and skewness at a glance. Do you notice any potential outliers?

Danceability scores cluster tightly around 60 out of 100, with just three songs falling well outside the typical range. Valence, a measure of how positive or negative a song sounds, varies much more widely, reflecting the emotional range across Taylor Swift’s catalog. Tempo centers on a moderate speed of roughly 120 BPM (median = 122.88), although individual songs range from about 58 to 209 BPM.
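Box plots usually flag potential outliers with the 1.5 × IQR rule: any point more than 1.5 interquartile ranges below the first quartile or above the third quartile. The sketch below illustrates that rule in Python. It assumes this default convention applies to the plots described here, `iqr_outliers` is a hypothetical helper, and the scores are made up. Quartile definitions vary between tools, so borderline points may be classified differently elsewhere.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR beyond the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr  # the whisker fences
    return [x for x in values if x < lower or x > upper]

# Hypothetical scores, for illustration only.
scores = [29.0, 52.0, 55.0, 58.0, 60.0, 62.0, 65.0, 68.0, 90.0]
print(iqr_outliers(scores))
```

In this toy example the quartiles are 55 and 65, so the fences sit at 40 and 80 and the two extreme scores are flagged.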

  2. Box plots are especially useful for comparing distributions across multiple groups. Graph danceability, valence, and tempo by era. What do you notice?

Mid-era songs tend to be the most danceable, while late-era songs score slightly lower on average. That said, all three eras overlap considerably, so the differences are modest. Emotional tone, as measured by valence, stayed remarkably consistent across her career: early-, mid-, and late-era songs all lean slightly negative on average. Song tempo shows a similar pattern, with no meaningful differences across eras.

Scatter Plots

  1. Let’s graph some scatter plots and examine the relationships between our continuous variables. We’ll also examine scatter plots by looking at era. What trends can we describe in terms of direction (i.e., positive, negative, or none), form (i.e., linear, curved, or no pattern), and strength (i.e., weak, moderate, or strong)?

Valence and danceability show a positive, roughly linear relationship of weak-to-moderate strength, where happier-sounding songs tend to be somewhat more danceable. Tempo shows a negative relationship of similar strength with danceability, suggesting that faster songs actually tend to be rated as less danceable. Valence and tempo show little to no relationship with each other. These patterns hold across all three eras with no notable differences by group.

Correlation Table

  1. Let’s create a correlation table of danceability, valence, and tempo to supplement the conclusions that we observed in our scatter plots.
Table 3. Correlations Among Danceability, Valence, and Tempo
variable      danceability  valence    tempo
danceability                 0.32**  -0.31**
valence                                -0.01
tempo
Note: * p < .05. ** p < .01.

Valence and danceability show a moderate, statistically significant positive correlation (r = .32, p < .01), consistent with the scatter plots: happier-sounding songs tend to score higher on danceability. Tempo and danceability show a negative correlation of nearly identical magnitude (r = -.31, p < .01), meaning faster songs are somewhat less danceable on average. Valence and tempo are essentially unrelated (r = -.01, p > .05).
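To make the significance statements concrete, the sketch below shows how a Pearson correlation is computed and how its t statistic follows from r and the sample size (n = 263, so 261 degrees of freedom). This is an illustrative Python sketch, not the code behind Table 3. The function names `pearson_r` and `t_from_r` are hypothetical, and the loop reuses only the r values reported above.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_from_r(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Correlations and sample size reported in this section.
reported = [("valence-danceability", 0.32),
            ("tempo-danceability", -0.31),
            ("valence-tempo", -0.01)]
for label, r in reported:
    print(label, round(t_from_r(r, 263), 2))
```

With 261 degrees of freedom, a two-tailed p < .01 requires |t| above roughly 2.6. The first two correlations give t values of about 5.5 and -5.3, well past that threshold, while the valence-tempo correlation gives a t of about -0.16.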

Chocolate Bar Ratings by Cocoa Percentage

This dataset comes from Flavors of Cacao, a database of expert chocolate bar reviews compiled since 2006. Each entry includes a rating score from 1 to 4, where higher scores indicate better quality. We focus on bars with cocoa content of 65%, 70%, 75%, and 80% to examine whether cocoa percentage is associated with perceived quality. Cocoa percentage refers to the sum of the cocoa bean content and any added cocoa butter; a higher percentage (e.g., 80%) leaves a smaller share (e.g., 20%) for sugar and other ingredients. To ensure a fair comparison across groups, we sampled an equal number of bars from each cocoa percentage level (n = 89 per group, N = 356 total).
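The text does not say how the 89 bars per group were drawn. One common approach, sketched below in Python as an assumption rather than the method actually used, is to downsample every group at random to the size of the smallest group. `balanced_sample`, the record layout, and the toy `bars` list are all hypothetical.

```python
import random
from collections import defaultdict

def balanced_sample(records, key, seed=0):
    """Randomly downsample every group to the size of the smallest group."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record)
    n = min(len(group) for group in groups.values())
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return {level: rng.sample(group, n) for level, group in groups.items()}

# Hypothetical bars with unequal group sizes, for illustration only.
bars = (
    [{"cocoa": 65, "rating": 3.0}] * 3
    + [{"cocoa": 70, "rating": 3.5}] * 5
    + [{"cocoa": 75, "rating": 3.25}] * 4
    + [{"cocoa": 80, "rating": 3.0}] * 2
)
sample = balanced_sample(bars, "cocoa")
print({level: len(group) for level, group in sample.items()})
```

Equal group sizes keep any one cocoa level from dominating the overall statistics, at the cost of discarding data from the larger groups.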

Descriptive Statistics

  1. It’s important we run some preliminary analysis to better understand our data. Interpret the appropriate descriptive statistics (e.g., M, SD, skewness, kurtosis) for chocolate bar ratings across cocoa percentage groups.
Table 4. Descriptive Statistics: Chocolate Bar Rating
variable      vars    n  mean    sd  median  trimmed   mad  min   max  range   skew  kurtosis    se
Cocoa Rating     1  356  3.19  0.44    3.25     3.20  0.37  1.5  4.00   2.50  -0.39      0.20  0.02
Table 5. Descriptive Statistics by Cocoa Percentage
cocoa  vars    n  mean    sd  median  trimmed   mad  min   max  range   skew  kurtosis    se
65%       1   89  3.17  0.43    3.25     3.20  0.37  1.5  4.00   2.50  -1.03      1.61  0.05
70%       1   89  3.33  0.46    3.25     3.34  0.37  2.0  4.00   2.00  -0.27     -0.58  0.05
75%       1   89  3.17  0.47    3.25     3.20  0.37  2.0  4.00   2.00  -0.49     -0.04  0.05
80%       1   89  3.08  0.37    3.00     3.09  0.37  2.0  3.75   1.75  -0.05     -0.51  0.04

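Overall, chocolate bars received moderately high ratings on the 1-to-4 scale (M = 3.19, SD = 0.44). Bars with 70% cocoa content received the highest average ratings (M = 3.33), while 80% cocoa bars rated lowest (M = 3.08), suggesting a preference for moderate cocoa intensity in this sample. Skewness was negative in every group, most strongly at 65% cocoa (skew = -1.03, kurtosis = 1.61), the only group with a rating below 2.0 (minimum = 1.5).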

Histogram

  1. Let’s look at a histogram to supplement our conclusions from the descriptive statistics and check if chocolate bar ratings are normally distributed.

Chocolate bar ratings are approximately normally distributed, with most ratings clustering between 3.0 and 3.5. The distribution is slightly left-skewed, indicating that very low ratings are relatively rare in this sample.


Box Plot

  1. Now let’s examine a box plot of chocolate bar ratings by cocoa percentage. What trends do you notice?

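Bars at 65%, 70%, and 75% cocoa share the same median rating (3.25), while 80% cocoa bars have the lowest median (3.00); the 70% group stands out mainly in its higher mean (M = 3.33). The spread of ratings is similar across groups (SDs from 0.37 to 0.47), and the group differences are small relative to that spread, suggesting cocoa percentage has a modest effect on perceived quality.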

Word Cloud

  1. What words are most commonly used to describe our sample of chocolate?

The most frequently used descriptors center on richness and depth of flavor. “Sweet,” “cocoa,” “creamy,” and “nutty” dominate the cloud, suggesting raters most often experienced these bars as smooth and approachable. “Roasty,” “earthy,” and “molasses” also appear prominently, pointing to a darker, more complex flavor profile in a sizable portion of the sample. More negative descriptors like “bitter,” “fatty,” and “sour” are present but smaller, indicating they were mentioned less often. Overall, the language skews positive, with flavor complexity and natural sweetness emerging as the defining characteristics of the bars in this sample.
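A word cloud sizes each term by how often it appears, so the underlying step is a frequency count. The Python sketch below illustrates that step under one assumption: that each review stores its descriptors as a single comma-separated string. The `descriptor_counts` helper and the example reviews are hypothetical.

```python
from collections import Counter

def descriptor_counts(reviews):
    """Count how often each descriptor appears across all reviews."""
    counts = Counter()
    for review in reviews:
        for term in review.split(","):
            term = term.strip().lower()  # normalize spacing and case
            if term:
                counts[term] += 1
    return counts

# Hypothetical review strings, for illustration only.
reviews = [
    "sweet, creamy, nutty",
    "cocoa, roasty, sweet",
    "bitter, earthy, cocoa",
    "Sweet, molasses",
]
counts = descriptor_counts(reviews)
print(counts.most_common(2))
```

Terms with the highest counts would be drawn largest in the cloud, which is why infrequent descriptors such as “bitter” appear small even when they are present.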