Taylor Swift Data

Click this hyperlink to view the full data dictionary for the Taylor Swift data set. For our convenience, here are the variables that are most important for this lab activity:

danceability (\(y\)): Spotify danceability score, ranges from \(0\) (least danceable) to \(100\) (most danceable)
valence (\(x_1\)): Spotify valence score, ranges from \(0\) (low valence, sounds more negative… sad, depressed, angry) to \(100\) (high valence, sounds more positive… happy, cheerful, euphoric)
tempo (\(x_2\)): Overall estimated track tempo in beats per minute (BPM), i.e., the speed or pace of a piece, derived directly from the average beat duration
era (\(x_3\)): Era of the album release, coded with \(3\) levels: early (\(2006\)–\(2012\)), mid (\(2013\)–\(2019\)), and late (\(2020\)–\(2022\))

First, it’s important we run some preliminary analysis to better understand our data. Interpret the appropriate descriptive statistics (e.g., \(M\), \(SD\), \(skewness\), \(kurtosis\)) for danceability, valence, and tempo. What do you notice about these descriptive statistics across eras?

## Danceability, valence, and tempo descriptives:

##              vars   n   mean    sd median trimmed   mad   min    max  range
## danceability    1 263  58.66 11.04  59.80   58.75  9.79 29.20  89.70  60.50
## valence         2 263  40.46 19.55  39.90   39.69 21.05  3.82  94.20  90.38
## tempo           3 263 125.15 31.23 122.88  123.96 34.08 57.96 208.92 150.95
##               skew kurtosis   se
## danceability -0.11     0.04 0.68
## valence       0.33    -0.49 1.21
## tempo         0.37    -0.43 1.93

## Danceability, valence, and tempo descriptives by era:

## 
##  Descriptive statistics by group 
## era: early
##              vars  n   mean    sd median trimmed   mad   min    max  range skew
## danceability    1 87  58.16  9.59  59.20   58.11  9.64 37.10  84.30  47.20 0.10
## valence         2 87  41.57 20.81  37.90   40.56 22.39  4.83  92.80  87.97 0.40
## tempo           3 87 123.33 29.85 124.91  122.34 30.96 57.96 204.12 146.16 0.36
##              kurtosis   se
## danceability    -0.16 1.03
## valence         -0.82 2.23
## tempo            0.01 3.20
## ------------------------------------------------------------ 
## era: mid
##              vars  n   mean    sd median trimmed   mad   min    max  range
## danceability    1 58  63.57 12.65  64.65   64.24 13.12 29.20  89.70  60.50
## valence         2 58  41.63 19.31  42.15   41.11 20.02  4.99  94.20  89.21
## tempo           3 58 129.94 35.77 123.15  129.29 44.00 68.53 207.48 138.94
##               skew kurtosis   se
## danceability -0.48    -0.11 1.66
## valence       0.30    -0.22 2.54
## tempo         0.19    -1.07 4.70
## ------------------------------------------------------------ 
## era: late
##              vars   n   mean    sd median trimmed   mad   min    max  range
## danceability    1 118  56.60 10.54  58.20   57.03  9.86 31.00  87.00  56.00
## valence         2 118  39.06 18.76  39.75   38.48 20.31  3.82  92.00  88.18
## tempo           3 118 124.14 29.86 121.54  122.84 33.27 73.94 208.92 134.98
##               skew kurtosis   se
## danceability -0.29     0.15 0.97
## valence       0.24    -0.52 1.73
## tempo         0.41    -0.42 2.75
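
The tables above have the column layout printed by describe() and describeBy() from the psych package, although the lab's own code is not shown here, so that attribution is an assumption. As a rough sketch of what several of those columns mean, the base R code below computes them by hand. The toy data frame and the describe_basic() helper are hypothetical, and the values are made up for illustration; they are not the Taylor Swift data. Skewness and excess kurtosis use one common moment-based definition, and other software may apply a slightly different small-sample correction.

```r
# Toy data: made-up values for illustration only, NOT the lab data
toy <- data.frame(
  era = factor(rep(c("early", "mid", "late"), each = 3),
               levels = c("early", "mid", "late")),
  danceability = c(55, 60, 62, 70, 48, 66, 52, 58, 61)
)

# Hypothetical helper: a few descriptive statistics for one numeric vector
describe_basic <- function(x) {
  n <- length(x)
  m <- mean(x)
  s <- sd(x)
  z <- (x - m) / s                 # standardized scores
  c(n = n,
    mean = m,
    sd = s,
    median = median(x),
    skew = sum(z^3) / n,           # 0 for a symmetric distribution
    kurtosis = sum(z^4) / n - 3,   # excess kurtosis, near 0 for a normal one
    se = s / sqrt(n))              # standard error of the mean
}

# Overall descriptives, then the same statistics within each era
overall <- describe_basic(toy$danceability)
by_era <- sapply(split(toy$danceability, toy$era), describe_basic)

print(round(overall, 2))
print(round(by_era, 2))

# The se column can be checked from sd and n in the overall table above
round(11.04 / sqrt(263), 2)  # 0.68, the se reported for danceability
```

When reading the by-era output, comparing the mean and median rows within a column is a quick way to spot skew before looking at the skew row itself.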

Let’s now look at the distribution of our continuous variables and see if they’re normally distributed.

Create box plots of our continuous variables (danceability, valence, tempo) to visually summarize the distribution of data, showing its center (median), spread (variability), and skewness at a glance. Do you notice any potential outliers? Here’s how to interpret box plots if you need a refresher.
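
Box plots conventionally flag a point as a potential outlier when it lies more than 1.5 times the box length beyond either end of the box. A minimal sketch of that rule in base R follows; the tempo values are invented for illustration and are not the lab data.

```r
# Made-up tempo values (BPM) for illustration only
tempo_toy <- c(100, 110, 120, 125, 130, 140, 210)

# boxplot.stats() returns the five numbers used to draw the box and whiskers,
# plus any points beyond the whiskers (default rule: 1.5 x box length)
bs <- boxplot.stats(tempo_toy)
bs$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
bs$out    # values flagged as potential outliers

# Draw the corresponding box plot
boxplot(tempo_toy, horizontal = TRUE, xlab = "Tempo (BPM)")
```

A flagged point is only a candidate outlier; whether it is an error or a genuine extreme value is a judgment about the data, not something the plot decides.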

Box plots are especially useful for comparing distributions across multiple groups. Graph danceability, valence, and tempo by era. What do you notice?
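
One way to draw side-by-side box plots in base R is the formula interface of boxplot(), sketched below. The toy data frame is hypothetical and its values are invented; with the real data, the column names would need to match the data set.

```r
# Toy data: made-up values for illustration only, NOT the lab data
toy <- data.frame(
  era = factor(rep(c("early", "mid", "late"), each = 5),
               levels = c("early", "mid", "late")),
  danceability = c(52, 56, 58, 61, 65,
                   50, 60, 64, 68, 75,
                   48, 54, 57, 59, 63)
)

# One box per era; the return value holds the numbers behind the plot
bp <- boxplot(danceability ~ era, data = toy,
              xlab = "Era", ylab = "Danceability")

bp$names       # group labels, in factor-level order
bp$stats[3, ]  # the median of each group (third row of the summary)
```

Setting the factor levels explicitly keeps the eras in chronological order on the axis rather than alphabetical order.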

Let’s graph some scatter plots and examine the relationships between our continuous variables. We’ll also examine the scatter plots by era. What trends can we describe in terms of direction (i.e., positive, negative, or none), form (i.e., linear, curved, or no pattern), and strength (i.e., weak, moderate, or strong)?
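
As a rough sketch, a scatter plot with points colored by group and a fitted straight line can be drawn in base R as follows. The toy data frame is hypothetical, its values are invented, and it uses only two eras to stay short.

```r
# Toy data: made-up values for illustration only, NOT the lab data
toy <- data.frame(
  valence = c(20, 35, 50, 65, 80, 30, 55, 70),
  danceability = c(48, 55, 60, 66, 72, 50, 63, 65),
  era = factor(rep(c("early", "late"), each = 4))
)

# Scatter plot, one color per era
plot(danceability ~ valence, data = toy,
     col = as.integer(toy$era), pch = 19,
     xlab = "Valence", ylab = "Danceability")
legend("topleft", legend = levels(toy$era),
       col = seq_along(levels(toy$era)), pch = 19)

# A least-squares line helps judge direction and form
fit <- lm(danceability ~ valence, data = toy)
abline(fit)
coef(fit)  # the sign of the valence slope gives the direction
```

If the points curve systematically away from the line, the form is not linear and a correlation coefficient will understate the relationship.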

Let’s create a correlation table of danceability, valence, and tempo to supplement the conclusions we drew from our scatter plots.

## 
## 
## Table 1 
## 
## Means, standard deviations, and correlations with confidence intervals
##  
## 
##   Variable        M      SD    1            2          
##   1. danceability 58.66  11.04                         
##                                                        
##   2. valence      40.46  19.55 .32**                   
##                                [.21, .43]              
##                                                        
##   3. tempo        125.15 31.23 -.31**       -.01       
##                                [-.42, -.20] [-.14, .11]
##                                                        
## 
## Note. M and SD are used to represent mean and standard deviation, respectively.
## Values in square brackets indicate the 95% confidence interval.
## The confidence interval is a plausible range of population correlations 
## that could have caused the sample correlation (Cumming, 2014).
##  * indicates p < .05. ** indicates p < .01.
##
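
The bracketed intervals in Table 1 can be approximated from a correlation and the sample size alone. The sketch below uses the Fisher z method, which is what base R's cor.test() uses for Pearson correlations; whether the table was produced exactly this way is an assumption. The r_to_ci() helper is hypothetical, and the x and y vectors are invented.

```r
# Hypothetical helper: confidence interval for a Pearson r via Fisher's z
r_to_ci <- function(r, n, level = 0.95) {
  z <- atanh(r)                        # transform r to the z scale
  se <- 1 / sqrt(n - 3)                # standard error on the z scale
  crit <- qnorm(1 - (1 - level) / 2)   # about 1.96 for a 95% interval
  tanh(z + c(-1, 1) * crit * se)       # back-transform the two limits
}

# Danceability and tempo from Table 1: r = -.31, n = 263
ci <- r_to_ci(r = -0.31, n = 263)
round(ci, 2)  # -0.42 -0.20, matching the interval in the table

# With raw data, cor.test() reports the same kind of interval directly
x <- c(1, 2, 3, 4, 5, 6, 7, 8)   # made-up values
y <- c(2, 1, 4, 3, 6, 5, 8, 7)   # made-up values
ct <- cor.test(x, y)
ct$conf.int
```

Because Table 1 reports correlations rounded to two decimals, recomputing an interval from the rounded value can differ from the printed one in the last digit.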

Cocoa Data

We have another data set this week, which comes from Flavors of Cacao and contains data about chocolate! Let’s explore our data.

We are interested in whether cocoa percentage (\(65\)%, \(70\)%, \(75\)%, \(80\)%) affects chocolate bar ratings. Cocoa percentage refers to the sum of the cocoa bean and any added cocoa butter content; a higher percentage (e.g., \(80\)%) indicates a smaller share of sugar and other added ingredients (e.g., \(20\)%). Our sample size is \(N = 356\).

It’s important we run some preliminary analysis to better understand our data. Interpret the appropriate descriptive statistics (e.g., \(M\), \(SD\), \(skewness\), \(kurtosis\)) for chocolate bar ratings across cocoa percentage groups.

## Descriptives of chocolate bar rating:

##    vars   n mean   sd median trimmed  mad min max range  skew kurtosis   se
## X1    1 356 3.19 0.44   3.25     3.2 0.37 1.5   4   2.5 -0.39      0.2 0.02

## Descriptives of chocolate bar rating by cocoa percentage:

## 
##  Descriptive statistics by group 
## cocoa_percent: 65%
##        vars  n mean   sd median trimmed  mad min max range  skew kurtosis   se
## rating    1 89 3.17 0.43   3.25     3.2 0.37 1.5   4   2.5 -1.03     1.61 0.05
## ------------------------------------------------------------ 
## cocoa_percent: 70%
##        vars  n mean   sd median trimmed  mad min max range  skew kurtosis   se
## rating    1 89 3.33 0.46   3.25    3.34 0.37   2   4     2 -0.27    -0.58 0.05
## ------------------------------------------------------------ 
## cocoa_percent: 75%
##        vars  n mean   sd median trimmed  mad min max range  skew kurtosis   se
## rating    1 89 3.17 0.47   3.25     3.2 0.37   2   4     2 -0.49    -0.04 0.05
## ------------------------------------------------------------ 
## cocoa_percent: 80%
##        vars  n mean   sd median trimmed  mad min  max range  skew kurtosis   se
## rating    1 89 3.08 0.37      3    3.09 0.37   2 3.75  1.75 -0.05    -0.51 0.04
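
Two columns in these tables are robust alternatives to the mean and standard deviation: trimmed (a trimmed mean) and mad (the median absolute deviation). The sketch below assumes a 10% trim and R's default scaling constant for mad(); both are assumptions about how the tables were produced. The ratings are invented for illustration.

```r
# Made-up ratings for illustration only, NOT the lab data
ratings_toy <- c(2.5, 3, 3, 3.25, 3.25, 3.5, 3.75, 4, 2.75, 3.25)

# Trimmed mean: drop the lowest and highest 10% of values, then average
trimmed <- mean(ratings_toy, trim = 0.1)

# MAD: median distance from the median, scaled by 1.4826 by default so it
# is comparable to the SD when the data are normally distributed
mad_scaled <- mad(ratings_toy)
mad_raw <- mad(ratings_toy, constant = 1)  # unscaled version

c(trimmed = trimmed, mad_scaled = mad_scaled, mad_raw = mad_raw)

# Under that scaling, a raw deviation of 0.25 rating points gives the
# value shown in the mad column of every cocoa group above
round(0.25 * 1.4826, 2)  # 0.37
```

A trimmed mean close to the ordinary mean suggests that extreme ratings are not pulling the average far in either direction.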

Let’s look at a histogram to supplement our conclusions from the descriptive statistics and check if chocolate bar ratings are normally distributed.
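
A minimal histogram sketch in base R is shown below. It assumes ratings fall on quarter-point steps, as the values in the tables above suggest, and centers one bin on each possible rating. The ratings are invented for illustration.

```r
# Made-up ratings for illustration only, NOT the lab data
ratings_toy <- c(2.5, 2.75, 3, 3, 3.25, 3.25, 3.25, 3.5, 3.75, 4)

# Bin edges halfway between quarter-point ratings, so each bar is one rating
edges <- seq(1.375, 4.125, by = 0.25)

h <- hist(ratings_toy, breaks = edges,
          main = "Chocolate bar ratings (toy data)", xlab = "Rating")

h$mids[which.max(h$counts)]  # the most frequent rating in the toy data
```

With bins aligned to the rating scale, gaps and lopsided tails are easier to see than with default bin edges that split or merge adjacent ratings.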

Now let’s examine a box plot of chocolate bar ratings by cocoa percentage. What trends do you notice?

What words are most commonly used to describe our sample of chocolate?
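
One simple way to answer this is to split each bar's description into words and count them. The sketch below assumes the descriptions are stored as comma-separated text; the notes vector is hypothetical and its contents are invented, so the real column name and format may differ.

```r
# Made-up descriptions for illustration only, NOT the lab data
notes <- c("sweet, nutty, cocoa",
           "fruity, sour",
           "cocoa, nutty",
           "rich, cocoa")

# Split on commas, strip surrounding spaces, and lowercase for consistency
words <- trimws(unlist(strsplit(tolower(notes), ",")))

# Count each word and sort from most to least frequent
freq <- sort(table(words), decreasing = TRUE)
head(freq, 3)
```

The sorted counts can be passed to barplot() for a quick visual summary of the most common descriptors.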

Data Processing, Descriptives, Visualization, and More in R/RStudio

Created by Kayla Sansevere for PSY207: Advanced Statistics I Lab

Friday, October 17, 2025
