A Swift look at Taylor’s music over the years

Louwrens
11 min read · Aug 2, 2019
New Taylor vs old Taylor.

Inspired by this TED Talk by Colin Morris titled Pop Music is Stuck on Repeat, I was interested in whether Taylor Swift’s songs have become more repetitive over time. In the TED Talk, Morris talks through this essay, where he outlines his strategy for quantifying repetitiveness in songs using the Lempel-Ziv-Welch (LZW) compression algorithm.

To guide my investigation, I looked at answering the following three questions:

  • Have Taylor’s songs become shorter over time?
  • Have Taylor’s songs become happier over time?
  • Have Taylor’s songs become more repetitive over time?

You can find the Jupyter notebook detailing my full analysis here.

The dataset

I sourced all the Taylor Swift lyrics from this Kaggle dataset. The original dataset had each line in a song as a separate observation, but as I was more interested in the songs as a whole, I panel beat the dataset into the format shown below.

To make a fair comparison between the song lyrics, some data cleaning was needed. I decided to expand all contractions like “don’t” to “do not” and remove all quotes, braces and punctuation in the songs.
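A cleaning step like this can be sketched in a few lines of Python. This is only an illustration and not the notebook's code: the `CONTRACTIONS` table is a tiny hypothetical sample (a real one would be much longer), the function name `clean_lyrics` is made up, and lowercasing is an assumption based on how the lyrics appear later in this post.

```python
import string

# Hypothetical sample; a real table would cover many more contractions.
CONTRACTIONS = {
    "don't": "do not",
    "it's": "it is",
    "i'm": "i am",
    "can't": "cannot",
}

# ASCII punctuation plus curly quotes and the ellipsis character.
PUNCTUATION = string.punctuation + "\u2018\u2019\u201c\u201d\u2026"
REMOVE_PUNCTUATION = str.maketrans("", "", PUNCTUATION)


def clean_lyrics(text):
    """Lowercase, expand known contractions, and drop punctuation."""
    # Normalise the curly apostrophe so the lookup table only needs one form.
    text = text.lower().replace("\u2019", "'")
    cleaned = []
    for token in text.split():
        token = token.strip(PUNCTUATION)          # remove surrounding quotes, braces, etc.
        token = CONTRACTIONS.get(token, token)    # expand if it is a known contraction
        token = token.translate(REMOVE_PUNCTUATION)  # remove any punctuation left inside
        cleaned.extend(token.split())
    return " ".join(cleaned)


print(clean_lyrics("I don't know (why) it's \"cold\"!"))
```

Expanding contractions before stripping punctuation matters: once the apostrophe is gone, "don't" becomes "dont" and can no longer be matched.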

To get a feel for what Taylor’s discography is all about, I show a word cloud of the lyrics below. In the word cloud, the size of a word reflects how often it appears across the entire discography. The word cloud was drawn after removing common words like “the”, “is”, “are” and “I”, also known as stopwords.
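The counts behind a word cloud can be sketched with the standard library alone. The `STOPWORDS` set here is a small made-up sample rather than the list used for the figure, and `word_frequencies` is a hypothetical helper name.

```python
from collections import Counter

# Small illustrative sample; real stopword lists contain a few hundred words.
STOPWORDS = {"the", "is", "are", "i", "a", "and", "you", "we", "of"}


def word_frequencies(lyrics):
    """Count how often each non-stopword occurs in already-cleaned lyrics."""
    words = [word for word in lyrics.split() if word not in STOPWORDS]
    return Counter(words)


counts = word_frequencies("are we out of the woods yet are we in the clear yet")
print(counts.most_common(3))
```

A word cloud library would then scale each word's font size by its count.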

Wordcloud with stopwords removed for the entire Taylor Swift discography.

The dataset contains 94 songs from Taylor’s 6 studio albums. These albums are Taylor Swift, Fearless, Speak Now, Red, 1989 and Reputation released in 2006, 2008, 2010, 2012, 2014 and 2017 respectively. The number of songs per album over time is shown in the graph below.

Song length over time?

To answer the first question: “Have Taylor’s songs become shorter over time?”, we will be looking at two metrics: the number of unique words per song and the total number of words per song.
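Both metrics are simple to compute once the lyrics are cleaned. A minimal sketch, assuming words are separated by whitespace (the helper name `song_length` is made up for this example):

```python
def song_length(lyrics):
    """Return (total number of words, number of unique words) for one song."""
    words = lyrics.split()
    return len(words), len(set(words))


total, unique = song_length("are we out of the woods yet are we out of the woods")
print(total, unique)  # 13 7
```

A song that repeats its chorus many times scores high on the first number while the second barely moves, which is why it is worth looking at both.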

You can see from the graph below that the number of unique words per song has stayed relatively constant over the years, hovering between 100 and 140. The bands here show the standard deviation in song lengths within each album.
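A band of this kind is the album mean plus and minus one standard deviation. Here is a rough sketch; the input numbers are invented purely to show the arithmetic, and using the sample standard deviation (`statistics.stdev`) rather than the population version is an assumption.

```python
import statistics


def deviation_band(values):
    """Return (mean - sd, mean + sd) for a list of per-song values."""
    centre = statistics.mean(values)
    spread = statistics.stdev(values)  # sample standard deviation; needs at least 2 values
    return centre - spread, centre + spread


# Invented unique-word counts for three songs on one album.
print(deviation_band([100, 120, 140]))  # (100.0, 140.0)
```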

Looking at the total number of words per song, we see an increase over time. There is also noticeably higher variance in the total number of words for songs on more recent albums. However, the lower standard deviation band for the 2017 album Reputation is higher than the upper standard deviation band for Taylor’s self-titled 2006 album, clearly indicating an increase in the total number of words used per song.

Happiness over time?

Taylor has received a lot of criticism in recent years for her new bad-ass persona. But is this change in personality reflected in her song lyrics? To answer our second question, “Have Taylor’s songs become happier over time?”, we turn to one of the most common text classification tools: sentiment analysis.

Sentiment analysis works by looking at the words in a given piece of text and assigning the text a numeric value, usually between -1 and 1, called polarity. The polarity describes how positive (1) or negative (-1) that piece of writing is and is calculated from the words that occur in the text. A word like “hate” will decrease the polarity, whereas a word like “love” will increase it. There are many sentiment analysis tools out there, each with their pros and cons, but we’ve opted to use the TextBlob sentiment classifier from the textblob Python package for our analysis.
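To see the idea in miniature, here is a toy polarity scorer. It is not how TextBlob works internally and not the code used for the plots: the `LEXICON` scores are invented for illustration, and averaging the scores of matched words is just one simple way to stay within the -1 to 1 range.

```python
# Invented word scores between -1 (negative) and 1 (positive).
LEXICON = {"love": 0.5, "best": 1.0, "hate": -0.8, "mean": -0.3, "crying": -0.4}


def toy_polarity(text):
    """Average the scores of known words; 0.0 when no known word appears."""
    scores = [LEXICON[word] for word in text.lower().split() if word in LEXICON]
    if not scores:
        return 0.0
    return sum(scores) / len(scores)


print(toy_polarity("i had the best day with you today"))  # positive
print(toy_polarity("i come home crying"))                 # negative
print(toy_polarity("the sky is gold"))                    # 0.0, no known words
```

With the textblob package installed, the real score for a piece of text is available as `TextBlob(text).sentiment.polarity`.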

Below we show the sentiment polarity for each song plotted over time. The bands again show the standard deviation in polarity for all the songs within an album. From the plot, it looks like there is a slight decrease in positivity over the years, but nothing substantial.

However, most sentiment analysis algorithms, like the one used by TextBlob, perform better with more data. As songs are quite short, perhaps we can gain better insight into the sentiment polarity per album if we join all the lyrics for an album together. We can then look at the sentiment polarity of the album’s songs as a whole.

Below we plot the sentiment polarity for all the lyrics within an album, and we can see that Taylor has indeed become a bit more negative over the years. Luckily she is still above the 0 sentiment line, indicating positivity on average.

Repetitiveness over time?

Finally, we get to the interesting question: “Have Taylor’s songs become more repetitive over time?” From personal experience, it feels like pop music has become a lot more repetitive in recent years. I was thoroughly intrigued by this TED Talk by Colin Morris, which I highly recommend you watch, where he describes a way to quantify this repetitiveness in songs.

You may not have heard of the Lempel-Ziv-Welch (LZW) algorithm, but you probably rely on it, or on one of its close relatives, every day. LZW is the lossless compression algorithm behind GIFs. PNGs and archive formats such as zip and gzip use DEFLATE, a related algorithm from the same Lempel-Ziv family.

To use the LZW algorithm to evaluate repetitiveness in songs, we follow these steps:

  1. save a song’s lyrics in plain text to your hard drive
  2. use the number of bytes occupied on the drive as a proxy for the length of the song
  3. use a general-purpose lossless compression algorithm (such as LZW) to compress the song lyrics into a compressed file and save that to disk
  4. use the byte size on the hard drive of the compressed file as a proxy for the amount of repetitiveness
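These four steps can be sketched with Python's standard library. One loud caveat: the standard library has no LZW module, so this sketch swaps in `gzip`, which uses the related DEFLATE algorithm. The file names and the `file_sizes` helper are made up for the example, and the files are written to a temporary folder that is deleted afterwards.

```python
import gzip
import os
import tempfile


def file_sizes(lyrics):
    """Return (plain-text bytes, compressed bytes) as measured on disk."""
    with tempfile.TemporaryDirectory() as folder:
        raw_path = os.path.join(folder, "song.txt")
        zipped_path = os.path.join(folder, "song.txt.gz")

        # Step 1: save the lyrics as plain text.
        with open(raw_path, "w", encoding="utf-8") as handle:
            handle.write(lyrics)

        # Step 3: compress the saved file (DEFLATE via gzip, standing in for LZW).
        with open(raw_path, "rb") as source, gzip.open(zipped_path, "wb") as target:
            target.write(source.read())

        # Steps 2 and 4: read both sizes from disk.
        return os.path.getsize(raw_path), os.path.getsize(zipped_path)


raw, zipped = file_sizes("are we out of the woods yet " * 40)
print(raw, zipped)
```

The repeated line shrinks to a small fraction of its original size, which is exactly the effect the method relies on.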

Why does this work? This blog by Morris explains it in much more detail, but the thumbnail explanation is as follows. Compression algorithms, especially lossless compression algorithms, work by exploiting repetitiveness. Imagine you’ve got a file that contains the following piece of text:

abcdef

To store this to disk, there is no shortcut other than to store all 6 characters, a b c d e and f. Now imagine you had the following piece of text:

aaaaaa

Could we not then just store 1 a and somehow indicate that it should be repeated 6 times? Then we only need to store 1 character to disk, along with a bit of metadata telling the decompression algorithm to expand the 1 a to 6 a’s when the file is unzipped. Strictly speaking, this trick is called run-length encoding. LZW exploits repetition in a more general way: it builds a dictionary of the character sequences it has already seen and replaces each repeat with a short code.
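To make the two examples concrete, here is a minimal LZW encoder sketch. It is a teaching simplification and not the code behind the analysis: real implementations start from all 256 byte values and pack the codes into bits, whereas this version starts from only the characters present in the text and returns a plain list of integer codes.

```python
def lzw_encode(text):
    """Encode text as a list of LZW dictionary codes."""
    # Start with one dictionary entry per distinct character in the text.
    dictionary = {char: code for code, char in enumerate(sorted(set(text)))}
    codes = []
    current = ""
    for char in text:
        candidate = current + char
        if candidate in dictionary:
            # Keep extending the match while the sequence is already known.
            current = candidate
        else:
            # Emit the longest known sequence and remember the new, longer one.
            codes.append(dictionary[current])
            dictionary[candidate] = len(dictionary)
            current = char
    if current:
        codes.append(dictionary[current])
    return codes


print(len(lzw_encode("abcdef")))  # 6 codes: nothing repeats, so nothing is saved
print(len(lzw_encode("aaaaaa")))  # 3 codes: "a", "aa" and "aaa"
```

The longer and more repetitive the text, the longer the sequences each code can stand for, so the saving grows with repetition.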

Can you see how this can act as a proxy for song repetitiveness? If a song has a lot of repeated lyrics, then the compressed file stored to disk will be much smaller than the original file, because the compression algorithm exploits the repetition.

I followed precisely this process and calculated the compression rate for each song. I’ve defined the compression rate as the percentage decrease in file size between the original, uncompressed file and the compressed lyrics file. Mathematically:

compression rate = (original size - compressed size) / original size × 100%
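The percentage decrease described here can be written as a small function. The byte counts in the usage line are invented to show the arithmetic; they are not the actual file sizes of any song.

```python
def compression_rate(original_bytes, compressed_bytes):
    """Percentage decrease in size from the original to the compressed file."""
    return 100 * (original_bytes - compressed_bytes) / original_bytes


# Invented sizes: a 1000-byte lyrics file that compresses to 184 bytes.
print(round(compression_rate(1000, 184), 1))  # 81.6
```

A rate of 0% means compression saved nothing, and the closer the rate gets to 100%, the more repetition the algorithm found.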

Plotting the compression rate for all the songs over time, we see an interesting pattern emerging. The compression rate for all Taylor’s songs over time does indeed increase. In other words, her songs have indeed become more repetitive over time.

For interest’s sake, the most compressible song, with a compression rate of 81.6%, was Out of the Woods from the 2014 album 1989. The lyrics are shown below, and you can see that the line “are we out of the woods yet” occurs a lot.

looking at it now it all seems so simple we were lying on your couch i remember you took a polaroid of us then discovered then discovered the rest of the world was black and white but we were in screaming color and i remember thinking… are we out of the woods yet are we out of the woods yet are we out of the woods yet are we out of the woods are we in the clear yet are we in the clear yet are we in the clear yet in the clear yet good are we out of the woods yet are we out of the woods yet are we out of the woods yet are we out of the woods are we in the clear yet are we in the clear yet are we in the clear yet in the clear yet good are we out of the woods looking at it now last december last december we were built to fall apart then fall back together back together your necklace hanging from my neck the night we could not quite forget when we decided we decided to move the furniture so we could dance baby like we stood a chance two paper airplanes flying flying flying and i remember thinking… are we out of the woods yet are we out of the woods yet are we out of the woods yet are we out of the woods are we in the clear yet are we in the clear yet are we in the clear yet in the clear yet good are we out of the woods yet are we out of the woods yet are we out of the woods yet are we out of the woods are we in the clear yet are we in the clear yet are we in the clear yet in the clear yet good are we out of the woods remember when you hit the brakes too soon twenty stitches in a hospital room when you started crying baby i did too when the sun came up i was looking at you remember when you could not take the heat i walked out and said i am setting you free but the monsters turned out to be just trees when the sun came up you were looking at me you were looking at me ooh you were looking at me are we out of the woods yet are we out of the woods yet are we out of the woods yet are we out of the woods i remember are we in the clear yet are we in the clear yet are we in the 
clear yet in the clear yet good oh i remember are we out of the woods yet are we out of the woods yet are we out of the woods yet are we out of the woods are we in the clear yet are we in the clear yet are we in the clear yet in the clear yet good are we out of the woods yet are we out of the woods yet are we out of the woods yet are we out of the woods are we in the clear yet are we in the clear yet are we in the clear yet in the clear yet good are we out of the woods yet are we out of the woods yet are we out of the woods yet are we out of the woods are we in the clear yet are we in the clear yet are we in the clear yet in the clear yet good are we out of the woods yet are we out of the woods yet are we out of the woods yet are we out of the woods are we in the clear yet are we in the clear yet are we in the clear yet in the clear yet good

The least compressible song, with a compression rate of 55.6%, was The Best Day from the 2008 album Fearless, which has these lyrics:

i am five years old it is getting cold i have got my big coat on i hear your laugh and look up smiling at you i run and run past the pumpkin patch and the tractor rides look now the sky is gold i hug your legs and fall asleep on the way home i do not know why all the trees change in the fall but i know you are not scared of anything at all do not know if snow whites house is near or far away but i know i had the best day with you today i am thirteen now and do not know how my friends could be so mean i come home crying and you hold me tight and grab the keys and we drive and drive until we found a town far enough away and we talk and window shop until i have forgotten all their names i do not know who i am going to talk to now at school but i know i am laughing on the car ride home with you do not know how long it is going to take to feel okay but i know i had the best day with you today i have an excellent father his strength is making me stronger god smiles on my little brother inside and out he is better than i am i grew up in a pretty house and i had space to run and i had the best days with you there is a video i found from back when i was three you set up a paint set in the kitchen and you are talking to me it is the age of princesses and pirate ships and the seven dwarfs and daddy is smart and you are the prettiest lady in the whole wide world and now i know why all the trees change in the fall i know you were on my side even when i was wrong and i love you for giving me your eyes for staying back and watching me shine and i did not know if you knew so i am taking this chance to say that i had the best day with you today

Conclusion

In this post, we compared new Taylor to old Taylor and answered three questions:

  • Have Taylor’s songs become shorter over time? No, they appear to have a greater number of total words, but roughly the same number of unique words.
  • Have Taylor’s songs become happier over time? No, they appear to have gotten a bit more gloomy recently.
  • Have Taylor’s songs become more repetitive over time? Yes, if we use our compression rate metric as a proxy for repetitiveness, then indeed her songs have become more repetitive.

What does this say about the quality of Taylor’s music and pop music in general? Nothing, really. Music taste is very subjective and depending on who you are, and where you are in life, you might fancy new Taylor more than old Taylor. Or, you might have disliked her through all of her albums over the years.

Either way, it does appear that compression algorithms have more uses in life than just compressing files to save hard drive space.
