Thoughts on Musical AI

The real reason I chose this topic is that I wanted to draw robots playing piano.

When I was in a composition class, I used a music notation program that could play back what I had notated. It was a pretty nifty program, but the music sounded awful. Always. I blame this program for stamping out my dream of being the next Hans Zimmer. However, occasionally the piece didn’t sound half bad when I took it to the piano. No Hans Zimmer of course, but maybe something that someone might listen to.

What was so different about my muddling performance that made it sound more ‘musical’ than the rendition by the program? What made the program sound inexpressive?

First of all, it is helpful to look at a definition of “expressivity” in music:

Expressivity in music refers to the production and perception of variation in musical parameters. Music is inexpressive when it is uniform and mechanical, whereas music described as expressive communicates through dynamic fluctuations of acoustic and visual information. (Music in the Social and Behavioral Sciences: An Encyclopedia, 2014).

Perhaps the program sounded inexpressive because it played back the notes in a ‘uniform and mechanical’ manner. But could a computer program ever perform music expressively?

Regardless of whether it’s possible for a program to perform expressively, what would be the point of such a program? Perhaps it would finally eliminate the need for musicians. No more endless practicing or endless guilt over not practicing. No need to pay for rehearsal time, concert halls, conductors, or big time soloists. And finally we could be rid of those pesky instruments, forever going out of tune, breaking strings, losing reeds, missing keys.

[Image: Calvin and Hobbes comic strip featuring the 1812 Overture]
An automatic music rendering program would also help ensure the safety of the audience.

Needless to say, I doubt that such a program would stop people from wanting to make music. After all, we have more music at our fingertips today than at any time in human history, yet people continue to labour at learning an instrument, pay to attend live concerts, and enjoy out-of-tune singing around a campfire.

On a more interesting level, a computer program that could fool people into thinking it was an expressive human performer might teach us something about what constitutes musical expression. But it is important to consider whether we can make the claim that such a computer program could be seen as a cognitive model of human music performance.

If the point of a music rendering program is to render music that sounds convincingly human, then we could say that the program is successful when it can fool us into thinking its performance was done by a human. If, however, the point of creating such a program is to create a cognitive model of human performance, the standard for success is much more complicated. In fact, some would say that success is not possible. This is what I’m going to focus on in this post, that is, the question: can a music performance program that sounds convincingly human be said to model human performance experiences?

I’m taking advantage of the fact that this blog post assignment is open-ended and ungraded to take a break from scientific writing and dabble in some philosophical musings. I do not pretend that my two undergraduate philosophy courses have given me the ability to make an informed philosophical argument; I like to think of this as more of a sounding board for ideas.

In what is perhaps an alarmingly subjective way to look at this topic, I will begin by sharing with you one of my own performance experiences.

A few years ago, I decided to get a performance diploma in addition to my music degree. It was the only way I could have a graduating recital and I had some peculiar idea that having a graduating recital was the best way to finish off a music degree. This recital had to be about forty minutes long, and all the music had to be played from memory.

To be honest, I can’t remember what I was thinking during most of the recital. Maybe I was ‘lost in the music’. But I do remember a few moments vividly – the moments when I made a big mistake and had to decide what to do about it. One particularly poignant moment was when I messed up my favorite part of La soirée dans Grenade by Claude Debussy. Here is what I was thinking, as well as I can remember:

Shit. Where does my left hand go? Shit shit shit. No, don’t think about it. Let your hands remember. I never mess up this part. Maybe I’ll just play it again. But my teacher and the professor are looking at the score. Are they looking at the score? My teacher will know every mistake, with that perfect pitch of his. But this professor isn’t very strict. I think she’d rather I play it again expressively than be accurate. Thank goodness it isn’t that other professor judging me. And my teacher said I’m an artist tonight, not a student, maybe he won’t mind. Most people here have no idea what the score says. Am I playing for them, or for a grade? But maybe I should just keep going. I tried to go back to fix a mistake in the Beethoven, and I made the mistake again the second time. But I’ve wanted to play this piece for years. What was the point if I mess up the best part?

But where do I go back? I can picture where I am on the page of the score. What key am I in? I’m playing a lot of black keys. But are they sharps or flats? Why can I never just memorize what key a piece is in? One of my old piano teachers always made me memorize key signatures. I cried so many times in lessons with her. I hope I’m a better teacher. Gah, some of my students are here. I always tell them to keep going if they make a mistake.

I’ll go back. Should I do a cadence? The cadence didn’t work so well in the Beethoven. It’s probably more Debussy-like to just fade in and out of sections anyway. Shit, how does the section begin? Right, right. Don’t think. Stop thinking. Here we go.

Obviously I don’t remember the exact words of what I was thinking (and was I thinking in words, anyway?). But I do remember considering all these things while trying to decide to play the section again, all the while still playing.

Music rendering programs today might be able to output a performance that sounds better than mine did, but programs today aren’t complicated enough to model everything that contributed to my performance decisions. That being said, one might point out that this doesn’t necessarily mean that such a complicated model couldn’t be developed in the future.

What would a program that realistically models performance expression look like? It might be helpful to think about a performance in this way:

“W expresses X by doing Y in context Z”, where
          W is the individual performer
          X is the object of expression
          Y is how the expression is accomplished
          Z is the context

I hope that the description above of my thought process during performance shows that each of these general variables is involved in a performance and that there are many considerations within each variable. Some would say that a purely cognitivist or computational perspective falls short in modelling the experience of performing music, and would argue for a perspective that acknowledges the situatedness and embodiment of cognition. Indeed, it seems as though the embodied music cognition framework is gaining recognition in the field of music psychology.
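Purely as a toy sketch of my own (not something from any real music rendering system), the “W expresses X by doing Y in context Z” schema could be encoded as a simple data structure. All the names and example values here are hypothetical illustrations:

```python
from dataclasses import dataclass, field

@dataclass
class Performance:
    """A toy encoding of 'W expresses X by doing Y in context Z'."""
    performer: str                                  # W: the individual performer
    expressed: str                                  # X: the object of expression
    means: list = field(default_factory=list)       # Y: how the expression is accomplished
    context: dict = field(default_factory=dict)     # Z: the context

    def describe(self) -> str:
        # Render the schema back as an English sentence
        return (f"{self.performer} expresses {self.expressed} "
                f"by {', '.join(self.means)} "
                f"in context {self.context}")

# Hypothetical example values for one moment of a recital
recital = Performance(
    performer="the pianist",
    expressed="longing",
    means=["rubato", "soft dynamics"],
    context={"venue": "graduating recital", "audience": "examiners and students"},
)
print(recital.describe())
```

Of course, the point of my recital story is precisely that each of these four slots hides enormous complexity – the ‘context’ alone included my teacher’s perfect pitch, my students in the audience, and memories of old lessons – which is why a flat structure like this falls so far short of the lived experience.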

As far as general cognition goes, there are highly intelligent and informed people with opposing perspectives on whether we can computationally model human experience. Most people might just say it’s not worth losing sleep over (I did actually lose some sleep over this – I was writing an essay on this topic and procrastinated too much to get it done before bedtime). But it is an interesting question, and perhaps an important topic to consider when doing research on human experience.



  1. Nice entry Kendra! I doubt an AI will ever be able to mimic the processing that takes place when one performs piano—this seems particularly true for your case : )

This made me wonder whether anyone was developing a Music Turing test, and not surprisingly I found some interesting projects for music and art.

    If the embodied approach to cognition is true, the role of the body within human thought processes is probably one of the biggest barriers to achieving AI. Here’s another post I read recently:


Thanks for the comments, Jo! I actually did some research into what academics think of a ‘musical Turing Test’, and I found one article that was particularly insightful. Here’s the citation, if you’re interested:

Ariza, C. (2009). The interrogator as critic: The Turing test and the evaluation of generative music systems. Computer Music Journal, 33(2), 48-70.

      I decided not to go into detail about the Turing Test in my post in order to keep things simple, but it is quite fascinating! It is also quite interesting to consider a ‘musical Chinese Room’. The Chinese Room argument (Searle, 1980) is one of the most well-known arguments against the validity of the Turing Test. With this argument, John Searle sought to refute what he refers to as “strong artificial intelligence”, that is, the idea that a computer that is programmed properly and has the right inputs and outputs could be said to have a mind just as we say that humans have minds.

He describes a thought experiment that resembles a Turing Test, except that the unseen machine is replaced with an unseen human who possesses a book of rules that outlines how to answer any question or statement in the Chinese language. The interrogator sends questions or statements to the unseen human, and the unseen human takes this input and uses the rule book to formulate an appropriate output to send back to the interrogator. The interrogator can be convinced that the unseen human understands Chinese because the unseen human answers like a native Chinese speaker; however, the unseen human does not in fact have any knowledge of Chinese – they only know how to use the book of rules. The point of this thought experiment is to show that a computer can never have any notion of semantics, that is, the meaning of words, and so it cannot be said to have a mind in the sense that humans have minds.

Like I said, it is interesting to try to picture a ‘musical Chinese Room’. In this situation, perhaps a human would insert a musical score into a room, they would hear a performance of the music coming from the room, and they would consider whether the performer is a machine or a human. In the room would be a human with a book of rules as to how to render music expressively from a score, including how to use the musical instrument. This book could be extremely detailed, and include rules about every element involved in musical expression, as in the expression “W expresses X by doing Y in context Z” that I described in my blog post. One must imagine that the human is remarkably fast and adept at using the rules.

      If the human outside the room is convinced that the performer is human, then can we say that the machine is being musically expressive? Searle would argue that the human in the room is not being expressive in the way that a human is expressive because, despite being remarkably fast and adept at using the rules in the book, the performer does not understand musical semantics – musical meaning. He would say that the performer cannot form a ‘conception of the music’, a performance intention. Perhaps one must actually be able to think to have such a conception of music and a performance intention in the way that a human performer does.

On the other hand, one might come back to the age-old philosophical idea of the ‘problem of other minds’ (e.g., Waterman, 1995). This is one of the main supporting ideas behind the Turing Test; it proposes that we cannot know the contents of other humans’ minds, let alone a computer’s, because we cannot directly experience their minds. From this view, we cannot say that the mind of a human and the ‘mind’ of a computer function in essentially different manners if the external behaviours are indistinguishable.

      It’s easy to go around in circles when thinking about all this, I know I do!

Searle, J. R. (1980). Minds, brains, and programs. Behavioral and Brain Sciences, 3(3), 417-424.

      Waterman, C. M. (1995). The Turing Test and the Argument from Analogy for Other Minds. Southwest Philosophy Review, 11(1), 15-22.


  2. Interesting topic, Kendra. I recently read something about how music recordings can capture the bodily experience of the performer, which made me think back to your post. Eric Clarke writes that the musician’s bodily interaction with the instrument is captured in the sound of recordings and contributes to the perception of emotion. I’m not entirely sure what he means by this, but I think he’s suggesting that there is some kind of intangible element of bodily communication through music, even when the body can’t be seen. In another paper, he writes about performers like Glenn Gould who make the decision to keep their bodily sounds in recordings, presumably to add some expressive element, and about how classical recordings tend to aim for a kind of disembodied aesthetic, and how close microphone techniques emphasise the “audibility of the body”. I’d agree that the aspect of embodiment is an important one for this topic, and I thought it was interesting to consider that it may also be important to the actual perception of emotion in music. If anyone else knows something about this, I’d be interested to hear it!


    1. This idea is interesting. I’ve been thinking about embodied cognition and musical ability in AI and it seems to me that you could program in “a body” that impacts the cognition and output of some complex network. In other words, there isn’t a need for an actual body as we think we have them, but rather just a need for a “body filter” that acts to influence cognition in an “embodied” way. But I hadn’t considered the performance aspect that you’re mentioning, Emma, and how a visually-bodiless entity would not be able to achieve that same connection with a listener. It seems like a definite limitation. Something like VR projections of an entity though might be able to facilitate connection to the listener and thereby circumvent this problem. I guess we’ll find out soon…


  3. Great read, Kendra! I think this is an interesting topic, and have spent many hours of heated discussion talking about whether AI could replace musicians or whether there is something special that a robot could never do (though I am also of the mind that recreating the human musician and emotional expressivity is impossible, partly because it took billions of years to get to human life, and I don’t know if humans are capable of recreating something similar in the blink of time that we’ve been here). That being said, even if we were able to create a programme that could play music so similar to a human that it could fool a human listener, I still believe that people would want to go to concerts and learn instruments. As emmaallingham said, there is a physical aspect to how we express the music. We use our bodies to express the music.

    On a different note, I read a paper that talked about the creation of interactive art/media projects, one of which was a “neurobaby”. The neurobaby moves and coordinates its actions and sounds to your voice. The author, Gill (2009), said that she found the image of the baby so real that she became distressed and anxious when the neurobaby started to cry. Although I think that musical expression is a little more difficult than mimicking the crying of a baby, it may be that in order to reach the level of expressivity found in humans, we would need to create a robot that looks very humanesque–such as Data from Star Trek: The Next Generation.


  4. Great read, Kendra! This reminded me of an article I read in the National Post a few weeks back. A computer program in Spain wrote a piece in eight minutes, which was performed by the LSO:

    This piece got poor reviews from audiences but great reviews from critics. That isn’t saying much though. Many pieces throughout history had poor reviews when they were first performed, and then became popular later on.

    AI and the arts has been quite a popular topic of conversation, at least for me, lately. We seem to think that the arts are not at risk of being taken over by AI, but that doesn’t seem to be the case. When it comes to embodiment, a computer program can’t feel the emotion of a piece it writes, nor does it take inspiration from its past experiences, like human composers might. Maybe the programmers have coded a way to make it aware of what is physically playable on certain instruments, but knowing the limits of the body isn’t the same. Granted, most composers, at least those that compose for orchestra, aren’t capable of playing every instrument in the orchestra, but their real life experience with music and at least some kind of instrument is enough. I think about the horrible performance anxiety I’ve been known to get, but I’d rather have a little bit of anxiety than play like a robot. Now, I say “like a robot”, but robots are getting to be more and more humanlike in their ways, as the article I posted clearly demonstrates. So maybe playing like a robot isn’t going to be such a criticism in the coming decades.

