How Far Can a Score Become Music?
Between Generation and Judgment — What Makes Music (Part 5 Final Part)

Series
Part 1 | Part 2 | Part 3 | Part 4 | Part 5 (current article, final)

The Current Limits of Generative Systems

As we have seen, NotePerformer [1] provides an excellent system for reading a score and reconstructing it into stable musical output. Melisma [2], by contrast, goes a step further, attempting to enter the domain of how music itself comes into being.

[1]: NotePerformer
[2]: Melisma

However, this process also reveals the limits of generation.

The modeling of temporal flow and variation in monophonic music has already reached a high level.
However, the situation changes significantly when we move to music in which multiple voices interact simultaneously, such as choral writing or polyphonic instrumental textures.

Melisma’s choral textures do not sound like a simple layering of identical lines.
Rather, subtle differences in timing and pitch between the voices seem to create a unified yet slightly fluctuating whole.

This characteristic may be related to generative processes such as diffusion models, where exact uniformity is not assumed.
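As a rough illustration only (not Melisma’s actual method, whose internals are not described here), this kind of per-voice fluctuation can be caricatured as small, independent random offsets in timing and pitch applied to each voice. The function name and parameters below are invented for the sketch:

```python
import random

def humanize_voices(notes_by_voice, timing_sd=0.012, pitch_sd=4.0, seed=0):
    """Toy model of per-voice fluctuation.

    notes_by_voice: dict mapping a voice name to a list of
        (onset_seconds, pitch_cents) tuples.
    timing_sd: standard deviation of timing jitter, in seconds.
    pitch_sd: standard deviation of pitch jitter, in cents.

    Each voice receives its own small random offsets, so identical
    lines become a unified but slightly fluctuating whole rather
    than an exact doubling.
    """
    rng = random.Random(seed)
    result = {}
    for voice, notes in notes_by_voice.items():
        result[voice] = [
            (onset + rng.gauss(0, timing_sd),
             pitch + rng.gauss(0, pitch_sd))
            for onset, pitch in notes
        ]
    return result

# Four identical choral lines (pitch offsets in cents; 0.0 = written pitch).
line = [(0.0, 0.0), (0.5, 200.0), (1.0, 400.0)]
voices = {name: list(line) for name in ("S", "A", "T", "B")}
jittered = humanize_voices(voices)
```

After the call, the four voices are no longer note-for-note identical, yet each stays within a few cents and milliseconds of the written line, which is one simple way to model "unified yet slightly fluctuating".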

In such polyphonic contexts, the balance, independence, and constantly shifting relationships between individual voices make it difficult to establish musical coherence through simple rules or patterns alone.

Therefore, something is ultimately required to determine whether the result truly functions as music. At present, that role is still entrusted to human hearing—in other words, the ear.

Generative AI can create music based on past data, but its ability to judge whether that music is truly natural or convincing remains limited.

In other words:
a system may be able to "make" music, yet still struggle to "judge" whether it is good.

Returning to Sequencing

Seen in this light, the fact that I continue to work with manual sequencing may not be accidental, but rather a natural outcome.

Sequencing is not simply about placing notes. It inherently involves a continuous interplay between generation and judgment:

  • selecting sounds
  • balancing voices
  • shaping temporal nuance
  • correcting subtle inconsistencies

In this sense, sequencing is not just a method of creating music, but the very process through which music comes into being.

One might even say that humans, often unconsciously, rely on a faculty of evaluation that AI has not yet achieved.

Final Thoughts

Through this series of reflections, what becomes clear is that music is not merely data or structure.
It is something in which relationships evolve over time, and its realization inevitably involves an element of judgment.

Generative AI certainly points toward new possibilities.
At the same time, it also brings into sharper focus the role of the human in music.

And at least for now,
the place where music truly becomes music remains within the process of listening, thinking, and refining, a process carried out by human hands and ears.


A Japanese version of this article is available here.