Finding overlapping MIDI notes

I’ve been using Beeano Midi Player to visualise MIDI tracks on a piano roll for inclusion in piano tutorial videos. As the software doesn’t support the sustain pedal for indicating the duration of a held note, I’ve been extending a note in the MIDI track to occupy the entire held duration, and at the same time, using quantisation. Beeano is a bit temperamental processing overly quantised MIDI tracks, where two held notes of the same pitch that are snug against each other are considered overlapping. Beeano plays the first note, but cuts out for the second overlapping note in the visualisation.

MIDI doesn’t have the concept of a note duration; instead a held note is represented by a MIDI on event at a pitch and a MIDI off event at the same pitch to represent a held note. In addition to the MIDI off event, the MIDI spec also indicates that a zero velocity MIDI note on can also represent the end of a note.

A prevention method for these overlapping notes is to not overly quantise notes such that note ends finish at the same time as the next note (at the same pitch) starts. However, if that is required, instead of checking each note, a quick remediation method is to select all notes in the track using a sequencer, and decrease the length a few ticks.

I found this last step helped immensely with my tracks, but I was still finding the occasional overlapping note when running through Beeano. This could be due to the recording device, but I’m not sure.

To detect these overlapping notes, I wrote a quick Python script that would report on these overlaps. This helps narrow the search down trememdously.

Spatial Analysis with Python

I’ve recently had to undertake spatial data analysis and one of the more challenging tasks was to identify repeat vs unique vehicle journeys. While QGIS is fantastic for visualising routes through heatmaps and density plots, one of my requirements was to provide quantitative metrics around the proportion of repeat vs unique journeys.

To achieve this, I used several open-source Python tools:

  • Fiona for importing the journey data stored as Shapefiles, represented as LineStrings
  • Rtree for quickly identifying potentially relevant journeys through a geospatial index (to avoid calculating unnecessary and costly intersection operations)
  • Shapely for calculating the intersection/difference between each journey and historical journeys

It took me several variations before I discovered an optimal solution, dropping the run time from 1+ day to under an hour. The lessons learned included:

  • Preprocessing data to eliminate TopologyException: found non-noded intersection between LINESTRING
  • Using an R-Tree to quickly eliminate geometry that is not overlapping the bounds of the geometry of interest
  • Subtracting historical journeys (calculating the difference) from the current journey and determining the uniqueness, rather than finding the intersection of the current journey with the (cascaded) union of historical journeys to find “repeatness”
  • Adding a tiny buffer to dramatically increase difference calculating speed (the LineString/Polygon difference operation is faster than LineString/LineString operation – at least for complex LineStrings)

Below is some representative code that illustrates the above methodology:

import fiona
from shapely.geometry import mapping, shape
from rtree import index

INPUT = "journeys.shp"
BUFFER_SIZE = 1 # Should be appropriate for the coordinate reference system of the INPUT file

records = list(

# Sort by date to identify "historical" routes
records.sort(key=lambda record: record["properties"]["date"])

# Create indexes
rtree_idx = index.Index()
buffered_shape_cache = {}
for pos, record in enumerate(records):
    shp = shape(record["geometry"])
    rtree_idx.insert(pos, shp.bounds)
    buffered_shape_cache[record["properties"]["id"]] = shp.buffer(BUFFER_SIZE)

for pos1, record in enumerate(records):
    current_geom = shape(record["geometry"])
    current_geom_length = current_geom.length

    # Subtract all historical journeys
    for pos2 in rtree_idx.intersection(current_geom.bounds):
        # Only evaluate against historical journeys
        if pos2  pos1:
            historical_geom = buffered_shape_cache[records[pos2]["properties"]["id"]]
            current_geom = current_geom.difference(historical_geom)
        if current_geom.length == 0:

    remaining_length = current_geom.length
    print("id: {0}, length: {1}, remaining: {2}, proportion: {3}\n".format(record["properties"]["id"], current_geom_length, current_geom.length, current_geom.length / current_geom_length))

If the entire data set is too large to load into memory, it can be partitioned (e.g. geometrically halved or quartered) and the results safely combined (based on id) to give an accurate proportion.