Sporiff Good work, I like this spec.
Two notes about the section "parse_credit strategy":
"Add a space to both sides of the joinphrase for readability"
On MusicBrainz, join phrases already contain any whitespace they need. A join phrase is not always surrounded by whitespace: that holds for e.g. " feat. " and " & ", but already breaks for ", ", as in "Tommy J., Robin Devil & Sammy Burns". The ", " is also a problematic join phrase in itself, as it can show up as part of an artist name (e.g. if a performer is credited with last name first).
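To illustrate the ", " problem, here is a minimal sketch of splitting on a fixed list of join phrases. The phrase list and the naiveSplit function are made up for this example and are not the spec's actual parse_credit implementation; "Devil, Robin" is a hypothetical last-name-first credit.

```javascript
// Hypothetical list of known join phrases, for illustration only.
const JOIN_PHRASES = [" feat. ", " & ", ", "];

// Naive split: cut the credit string at every occurrence of a known join phrase.
function naiveSplit(credit) {
  let parts = [credit];
  for (const phrase of JOIN_PHRASES) {
    parts = parts.flatMap((part) => part.split(phrase));
  }
  return parts;
}

console.log(naiveSplit("Tommy J., Robin Devil & Sammy Burns"));
// [ 'Tommy J.', 'Robin Devil', 'Sammy Burns' ]

// "Devil, Robin" is a single artist credited last name first,
// but the ", " join phrase cuts the name in two.
console.log(naiveSplit("Devil, Robin & Sammy Burns"));
// [ 'Devil', 'Robin', 'Sammy Burns' ]
```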
Also, an alternative thought on the possible implementation: the currently documented approach tries to split the artist credit string based on a list of known join phrases.
In some cases a list of artists might be available. For example, Picard by default stores a list of all credited artists in the artists [1] tag. There is also the musicbrainz_artistid [2] tag, which holds multiple MBIDs if multiple artists are credited and could be used to build a list of artist names.
If this artist list is present, it could, for one, be used directly to associate the track with the different artists. It could also be used to detect the artists in the artist credit string and split it accordingly into a list of artist / join phrase pairs.
I have some working code to do this split at [3].
Taking the example from the spec, "Tommy J. feat. Robin Devil & Sammy Burns":
const artists = [
  { name: "Tommy J." },
  { name: "Robin Devil" },
  { name: "Sammy Burns" },
]
const credits = splitArtistCredits("Tommy J. feat. Robin Devil & Sammy Burns", artists)
// Result:
// [
//   { artist_credit_name: "Tommy J.", join_phrase: " feat. " },
//   { artist_credit_name: "Robin Devil", join_phrase: " & " },
//   { artist_credit_name: "Sammy Burns", join_phrase: "" },
// ]
The nice thing is that this will collect everything undetected into the join phrases. If you convert the credit list back into a credit string, it will always give back the original. E.g. if "Robin Devil" is missing from the artist name list, it would still detect the other two artists and include "Robin Devil" in the join phrase:
[
  { artist_credit_name: "Tommy J.", join_phrase: " feat. Robin Devil & " },
  { artist_credit_name: "Sammy Burns", join_phrase: "" },
]
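To make the idea concrete, here is a simplified sketch of such a split together with the reverse operation. This is not the code at [3]; the function names splitByArtistNames and joinCredits are made up for illustration. The sketch assumes the artist names are given in the order they appear in the credit string, and it does not handle one name being a substring of another.

```javascript
// Simplified sketch, NOT the implementation linked at [3].
// Assumption: `artists` is ordered the same way as the names appear in `credit`.
function splitByArtistNames(credit, artists) {
  const credits = [];
  let rest = credit;
  for (const { name } of artists) {
    const index = rest.indexOf(name);
    if (index === -1) continue; // not found: its text stays in a join phrase
    const before = rest.slice(0, index);
    if (credits.length > 0) {
      // Everything between two detected artists becomes the join phrase.
      credits[credits.length - 1].join_phrase = before;
    } else if (before !== "") {
      // Text before the first detected artist is kept so nothing is lost.
      credits.push({ artist_credit_name: "", join_phrase: before });
    }
    credits.push({ artist_credit_name: name, join_phrase: "" });
    rest = rest.slice(index + name.length);
  }
  if (credits.length === 0) {
    // No known artist detected: treat the whole string as a single credit.
    return [{ artist_credit_name: credit, join_phrase: "" }];
  }
  // Whatever follows the last detected artist is the trailing join phrase.
  credits[credits.length - 1].join_phrase = rest;
  return credits;
}

// Reverse operation: concatenating all pairs gives back the original string.
function joinCredits(credits) {
  return credits.map((c) => c.artist_credit_name + c.join_phrase).join("");
}

const original = "Tommy J. feat. Robin Devil & Sammy Burns";
// "Robin Devil" is deliberately missing from the artist list.
const partial = splitByArtistNames(original, [
  { name: "Tommy J." },
  { name: "Sammy Burns" },
]);
console.log(partial);
// [
//   { artist_credit_name: 'Tommy J.', join_phrase: ' feat. Robin Devil & ' },
//   { artist_credit_name: 'Sammy Burns', join_phrase: '' }
// ]
console.log(joinCredits(partial) === original); // true
```

Because undetected text is never dropped, only moved into a join phrase, the round trip holds regardless of how complete the artist list is.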
You can also give it name variations of the artists.
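One way name variations could be matched is to look for the earliest occurrence of any variant. The following is only a sketch of that idea, not a description of how [3] does it; findEarliestVariant and the variant list are hypothetical.

```javascript
// Hypothetical helper: find the earliest occurrence of any name variant in the text.
// On a tie, the longer variant wins, so "Tommy J." is preferred over "Tommy".
function findEarliestVariant(text, variants) {
  let best = null;
  for (const variant of variants) {
    const index = text.indexOf(variant);
    if (index === -1) continue;
    if (
      best === null ||
      index < best.index ||
      (index === best.index && variant.length > best.match.length)
    ) {
      best = { index, match: variant };
    }
  }
  return best; // null when no variant occurs in the text
}

console.log(
  findEarliestVariant("Tommy J. feat. Robin Devil", ["Tommy", "Tommy J.", "T.J."])
);
// { index: 0, match: 'Tommy J.' }
```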
Maybe the join phrase detection and artist name detection could be combined, or used according to what gives the better result.
[1] https://picard-docs.musicbrainz.org/en/appendices/tag_mapping.html#artists
[2] https://picard-docs.musicbrainz.org/en/appendices/tag_mapping.html#id17
[3] https://git.sr.ht/~phw/discourse-listenbrainz/tree/3ac7b66ae6ec03f04503390ee994b5deb94248aa/item/assets/javascripts/discourse/helpers/listenbrainz-split-artist-credits.js