mapping.canonical_musicbrainz_data is the entirety of the MusicBrainz data whittled down to only canonical recordings (the recordings that people think of as being "the recording" for a given track). Given that MB tracks multiple versions of everything, this step is critical for building a mapper.
Funkwhale doesn't do that; it only has metadata about its own tracks, so the Funkwhale metadata can be used directly in its place. And yes, that means a metadata index where we don't actually use MBIDs at all. No need for collections to be tagged. No need to modify files.
I'm not sure we should use Typesense. We could build an index with Django.
Que? Django is a web framework and Typesense is a text search engine. These are not interchangeable.
Typesense is the right tool for this job. Typo-tolerant, fast text search is exactly what is needed in this case. We can tune how many typos are allowed in a search (based on the size of the index: the larger the index, the longer the search times). Typesense is the right tool here and I will develop this project using it. Should FW have issues with installing Typesense, then you and the rest of the community can find some worse-performing index to install in its place, or reinvent the wheel as you see fit. Just keep me out of these politics.
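As a rough illustration of that tuning, here is a minimal sketch that builds the parameters for one lookup. `q`, `query_by`, `num_typos` and `per_page` are standard Typesense search parameters; the `combined` field name, the 100,000-document threshold and the helper itself are assumptions made up for this example, not anything from the actual project.

```python
def build_search_params(artist, title, index_size):
    """Build Typesense-style search parameters for one artist/title lookup."""
    # Assumed heuristic: tolerate two typos on a small index, but only one
    # on a large index, where typo-tolerant search takes longer.
    num_typos = 2 if index_size < 100_000 else 1
    return {
        "q": f"{artist} {title}",
        "query_by": "combined",  # hypothetical field holding artist + title
        "num_typos": num_typos,
        "per_page": 1,  # only the best match matters for resolving a track
    }


# With the typesense Python client, the result would be passed along as:
#   client.collections["recordings"].documents.search(params)
# where "recordings" is a hypothetical collection name.
params = build_search_params("Portishead", "Glory Box", index_size=5_000)
```

The threshold is only a placeholder; the right cut-off would have to be measured against a real index.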
So you want to import troi in Funkwhale, then have Funkwhale call troi.patch(local_patch), and the troi lib can handle local_patch at the end of the pipeline?
I looked at this in detail yesterday and I'll add a command-line flag, --post-process, that can specify another patch to run as a post-processing step. In this case that would be the content resolver patch that makes calls to Typesense. Using the --post-process flag allows the main troi patches to run unmodified and have the content resolver tacked onto the end of the pipeline, ensuring that only local tracks are recommended.
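To make the shape of that pipeline concrete, here is a minimal, self-contained sketch. It does not use troi's real API: `main_patch`, `make_content_resolver`, `run` and the `post_processors` registry are all hypothetical names, and an exact lookup on normalized artist + title stands in for the Typesense call. Only the `--post-process` flag name comes from the proposal above.

```python
import argparse
import unicodedata


def normalize(text):
    """Lowercase and drop accents, spaces and punctuation for matching."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed.lower() if ch.isalnum())


def main_patch():
    """Stand-in for an unmodified recommendation patch."""
    return [
        {"artist": "Portishead", "title": "Glory Box"},
        {"artist": "Björk", "title": "Jóga"},
        {"artist": "Massive Attack", "title": "Teardrop"},
    ]


def make_content_resolver(local_tracks):
    """Return a post-processing step that keeps only locally available tracks."""
    # Stand-in for the search index: exact match on normalized metadata.
    index = {normalize(t["artist"] + t["title"]): t for t in local_tracks}

    def resolve(recordings):
        resolved = []
        for rec in recordings:
            hit = index.get(normalize(rec["artist"] + rec["title"]))
            if hit is not None:
                resolved.append(hit)
        return resolved

    return resolve


def run(argv, post_processors):
    """Run the main patch, then the post-processing step named on the command line."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--post-process", default=None)
    args = parser.parse_args(argv)

    recordings = main_patch()  # runs unmodified
    if args.post_process is not None:
        recordings = post_processors[args.post_process](recordings)
    return recordings


local_tracks = [
    {"artist": "Portishead", "title": "Glory Box", "path": "/music/glory_box.flac"},
    {"artist": "bjork", "title": "Joga", "path": "/music/joga.flac"},
]
steps = {"resolve": make_content_resolver(local_tracks)}
playlist = run(["--post-process", "resolve"], steps)
```

Because the resolver is just another step appended after the main patch, the main patch needs no knowledge of the local collection; the track that is not available locally simply drops out of the result.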