Philosophy/Best Practice/Future of Connect Tag "inheritance"

mfoos · March 4, 2019, 1:23pm

Hi folks,

I just set up a quickstart to experiment with how tagging works so I can be well-informed when I sit down with stakeholders to discuss tags, and I've noticed that duplicate tags are allowed in different "branches" but that they don't/can't "multiply inherit". I can mostly appreciate this design choice, since if you want to have tags like "Car -> blue" and "Political leaning -> blue" you probably just want the tag to say blue, and not be shared (and it's obviously de-duped under the hood). But in my situation, if we have, for instance, a protein that might be informative in more than one disease, it would be nice to have "disease1 -> proteinA" and "disease2 -> proteinA" where proteinA tags the same apps.

I am not a tagging expert, and I will find a solution that works for us, but I was just wondering if a) this is informed by a best practice you could point me to, and b) whether this is something that might change in the future.

thanks!

cole · March 6, 2019, 7:16pm

Hey @mfoos ! Thanks for your post! We have circulated your post internally and certainly appreciate the use case! Is there a bit more clarity that you could provide by way of example?

The way I like to think about tag hierarchies is as sets of attributes. In your example, the "protein" is an attribute and the "disease" is an attribute. The hierarchy is for the attributes (i.e. diseases might be categorized, or proteins might be categorized, but proteins are not really categorized by disease), and not for the apps (i.e. not like a folder structure). But I am far from a tagging expert either.

For instance, the approach that I think that would work best today (given your situation) is two tag hierarchies:

Protein

Protein A
Protein B
Protein C

Disease

Disease 1
Disease 2
Disease 3

Then a given app could have, e.g. Disease 1 and Protein A, while a separate app has Disease 2 and Protein A. Filtering to Protein A would then grab both applications.
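To illustrate why filtering on Protein A picks up both apps, here is a minimal sketch of the two-hierarchy ("attributes") approach. It assumes a hypothetical in-memory representation where each app carries a set of (category, tag) pairs; the app names and `filter_apps` helper are invented for illustration and are not the Connect data model or API.

```python
# Hypothetical model: each app is tagged with (category, value) pairs drawn
# from two independent hierarchies, "Protein" and "Disease".
APPS = {
    "app_one": {("Disease", "Disease 1"), ("Protein", "Protein A")},
    "app_two": {("Disease", "Disease 2"), ("Protein", "Protein A")},
    "app_three": {("Disease", "Disease 1"), ("Protein", "Protein B")},
}


def filter_apps(apps, category, value):
    """Return the sorted names of apps that carry the given tag."""
    return sorted(name for name, tags in apps.items() if (category, value) in tags)


if __name__ == "__main__":
    print(filter_apps(APPS, "Protein", "Protein A"))  # ['app_one', 'app_two']
    print(filter_apps(APPS, "Disease", "Disease 1"))  # ['app_one', 'app_three']
```

Because the protein tag lives in only one place, there is nothing to keep in sync: the same "Protein A" tag is shared by every app that uses it, whatever disease it is paired with.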

What seems to be the missing piece is I suspect you are trying to filter which tag combinations are valid... is that right? I.e. maybe Protein B and Disease 1 makes no sense, so you were looking for the hierarchy to enforce that restriction?

Given the current state of affairs, I think the approach I define above would be the desirable one, and then the "enforcement of valid combinations" would be something done (1) manually, or (2) using the Connect Server API (once we have added functionality around tags... which unfortunately does not exist today). I.e. you could imagine a scheduled report that audits or enforces which tag combinations are in use.
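A rough sketch of what the audit half of such a report might check is below. Since there was no tag API at the time, the input is a hypothetical dictionary of app names to tags, and `VALID_PAIRS` is an invented allowlist; a real report would need to obtain both some other way.

```python
# Hypothetical allowlist of (disease, protein) pairs that make sense together.
VALID_PAIRS = {
    ("Disease 1", "Protein A"),
    ("Disease 2", "Protein A"),
    ("Disease 2", "Protein B"),
}

# Hypothetical export of app tags: app name -> {category: set of tag values}.
APP_TAGS = {
    "app_one": {"Disease": {"Disease 1"}, "Protein": {"Protein A"}},
    "app_two": {"Disease": {"Disease 2"}, "Protein": {"Protein A", "Protein B"}},
    "app_three": {"Disease": {"Disease 1"}, "Protein": {"Protein B"}},
}


def audit(app_tags, valid_pairs):
    """Return (app, disease, protein) triples that are not on the allowlist."""
    problems = []
    for app, tags in sorted(app_tags.items()):
        # Check every disease/protein pairing present on this app.
        for disease in sorted(tags.get("Disease", set())):
            for protein in sorted(tags.get("Protein", set())):
                if (disease, protein) not in valid_pairs:
                    problems.append((app, disease, protein))
    return problems


if __name__ == "__main__":
    for app, disease, protein in audit(APP_TAGS, VALID_PAIRS):
        print(f"{app}: {disease} + {protein} is not an allowed combination")
```

This only reports problems; "enforcing" would mean acting on the output, e.g. notifying the content owner.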

If you can provide a bit more clarity / a concrete example of how the configuration above would break down, I think it would be helpful in order for our developers to evaluate / scope such a feature!

mfoos · March 7, 2019, 6:27pm

Cool! As I was submitting the original post, I was like "oh this is probably one of those don't-mix-the-types things" which you have also captured here in the "attributes" mental model. So I do think that's the "workaround" I was/am headed toward.

However, I know that your customer going "yeah, okay, I get it" is not as satisfying as actually improving their situation/answering their question, so I will flesh out my example a little. First, this doesn't invalidate your point, but for clarity, to us, proteins ARE categorized by disease. Like, in some instances, a brain-related protein would be of interest for Alzheimer's AND Parkinson's, but an immune system protein that we are looking at for MS wouldn't be of interest for, say, stroke (don't @ me, neuroscientists!). So that protein would be "an MS protein" for our purposes. But anyway!

So in my mind, there's a little bit of pressure to not have a million categories, so in a situation like ours, which will have very diverse tags, my inclination is to sort of "nest" right away. Then I have a theoretical user in my brain coming to the tags and going "How will I find my content?" While it's true that some users will be like "gimme that protein!", many will be like "I want to know about my disease of interest!" When that menu opens, it doesn't make sense to stay strict to the attribute, because the interest is not going to be in, say, subtypes of Alzheimer's (some day, hopefully!); it's going to be in projects (potential drugs that need questions answered about them), targets (what we've been calling "proteins"), clinical trials, etc.

So no, the problem isn't really "validity" of tag combos, it's having overlapping hierarchies, and I suppose supporting different use cases - people who would "start" at the disease level vs people who would "start" at the protein level and making sure they see the relevant subtags. I can currently have "AD -> target 1" and "PD -> target 1" tags at the same time, but I do have to manually add both of them, and everything I know about databases tells me not to do that. And when I do it, the display is sorta weird looking.
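To make the "everything I know about databases" intuition concrete, here is a hedged sketch of a normalized alternative: record each target's diseases once (a many-to-many relation), tag apps only with targets, and derive both the "start at the disease" and "start at the protein" views. `TARGET_DISEASES`, `APP_TARGETS`, and the helper functions are hypothetical; this is not something Connect does.

```python
from collections import defaultdict

# Hypothetical many-to-many relation, stored once: target -> diseases it informs.
TARGET_DISEASES = {
    "target 1": {"AD", "PD"},
    "target 2": {"MS"},
}

# Apps are tagged only with targets; disease membership is derived, not duplicated.
APP_TARGETS = {
    "app_one": {"target 1"},
    "app_two": {"target 2"},
}


def apps_by_target(app_targets):
    """Index apps by target (the 'start at the protein' view)."""
    index = defaultdict(set)
    for app, targets in app_targets.items():
        for target in targets:
            index[target].add(app)
    return index


def apps_by_disease(app_targets, target_diseases):
    """Index apps by disease through their targets (the 'start at the disease' view)."""
    index = defaultdict(set)
    for app, targets in app_targets.items():
        for target in targets:
            for disease in target_diseases.get(target, set()):
                index[disease].add(app)
    return index


if __name__ == "__main__":
    by_disease = apps_by_disease(APP_TARGETS, TARGET_DISEASES)
    print(sorted(by_disease["AD"]), sorted(by_disease["PD"]))  # ['app_one'] ['app_one']
```

With this shape, deciding that target 1 is also relevant to another disease is a single edit to `TARGET_DISEASES` rather than re-tagging every app under a second "disease -> target 1" path.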

Anyway, just typing this out has brought me back to graph theory, so I think I need to develop my use cases more thoroughly before I can see what really conflicts, because right now it's just full of potential for conflict. Thanks for engaging!

aron · March 8, 2019, 6:24pm

Thanks for the additional examples, @mfoos. It helps remind us that the real world is far "messier" than any strict hierarchy we might want to create. Your disease:protein example shows how our current tag categories can fall short. I don't have a great suggestion other than using repetition, as you've outlined. "proteins:A" with "diseases:X:A" and "diseases:Y:A". The fact that "A" is the same protein would be by convention and not enforced by RStudio Connect.
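Since Connect would not enforce that convention, one option is a small consistency check over tag paths written in the colon notation above. This is a minimal sketch assuming the tags can be listed as strings of that form (the `undeclared_proteins` helper is hypothetical); it flags any leaf under "diseases" that does not match a protein declared under "proteins".

```python
def undeclared_proteins(tag_paths):
    """Return leaves nested under 'diseases' that are not declared under 'proteins'."""
    split = [path.split(":") for path in tag_paths]
    # Proteins declared at the top level, e.g. "proteins:A".
    declared = {p[1] for p in split if p[0] == "proteins" and len(p) == 2}
    # Proteins repeated under a disease, e.g. "diseases:X:A".
    nested = {p[-1] for p in split if p[0] == "diseases" and len(p) == 3}
    return sorted(nested - declared)


if __name__ == "__main__":
    tags = ["proteins:A", "diseases:X:A", "diseases:Y:A", "diseases:Y:B"]
    print(undeclared_proteins(tags))  # ['B']
```

A misspelled repeat such as "diseases:Y:a" would be flagged the same way, which is the main risk of relying on repetition.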

The breadcrumbs (Target 1) are not great when that tag name is repeated in the hierarchy. I've filed that as something for us to improve.

I am also taking back the discussion about more flexible ways of tagging/annotating content. Hopefully we can make this better.

mfoos · March 8, 2019, 6:41pm

This has been an excellent conversation, thank you both.

cole · March 9, 2019, 4:07am

I second Aron's thanks! I very much appreciate the practical example here. The user-flow / user thought process is something I completely disregarded in my earlier post as a focus for tag hierarchy. It is very important to find a way to organize tags in a way that syncs with user expectations. My first brainstorm idea is a kind of "dynamic" tag tree that responds to selections to show only valid combinations of tags and narrow the user's focus to the types of things they would be interested in as further selections (projects, targets, trials, etc. as you mention). I am definitely no UX designer either, though. This example will be very helpful. Thanks again!
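One way to read the "dynamic tag tree" idea is as faceted filtering: after each selection, offer only the tags that still co-occur on apps matching everything selected so far. The sketch below assumes hypothetical app data and a hypothetical `next_choices` helper; it is not an existing Connect feature.

```python
from collections import defaultdict

# Hypothetical app tags; each tag is a (category, value) pair.
APPS = {
    "app_one": {("Disease", "AD"), ("Target", "target 1"), ("Type", "project")},
    "app_two": {("Disease", "PD"), ("Target", "target 1"), ("Type", "trial")},
    "app_three": {("Disease", "MS"), ("Target", "target 2"), ("Type", "project")},
}


def next_choices(apps, selected):
    """Group the tags still worth offering by category, given the current selection."""
    selected = set(selected)
    options = defaultdict(set)
    for tags in apps.values():
        # Only apps carrying every selected tag contribute further choices.
        if selected <= tags:
            for category, value in tags - selected:
                options[category].add(value)
    return {category: sorted(values) for category, values in sorted(options.items())}


if __name__ == "__main__":
    print(next_choices(APPS, [("Target", "target 1")]))
    # {'Disease': ['AD', 'PD'], 'Type': ['project', 'trial']}
```

Starting from a disease or from a target both work here, because the narrowing is driven by which tags actually appear together rather than by a fixed hierarchy.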

system · March 30, 2019, 4:07am

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.