Optimize our Categorization of Praise

divine_comedian · October 21, 2022, 4:17pm

Hey everyone!

I wanted to point out an awesome component of our praise analysis that @liviade proposed and @zhiwei made a reality: the categorization of praise!

In case you’re unfamiliar, the praise categorization looks something like this:

You can find the latest cross-period analysis here (check near the bottom for the praise categorization):

https://rawcdn.githack.com/CommonsBuild/tec-rewards/79c9740ce0c5e71fc2103e51beeea317e9e1b9a8/distribution_rounds/round-19/distribution_results/reports/round-19--cross-period-analysis-report.html

While this was implemented a while ago, I don’t think we ever took the time to check out how it works and how we can optimize it to give better results!

How categorization works

We define categories, each representing a “grouping” of related words, and we define which words belong to each category. The cross-period analysis then looks for those words in each instance of praise dished and files that praise under the associated category.
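For anyone curious what this kind of keyword matching looks like, here is a minimal Python sketch. It assumes each praise is a plain string and that a keyword counts as found when it appears at the start of a word (case-insensitive). The `CATEGORIES` dict is only a subset of our table, and `categorize` is a hypothetical helper, not the actual code in the tec-rewards analysis.

```python
import re

# Hypothetical subset of the keyword table listed in this post.
CATEGORIES = {
    "attendance": ["join", "attend", "show up", "participate"],
    "discussion": ["question", "ask", "discuss", "discussion", "conversation"],
    "twitter": ["twitter", "tweet"],
}


def categorize(praise: str, categories: dict = CATEGORIES) -> list:
    """Return every category with at least one keyword found in the praise."""
    text = praise.lower()
    matched = []
    for category, keywords in categories.items():
        for keyword in keywords:
            # \b anchors the keyword to the start of a word, so "attend"
            # also matches "attending", but "ask" does not match "task".
            if re.search(r"\b" + re.escape(keyword), text):
                matched.append(category)
                break  # one hit is enough for this category
    return matched


print(categorize("For attending the call and asking great questions"))
# ['attendance', 'discussion']
```

Because the match is prefix-based, one praise can land in several categories at once, and short keywords can over-match (for example, "form" would also match "format").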

From this process, we can see how often we dish praise under each category, as well as the average score quantified for each category.
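As a rough illustration of those two numbers, the sketch below counts praise instances per category and averages their scores. It assumes each praise has already been assigned a list of categories; `category_stats` is a hypothetical name and the scores are made up.

```python
from collections import defaultdict
from statistics import mean


def category_stats(items):
    """Map each category to (number of praise instances, average score).

    `items` is an iterable of (categories, score) pairs, one per praise
    instance; a praise matching two categories counts toward both.
    """
    scores = defaultdict(list)
    for categories, score in items:
        for category in categories:
            scores[category].append(score)
    return {category: (len(s), mean(s)) for category, s in scores.items()}


# Made-up scores, for illustration only.
praise = [
    (["attendance"], 5),
    (["attendance", "discussion"], 8),
    (["discussion"], 13),
    ([], 3),  # uncategorized: no keyword matched, so it is not counted here
]
print(category_stats(praise))  # {'attendance': (2, 6.5), 'discussion': (2, 10.5)}
```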

We can also see the 3 highest-scored praises for each category across the specified period. In the report above, we’re looking at the previous 52 weeks.
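A sketch of how a "top 3 per category over the last 52 weeks" view could be computed, assuming each praise record carries a date, its categories, a score and the text. The `Praise` record, `top_praise` helper and sample data are all hypothetical.

```python
import heapq
from datetime import date, timedelta
from typing import List, NamedTuple


class Praise(NamedTuple):
    day: date
    categories: List[str]
    score: float
    text: str


def top_praise(records, category, end, weeks=52, n=3):
    """Return the n highest-scored praise in `category` within the window."""
    start = end - timedelta(weeks=weeks)
    in_window = [
        r for r in records
        if start <= r.day <= end and category in r.categories
    ]
    # nlargest keeps only the top n by score, highest first.
    return heapq.nlargest(n, in_window, key=lambda r: r.score)


# Made-up records, for illustration only.
records = [
    Praise(date(2022, 10, 1), ["work"], 21, "a"),
    Praise(date(2022, 9, 1), ["work"], 34, "b"),
    Praise(date(2021, 1, 1), ["work"], 55, "c"),  # older than 52 weeks
    Praise(date(2022, 8, 1), ["work"], 8, "d"),
    Praise(date(2022, 7, 1), ["work"], 13, "e"),
    Praise(date(2022, 6, 1), ["lead"], 89, "f"),
]
top = top_praise(records, "work", end=date(2022, 10, 21))
print([p.text for p in top])  # ['b', 'a', 'e']
```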

Current Categories and Keywords

We currently have 9 categories. Here they are with their associated keywords:

attendance: join, attend, show up, participate

discussion: question, ask, discuss, discussion, conversation

work: help, work, design, make, write, hack, edit

lead: host, lead, initiate, form, organize, steward

share: share, spread

twitter: twitter, tweet

hack: hack, test

general: support, awesome

IRL: trip, conference
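To make iterating on this table easier, it could be kept as a plain Python dict and sanity-checked before each analysis run. This is only a sketch: the real analysis may store keywords differently, and `find_shared_keywords` is a hypothetical helper that flags any keyword listed under more than one category.

```python
from collections import defaultdict

# The current categories and keywords from this post.
CATEGORIES = {
    "attendance": ["join", "attend", "show up", "participate"],
    "discussion": ["question", "ask", "discuss", "discussion", "conversation"],
    "work": ["help", "work", "design", "make", "write", "hack", "edit"],
    "lead": ["host", "lead", "initiate", "form", "organize", "steward"],
    "share": ["share", "spread"],
    "twitter": ["twitter", "tweet"],
    "hack": ["hack", "test"],
    "general": ["support", "awesome"],
    "IRL": ["trip", "conference"],
}


def find_shared_keywords(categories):
    """Return {keyword: [categories...]} for keywords used by 2+ categories."""
    owners = defaultdict(list)
    for category, keywords in categories.items():
        for keyword in keywords:
            owners[keyword].append(category)
    return {k: cats for k, cats in owners.items() if len(cats) > 1}


print(find_shared_keywords(CATEGORIES))  # {'hack': ['work', 'hack']}
```

This only catches exact duplicates; overlapping prefixes such as "discuss" and "discussion" are not flagged.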

The purpose of this forum post is to review and refine our keywords and categories. Happy to receive any proposals and suggestions!

We can also iterate progressively: update the keywords, rerun the analysis, and review the results.

Optimizing our categories and keywords will eventually lead to a quantification guide that helps quantifiers use their best judgement by providing accurate historical scoring data.

Maxwe11 · October 21, 2022, 8:47pm

I see the keyword “hack” in two categories (work and hack). I think most will understand, but it might be good to explicitly call out that the categorization is just a helpful tip/guide: it might not apply, and it shouldn’t overrule your judgment as a quantifier.

For IRL, I’ve also seen praise around taking mental breaks, rejuvenating, etc. Does anything come to mind other than ‘break’, which could be ambiguous?

divine_comedian · October 24, 2022, 2:00pm

I think another category for “self-care” would be very useful. We could look for keywords such as “rejuvenate”, “mental”, “vacation” and “time off” (we have to check whether we can add phrases to the system).
I would avoid “break”, since apps or websites often break, which could make things confusing.
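Whether our system accepts phrases still needs checking, but for illustration, a regex-based matcher can handle multi-word keywords like “time off” by letting the space match any run of whitespace. The `SELF_CARE` list is just the proposal above (with “break” deliberately left out), and `phrase_pattern` and `matches` are hypothetical helpers.

```python
import re

# Proposed "self-care" keywords from this thread; "break" is left out on purpose.
SELF_CARE = ["rejuvenate", "mental", "vacation", "time off"]


def phrase_pattern(phrase):
    """Compile a case-insensitive pattern for a keyword or multi-word phrase."""
    words = [re.escape(w) for w in phrase.split()]
    # \s+ lets "time off" also match "time  off" or "time\noff".
    return re.compile(r"\b" + r"\s+".join(words), re.IGNORECASE)


PATTERNS = [phrase_pattern(p) for p in SELF_CARE]


def matches(praise):
    """True if any self-care keyword or phrase appears in the praise."""
    return any(p.search(praise) for p in PATTERNS)


print(matches("For taking some Time off to recharge"))  # True
print(matches("For fixing the build when it broke"))    # False
```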

Maybe it’s worth doing some research in the praise channel to see some historic wording.

I also think removing the hack category would be useful, since “hacking” has often become synonymous with working.

divine_comedian · November 16, 2022, 3:53pm

@Maxwe11 and I hacked through a list of updated keywords and I just ran it into the latest cross-period analysis - check out the results!

https://rawcdn.githack.com/CommonsBuild/tec-rewards/c501ba7d9bce43c738b51c0d7f65d6d0913d793b/distribution_rounds/round-21/distribution_results/reports/round-21--cross-period-analysis-report.html

Maxwe11 · November 16, 2022, 6:19pm

Thanks for posting! That distribution looks reasonable, and while there are outliers, they seem relatively few. We talked about reviewing the uncategorized praise to see if there were any trends we could identify. Do we have that data? I saw an aggregate number, which was higher than I expected: roughly 1/3 of all instances were uncategorized.
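A minimal sketch of pulling out the uncategorized praise (rather than just the bulk number), assuming the same prefix-style keyword matching. `KEYWORDS`, `is_categorized` and `uncategorized` are hypothetical, and the keyword list is a small subset of the real table.

```python
import re

# Hypothetical subset of keywords; the real table is longer.
KEYWORDS = ["join", "attend", "tweet", "host"]


def is_categorized(praise):
    """True if any keyword appears at the start of a word in the praise."""
    text = praise.lower()
    return any(re.search(r"\b" + re.escape(k), text) for k in KEYWORDS)


def uncategorized(praise_list):
    """Return the uncategorized praise and their share of all praise."""
    if not praise_list:
        return [], 0.0
    missed = [p for p in praise_list if not is_categorized(p)]
    return missed, len(missed) / len(praise_list)


praise_list = ["for hosting the call", "for the tweet", "for being a legend"]
missed, share = uncategorized(praise_list)
print(missed, round(share, 2))  # ['for being a legend'] 0.33
```

Keeping the list of missed praise, not only the ratio, is what makes it possible to look for trends in it.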

rex · November 22, 2022, 8:17pm

It might be easier to review the uncategorised praise with the frequency analysis @enti set up last month.
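I’m not sure how the linked tool computes its counts, but a basic word-frequency pass over the uncategorized praise could look like the sketch below. The stopword list, sample messages and `word_frequencies` helper are all made up for illustration.

```python
import re
from collections import Counter

# Small illustrative stopword list; a real one would be longer.
STOPWORDS = {"for", "the", "and", "a", "to", "of", "in", "on", "with", "your"}


def word_frequencies(messages, top=10):
    """Count non-stopword words across messages; return the `top` most common."""
    counts = Counter()
    for msg in messages:
        words = re.findall(r"[a-z']+", msg.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts.most_common(top)


uncategorized_praise = [
    "for the feedback on the proposal",
    "for great feedback and the proposal review",
    "for being a legend",
]
print(word_frequencies(uncategorized_praise, top=2))
```

Frequent words here are candidates for new keywords, but they still need a look in context before being added.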

e.g. Feedback

Give it a spin!

https://eenti.github.io/TEC-Discord-Analysis/

divine_comedian · November 23, 2022, 10:29pm

Are you able to use this tool to see the full message the word was contained in?

Seeing how often a word appears isn’t that useful on its own; what matters is whether it’s used in a context that fits reliably enough under one of the categories we define.
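Independent of what the tool supports, pulling up every full message that contains a candidate word is simple to sketch. `messages_containing` is a hypothetical helper and the messages are made up; it matches whole words only, case-insensitively.

```python
import re


def messages_containing(messages, word):
    """Return every full message that contains `word` as a whole word."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return [m for m in messages if pattern.search(m)]


messages = [
    "for the feedback on the proposal",
    "for being a legend",
    "for hosting the Feedback session",
]
for m in messages_containing(messages, "feedback"):
    print(m)
```

Reading the matched messages in full is what tells us whether a word like “feedback” reliably belongs under one category.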