Building a heat map for docs

Setup

Heat maps are a very visual way of representing how activity gets distributed over a large space. They help narrow down the points of interest and make better assumptions about what is useful, attractive, or curious, and what is irrelevant, useless, or boring over time.

Heat maps are in constant movement: they change as people are drawn to different points of interest, reveal the intensity of activity at a given point, and let that intensity slowly decay to show when people stop returning to it.

Example heat map from Strava

Applying this concept to Slite lets us better surface where people go in the knowledge base, what they care about, and which docs are relevant at a specific point in time. Information is often time-sensitive. Its relevance and accuracy decay over time as people, projects and companies evolve, and Slite should reflect this. The interest of a Doc is not absolute and decays over time, so we should know at a T time (🍵🇬🇧) which Docs are relevant and which are not.

Also, combining this algorithm with our Knowledge Management panel helps maintain the knowledge base more efficiently by proposing removals, updates and more for Docs that have low interest.

Implementation

How to measure interest over time?

The interest of a Doc can be measured by its "heat". Heat is generated by activity on the Doc and decays over time. So first we want to find a good decay function that will allow us to calculate how heat evolves over time.

The formula needs to meet the following criteria:

  • Intense punctual activity should decay fast - If you publish a company update, the doc will get read by the entire company on the day it's shared, generating a lot of heat, but that doesn't mean people will want to read it again in the next 3 months or more.
  • Sporadic but recurring activity should keep docs alive - We don't want projects that span a long period of time but are still running to be considered irrelevant.

So it should decay fast for large heat values and slowly for low ones. Exponential functions are good candidates for this kind of purpose; they are used, for example, to describe radioactive decay. So we came up with the following formula:

It allows us, given any previous heat value, to calculate its decayed value after a certain period of time. If we run this formula over an example Doc that had activity in the past but is now stale, it would look like this:

The doc is considered inactive 90 days after creation date as activity stopped
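
To make the two criteria concrete, here is a minimal sketch that assumes a plain exponential decay of the form heat × e^(−DecayRate × days). The exact formula and the DecayRate value used in Slite are not reproduced here, so the function name `decay` and the value `DECAY_RATE = 0.1` are assumptions picked for illustration only.

```python
import math

DECAY_RATE = 0.1  # assumed value, per day; the article only says it is arbitrary


def decay(heat: float, days: float, rate: float = DECAY_RATE) -> float:
    """Return the heat left after `days` without activity (assumed exponential form)."""
    return heat * math.exp(-rate * days)


# Over the same day, a hot doc loses far more heat in absolute terms than a
# cool one, which is the "fast for large values, slow for low ones" behaviour.
hot_loss = 1000.0 - decay(1000.0, 1)
cool_loss = 10.0 - decay(10.0, 1)
```

A convenient property of this form is that decaying in several steps gives the same result as decaying once over the total duration, so the heat does not need to be recomputed on a fixed schedule.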

There's one last issue though: docs with a very high activity count will have an extremely high heat value compared to the rest of the knowledge base, sometimes taking far too long to decay without activity. So we want to cap the heat at a certain value to ensure a consistent number of days after the last activity before a doc is considered inactive:

With the cap, the Doc becomes inactive in about 70 days instead of the previous 90

DecayRate is an arbitrary value, and MaxHeat is calculated by reversing the decay function so that, given the DecayRate, a Doc goes stale in about 70 days.
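
As an illustration of "reversing the decay function", the sketch below assumes an exponential decay and a fixed heat level under which a doc counts as inactive. `DECAY_RATE`, `INACTIVE_THRESHOLD` and the helper names are hypothetical; only the 70-day target comes from the text above.

```python
import math

DECAY_RATE = 0.1          # assumed value, per day
INACTIVE_THRESHOLD = 1.0  # hypothetical: below this heat a doc counts as inactive
DAYS_TO_STALE = 70        # target taken from the article

# Solve MAX_HEAT * exp(-DECAY_RATE * DAYS_TO_STALE) = INACTIVE_THRESHOLD for MAX_HEAT.
MAX_HEAT = INACTIVE_THRESHOLD * math.exp(DECAY_RATE * DAYS_TO_STALE)


def add_heat(current: float, added: float) -> float:
    """Add heat from an activity, never going above MAX_HEAT."""
    return min(current + added, MAX_HEAT)


def days_until_inactive(heat: float) -> float:
    """Days without activity before `heat` drops to the inactivity threshold."""
    if heat <= INACTIVE_THRESHOLD:
        return 0.0
    return math.log(heat / INACTIVE_THRESHOLD) / DECAY_RATE
```

With the cap in place, no amount of activity can push a doc further than `DAYS_TO_STALE` days away from being flagged as inactive.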

Storing the heat

Next up is to find a data model that allows us to query the current heat of Docs without having to run updates to the entire database every day.

To achieve this, we will use the following columns in our Docs table:

ALTER TABLE "docs"
  ADD COLUMN "heat" INTEGER DEFAULT :max_heat,
  ADD COLUMN "lastHeatTickAt" DATETIME DEFAULT NOW();

The first one will store the last computed heat of a document and has a default value equal to the MaxHeat so that by default a doc will take 70 days without any activity to become stale.

The second column lets us know the last time the heat of the Doc was computed. This computation happens when activity is generated on a Doc:

  • First we apply the decay using the decay function, counting the days since the last heat update
  • Then we add the heat corresponding to the activity to get the final value

This also lets us query the current heat of a doc quite easily without having to constantly update it: to query, we just have to apply the decay function to the stored value.
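
The two-step update described above could look roughly like this. It is a sketch under assumptions: the decay is taken to be exponential, `DECAY_RATE` is an arbitrary example value, and `DocHeat` and `apply_activity` are hypothetical names standing in for the `heat` and `lastHeatTickAt` columns.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta

DECAY_RATE = 0.1                      # assumed value, per day
MAX_HEAT = math.exp(DECAY_RATE * 70)  # assumed cap, about 70 days of liveliness
SECONDS_PER_DAY = 86400


@dataclass
class DocHeat:
    heat: float                  # mirrors the "heat" column
    last_heat_tick_at: datetime  # mirrors the "lastHeatTickAt" column


def apply_activity(doc: DocHeat, activity_heat: float, now: datetime) -> DocHeat:
    """Decay the stored heat up to `now`, then add the activity's heat (capped)."""
    days = (now - doc.last_heat_tick_at).total_seconds() / SECONDS_PER_DAY
    decayed = doc.heat * math.exp(-DECAY_RATE * max(days, 0.0))
    return DocHeat(heat=min(decayed + activity_heat, MAX_HEAT), last_heat_tick_at=now)


# Usage: a doc last touched 10 days ago receives an activity worth 5 heat.
created = datetime(2024, 1, 1)
doc = DocHeat(heat=100.0, last_heat_tick_at=created)
updated = apply_activity(doc, 5.0, created + timedelta(days=10))
```

Only the row that receives activity is written, which is what keeps the rest of the table untouched between activities.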

Generating heat

Now that we have a way to store the heat, we need a way to generate it. This can be achieved by tracking certain activities:

  • Access to a doc
  • Updates to its content
  • Adding children
  • Commenting
  • Adding references to this doc in other docs
  • Etc

Each activity type has its own weight so that a Doc's decay gets reset in certain situations and not in others. Editing, for example, puts it back at max heat, whereas commenting only adds back half of the MaxHeat (which corresponds to 35 days' worth of liveliness).
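
One simple way to express such weights is a lookup table of fractions of MaxHeat. In the sketch below, only the edit and comment weights come from the text; the other values, the activity names and `MAX_HEAT` itself are hypothetical placeholders.

```python
import math

MAX_HEAT = math.exp(0.1 * 70)  # assumed cap, see the decay assumptions above

# Fraction of MAX_HEAT added by each activity type.
ACTIVITY_WEIGHTS = {
    "edit": 1.0,        # from the article: editing puts the doc back at max heat
    "comment": 0.5,     # from the article: commenting adds half of MaxHeat
    "access": 0.1,      # hypothetical value
    "add_child": 0.5,   # hypothetical value
    "reference": 0.25,  # hypothetical value
}


def heat_for(activity_type: str) -> float:
    """Heat generated by one activity of the given type."""
    return ACTIVITY_WEIGHTS[activity_type] * MAX_HEAT
```

Keeping the weights in one table makes them easy to tune when re-running the heat computation with different values.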

Heat also needs to flow to parent and neighboring docs, as topics tend to get grouped in a hierarchy. So heat generated on a doc also spreads to its parents, linked docs and parents of linked docs. Linked docs only get a percentage of the heat, as we want to keep references to active Docs alive but not necessarily as strongly as the active Doc itself.

The reason we make 100% of the heat flow from children to parents is to optimise for our hierarchical view, the Docs sidebar. We never want parents to go stale before their children, as it would lead users to think a Doc/Channel is stale when in fact it contains activity.
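
The propagation rules can be sketched as a small function that returns how much heat each doc receives. The 50% share for linked docs, the choice to give parents of linked docs that same share, and the `parents` / `links` dictionaries are all assumptions made for the example.

```python
from collections import defaultdict

LINKED_SHARE = 0.5  # hypothetical: linked docs receive 50% of the heat


def spread_heat(doc_id, heat, parents, links):
    """Return {doc_id: heat received} for one activity on `doc_id`.

    `parents` maps a doc to its parent (or is missing for root docs);
    `links` maps a doc to the docs it references.
    """
    received = defaultdict(float)

    def give(target, amount):
        # A doc reachable through several paths keeps the largest amount.
        received[target] = max(received[target], amount)

    def ancestors(start):
        chain, current = [], parents.get(start)
        while current is not None and current not in chain:
            chain.append(current)
            current = parents.get(current)
        return chain

    give(doc_id, heat)
    for ancestor in ancestors(doc_id):
        give(ancestor, heat)  # 100% flows up to parents
    for linked in links.get(doc_id, []):
        give(linked, heat * LINKED_SHARE)
        for ancestor in ancestors(linked):
            give(ancestor, heat * LINKED_SHARE)
    return dict(received)


# Usage: "spec" lives under project > channel and links to "faq" under "help".
parents = {"spec": "project", "project": "channel", "faq": "help"}
links = {"spec": ["faq"]}
result = spread_heat("spec", 100.0, parents, links)
```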

Querying heat

To fetch the heat of a doc, we just have to look at lastHeatTickAt and apply the decay rate to the time difference. This gives us how much heat was lost since the last update. Then we subtract this from the stored heat and we have the final heat score of the doc.
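
Read-side, this boils down to a pure function of the two stored columns and the current time. The sketch assumes an exponential decay and a hypothetical inactivity threshold; `current_heat` and `is_inactive` are illustrative names, not Slite's API.

```python
import math
from datetime import datetime, timedelta

DECAY_RATE = 0.1          # assumed value, per day
INACTIVE_THRESHOLD = 1.0  # hypothetical: below this heat a doc counts as inactive
SECONDS_PER_DAY = 86400


def current_heat(stored_heat: float, last_heat_tick_at: datetime, now: datetime) -> float:
    """Stored heat minus the heat lost since the last tick."""
    days = max((now - last_heat_tick_at).total_seconds() / SECONDS_PER_DAY, 0.0)
    lost = stored_heat * (1.0 - math.exp(-DECAY_RATE * days))
    return stored_heat - lost


def is_inactive(stored_heat: float, last_heat_tick_at: datetime, now: datetime) -> bool:
    return current_heat(stored_heat, last_heat_tick_at, now) < INACTIVE_THRESHOLD


# Usage: a doc stored at the assumed max heat, checked before and after day 70.
tick = datetime(2024, 1, 1)
max_heat = math.exp(DECAY_RATE * 70)
still_active = not is_inactive(max_heat, tick, tick + timedelta(days=69))
gone_stale = is_inactive(max_heat, tick, tick + timedelta(days=71))
```

Because nothing is written during a read, the same computation can run for a single doc or for a whole sidebar without touching the database.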

Backfilling the heat system

To make sure that we can run experiments, learn quickly and re-compute the heat system using new weights or new DecayRate / MaxHeat values, we want all activity to be tracked first. Then, to avoid recomputing how each activity affects the tree of Docs every time we recompute the entire heat map, we use an intermediate table that contains, for a given activity, all the heat update events to the hierarchy.

Example:

  • A comment activity on a doc with two parent docs will generate 3 heat events: one for the doc itself and one for each parent

This gives us two steps to compute the whole heat map:

  • Extract all heat events from all activities
  • Apply heat events in chronological order for each doc
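
The two steps above can be sketched as follows. This is a simplified illustration: time is expressed in days as plain numbers, docs start the replay with no heat, the decay is assumed exponential, and `extract_heat_events` and `replay` are hypothetical names (linked docs are left out to keep it short).

```python
import math

DECAY_RATE = 0.1                      # assumed value, per day
MAX_HEAT = math.exp(DECAY_RATE * 70)  # assumed cap


def extract_heat_events(activities, parents):
    """Step 1: one (doc_id, day, heat) event for the doc and for each parent above it."""
    events = []
    for doc_id, day, heat in activities:
        target = doc_id
        while target is not None:
            events.append((target, day, heat))
            target = parents.get(target)
    return events


def replay(events):
    """Step 2: apply events in chronological order; return {doc_id: (heat, last_tick_day)}."""
    state = {}
    for doc_id, day, heat in sorted(events, key=lambda event: event[1]):
        prev_heat, prev_day = state.get(doc_id, (0.0, day))
        decayed = prev_heat * math.exp(-DECAY_RATE * (day - prev_day))
        state[doc_id] = (min(decayed + heat, MAX_HEAT), day)
    return state


# Usage: one comment on a doc with two parent docs above it yields 3 events.
parents = {"doc": "parent", "parent": "grandparent"}
events = extract_heat_events([("doc", 0.0, 100.0)], parents)
```

Since the events are stored once, changing the weights or the decay parameters only requires re-running the replay step.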

Using the feature live

You can see the results of using the heat map in Slite by turning on the toggle "Highlight inactive docs" in the sidebar.

You will then see all the inactive Docs and Channels greyed out, hinting that you may want to archive them.

Coming soon, you will also be able to list inactive docs using the heat map inside the Knowledge Management panel, allowing you to clean up your knowledge base in bulk efficiently 💪
