Version 0.5: filters, improved overlaps, bug fixes!

Hey everyone!


We've got a bunch of fixes and improvements in since the last blog post.

- FILTERS

- Toggle fields on and off

- Better overlap stats

- Better "reviews since launch" stat

- Various bugfixes


Filters & Fields

Go to any list view page on the site, like this one, to see some new "power tools":

https://www.gamedatacrunch.com/steam/list/all/reviews_total/ 


The "power tools" are only available in the desktop view for now, but they let you choose which data fields you want to see as well as what filters you want to apply to the list.

Fields:
Click on any of the fields under "show fields" to toggle that information on or off. There's not enough room to fit everything in at once, and not every research question calls for the same data, so this should let you quickly and easily see exactly what you care about.

Filters:
Below "filters" you'll find a column of filters you can use INCLUSIVELY and EXCLUSIVELY. Click on a filter's name and it will show you ONLY games that have that property. Click on the 🚫 button next to it and you'll exclude all the games with that property from the list. You can have as many filters as you like active at a time.

Filters are sorted by the number of titles in the list that have the given property.
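
To make the include/exclude behavior concrete, here's a minimal sketch in Python. It is not the site's actual code: the sample games, the function names, and the choice to combine multiple inclusive filters with AND are all assumptions for illustration.

```python
from collections import Counter

# Hypothetical sample data: each game maps to the set of properties it carries.
games = {
    "Game A": {"Singleplayer", "2D", "Violence"},
    "Game B": {"Singleplayer", "Multiplayer"},
    "Game C": {"2D", "Platformer"},
}

def apply_filters(games, include=(), exclude=()):
    """Keep games that have every included property and none of the excluded ones.

    Combining several inclusive filters with AND is an assumption here.
    """
    include, exclude = set(include), set(exclude)
    return {
        name: props
        for name, props in games.items()
        if include <= props and not (exclude & props)
    }

def filter_counts(games):
    """Count how many listed games carry each property, most common first."""
    counts = Counter(prop for props in games.values() for prop in props)
    return counts.most_common()

# Show only Singleplayer games, but drop anything flagged with Violence.
result = apply_filters(games, include=["Singleplayer"], exclude=["Violence"])
print(sorted(result))
print(filter_counts(games))
```

With this sample data, only "Game B" survives: "Game A" is Singleplayer but gets excluded for Violence, and "Game C" never matched the inclusive filter.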

You'll notice that the filters have one of three different icons:

🏷 User Tag

🗃 Store Category

Mature Content Descriptor

You'll further notice that sometimes the same concept will be encoded under more than one of these -- like "Singleplayer" as a tag as well as a Store Category, or "Violence" as both a tag and a Mature Content descriptor. This is an artifact of how data is natively encoded on Steam. I have plans for cleaning this up in the future, but in the interest of development speed I'm just blatting it all out as-is for now. The rule of thumb is that if information is duplicated as both a user tag and as either a store category or a mature content descriptor, the user tag tends to identify fewer games, and is usually less accurate in the games it does identify.

For information encoded only as user tags, you might notice that the tags in my database don't line up 100% with Steam's. That's because our tag profiles all have conservative Query Expansion applied by default -- meaning a game with "2D Platformer" will always have both "2D" and "Platformer" added to its tag list on GameDataCrunch, even if they aren't there on Steam's.
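
As a rough illustration of what that kind of expansion looks like, here's a small Python sketch. The expansion table and the function name are hypothetical; only the "2D Platformer" example comes from the description above.

```python
# Hypothetical expansion table: compound tag -> the simpler tags it implies.
# Only the "2D Platformer" entry is taken from the post; a real table would be larger.
EXPANSIONS = {
    "2D Platformer": {"2D", "Platformer"},
}

def expand_tags(tags, expansions=EXPANSIONS):
    """Return the original tags plus any tags implied by them."""
    expanded = set(tags)
    for tag in tags:
        expanded |= expansions.get(tag, set())
    return expanded

steam_tags = {"2D Platformer", "Indie"}
print(sorted(expand_tags(steam_tags)))
```

The expansion only ever adds tags and never removes any, which is what keeps it "conservative": a game tagged "2D Platformer" gains "2D" and "Platformer", but a game tagged only "2D" is left alone.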

Better Overlap Stats

People had pointed out that our overlap stats for similar games weren't particularly useful before, and I agreed. The problem was that when you say two games "overlap" with each other, you can define that in various ways.

If you just count the absolute number of overlapping reviews, you basically just get a list of all the most popular games on Steam that everybody owns -- Counter-Strike, Dota 2, Portal 1 & 2, The Witcher 3, etc.

What I've settled on for now is to find the games that have the largest proportion of their own reviewers in common with the Target Game. That returns lists like this.

And that's pretty decent, even if it's biased a bit towards smaller games. To clarify further, I put a tooltip on the reviewer overlap field that shows you how the overlap works both ways. For instance, the "Quake Mission Pack #1" has 103/331 reviewers in common with Doom, which is a whopping 31% of its reviewers. However, calculated the other way, that's 103/130536, or about 0.08%, which rounds to ~0%. So I think there's still some room for improvement here.
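
Here's a minimal sketch of the two-way overlap calculation and the ranking it produces. The reviewer ID sets, game names, and function names are made up for illustration; only the 103, 331, and 130536 figures come from the example above.

```python
def overlap_both_ways(target_reviewers, other_reviewers):
    """Return (shared count, share of the other game's reviewers, share of the target's)."""
    shared = len(target_reviewers & other_reviewers)
    share_of_other = shared / len(other_reviewers) if other_reviewers else 0.0
    share_of_target = shared / len(target_reviewers) if target_reviewers else 0.0
    return shared, share_of_other, share_of_target

def rank_similar(target_reviewers, candidates):
    """Sort candidate games by the share of their OWN reviewers who also reviewed the target."""
    return sorted(
        candidates,
        key=lambda name: overlap_both_ways(target_reviewers, candidates[name])[1],
        reverse=True,
    )

# Hypothetical reviewer ID sets.
target = set(range(1000))
candidates = {
    # 30 of its 100 reviewers also reviewed the target (30%).
    "Small niche game": set(range(0, 30)) | set(range(5000, 5070)),
    # 500 of its 10000 reviewers also reviewed the target (5%).
    "Huge popular game": set(range(0, 500)) | set(range(10000, 19500)),
}
ranking = rank_similar(target, candidates)
print(ranking)

# The Quake Mission Pack #1 / Doom figures from the post, both directions.
print(f"{103 / 331:.1%} of the mission pack's reviewers")
print(f"{103 / 130536:.2%} of Doom's reviewers")
```

Ranking by raw shared count would put the huge game first (500 shared versus 30). Ranking by proportion puts the small game first, which is both the point of the change and the source of the small-game bias.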

Better "reviews since launch" stat

A bunch of people pointed out that the "reviews (1st week)" stat was confusing because it was often higher than the "total reviews" stat. Shouldn't it always be lower than, or at most equal to, that stat? The discrepancy came from sourcing these stats from two different APIs: one that filters out non-verified-purchaser reviews and review bomb periods, and one that doesn't. Now I'm getting them both from the same source. I've also started tracking the number of reviews in the first 30 days and 90 days since launch, but I haven't exposed those yet.
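
To show why a single source fixes this, here's a small Python sketch that counts reviews inside a launch window. The dates and the function name are hypothetical, and treating the window as half-open (launch day included, cutoff day excluded) is an assumption.

```python
from datetime import date, timedelta

def reviews_within(review_dates, launch, days):
    """Count reviews written in the first `days` days, starting on launch day."""
    cutoff = launch + timedelta(days=days)
    return sum(1 for written in review_dates if launch <= written < cutoff)

# Hypothetical launch date and review dates, all from one source.
launch = date(2020, 1, 1)
review_dates = [
    date(2020, 1, 1),
    date(2020, 1, 5),
    date(2020, 1, 20),
    date(2020, 3, 15),
    date(2020, 6, 1),
]

first_week = reviews_within(review_dates, launch, 7)
first_30 = reviews_within(review_dates, launch, 30)
first_90 = reviews_within(review_dates, launch, 90)
total = len(review_dates)
print(first_week, first_30, first_90, total)
```

Because every window is counted from the same list of reviews, each count is a subset of the next, so the first-week number can never exceed the total.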

I should note that "since launch" is a bit ambiguously defined here. The Steam New Releases chart happily combines both "Early Access Exits" and "First Listings on Steam" and labels them both "New Releases". I am tracking Early Access status changes on a day-to-day basis, but I have no good historical information for games that existed before GameDataCrunch was launched. When I finish piecing together an authoritative list of Early Access transitions, I'll be able to treat "Early Access Exits" separately from "First Listings on Steam", but for now they're just ambiguously mixed together and you'll have to be aware of that. Fortunately I'm pretty good at detecting whether any game has ever been in Early Access even after its exit -- all I have to do is detect a single review written during the Early Access period.
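
The Early Access check boils down to an "any" over a game's reviews. Here's a minimal sketch; the record shape and the `written_during_early_access` key are assumptions about how review records might be stored, not a description of the actual pipeline.

```python
def was_ever_in_early_access(reviews):
    """True if at least one review is flagged as written during Early Access.

    Assumes each review is a dict with a boolean 'written_during_early_access'
    key; a missing key is treated as False.
    """
    return any(review.get("written_during_early_access", False) for review in reviews)

# Hypothetical review records for two games.
graduated_game = [
    {"id": 1, "written_during_early_access": True},
    {"id": 2, "written_during_early_access": False},
]
regular_game = [
    {"id": 3, "written_during_early_access": False},
    {"id": 4},
]

print(was_ever_in_early_access(graduated_game))
print(was_ever_in_early_access(regular_game))
```

Note the limit of this approach: it tells you that a game was in Early Access at some point, but not when it left, which is why the exit dates still have to be pieced together separately.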

We fixed a bunch of little bugs here and there too, and doubtlessly introduced new ones. Let us know what you find and what you want to see next.


Thanks as always for your support! Your donations make this project possible.

