This is where I will be occasionally posting some statistics about the site that I've found interesting or that people have asked for. Expect a lot of numbers, graphics, and occasional tables.
Today I will be posting a lot of stats about articles, ratings, words, users, and their relationship with each other. I'm using a 2014-10-05 snapshot of the site for these stats. All images are clickable for an HD version.
Some colonel general numbers
Let's start small and just get some general numbers about the site. For the sake of simplicity, skip is anything tagged as 'scp' (this includes jokes, arcs, decomms, etc.) and tale is anything tagged as 'tale' (and thus doesn't include goi-format stuff or anything else like that).
- There are 4597 pages on the site, including 2412 skips and 1493 tales.
- There are 707 users on the site who wrote at least one article, excluding deleted accounts and anonymous users.
- There are 7388 users on the site who voted on at least one article.
- There are 7445 users on the site who have done either of the above.
- There are 374164 individual votes on the site, including 333853 upvotes and 40311 downvotes.

- The net cumulative rating of all the pages on the site is 293500. The average, mode, and standard deviations for page ratings are ~64.4, 34, and ~71, respectively.
- The net cumulative rating of all the skips on the site is 212935. The average, mode, and standard deviations for skip ratings are ~88.9, 55, and ~82, respectively.
- The net cumulative rating of all the tales on the site is 64432. The average, mode, and standard deviations for tale ratings are ~43.2, 31, and ~42.3, respectively.
- There are 38 pages on the site that do not have a rating. Half of them are decomms, and the rest are various system pages.
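For anyone who wants to reproduce numbers like these, here is a minimal sketch of how the cumulative rating, average, mode, and standard deviation could be computed. It is not the actual script behind this post: the function name and the toy data are made up, unrated pages are assumed to be represented as `None`, and the population standard deviation is an assumption, since the post doesn't say which variant was used.

```python
import statistics

def rating_summary(ratings):
    """Return (total, mean, mode, standard deviation) for rated pages only."""
    # Pages without a rating (None) are left out of every statistic.
    rated = [r for r in ratings if r is not None]
    return (
        sum(rated),
        statistics.mean(rated),
        statistics.mode(rated),
        statistics.pstdev(rated),  # assumption: population standard deviation
    )

# Hypothetical toy data, not the real snapshot.
summary = rating_summary([10, 20, None, 20, 30])
print(summary)
```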

- The total wordcount for all the pages on the site is 5627524, with an average of ~1224.
- The total wordcount for all the skips is 2431068, with an average of ~1008.
- The total wordcount for all the tales is 2458257, with an average of ~1647.
- There are 3022 images on the site, including 2049 images in skips and 213 in tales.
- The total number of comments on all the pages is 142648, with an average of ~31.
- The total number of revisions on all the pages is 86621, with an average of ~19.
Plots, plots, and also some plots
Alright, now we're getting to the interesting stuff.
Let's see how page ratings differ for young and old articles:
Tales are consistently rated lower than skips, with a few exceptions. You can see that ratings are dropping down at the end; there's a couple of possible explanations for this, but none of them align completely with the rest of the data, so I'm not sure what's going on here. Let's wait a few months and see what happens.
Now, a similar plot for word counts:
This one, not surprisingly, stays mostly constant throughout. Skips are consistently shorter than tales, which is, again, not surprising. See that huge spike in the tale word count in the middle of 2009? You can see a similar one in the rating graph, and will see it in some of the graphs below. It comes from the fact that a total of 2 (two) tales were posted (and survived to this day) in June of 2009, one of which happened to be Log Of Anomalous Items. So yeah, tale stats for that month are pretty screwed up.
Total number of articles posted in each month:
This one is interesting.
See that completely dead zone in the autumn of 2009? That's Mass Edit. Yeah. Not only were a lot of articles deleted during that time, but the creation of new articles was temporarily blocked, leading to the Hole Of Nothing you see in the plot.
Also note that the highest all-time value of created articles is at the left edge of the plot. During the migration to wikidot, over a year of articles was transferred to the site simultaneously, creating the huge articles-per-month number that hasn't been surpassed since.
Another interesting observation: the number of skips has been in slow decline since 2012, while the overall number of pages is still pretty good. This makes sense to me given our greater focus on tales, goi pages, etc., and really, I'm ok with this.
The giant spike in the middle of 2012 is Containment Breach and the 087 game, but what is the spike before it, in autumn of 2011? I have no idea. Whatever it is, looks like it was the turning point for the tales on the site. Not only did the number of tales increase tenfold, but even after the spike it remained high from then on, with the lowest points being higher than the highest points to the left of the spike.
Here's the number of comments and number of revisions based on the article's age. See the Anomalous Items spike again? Nothing else much interesting here.
Number of images added to the site in each month:
See that huge blue spike at the right edge? Say hi to SunnyParallax, everyone.
Average vote count for articles of varying ages:
Looks familiar? This is practically a copy of the average rating plot. It sorta makes sense: people like high-quality articles, they upvote them, tell their friends about them, the friends go read them too and upvote them as well. Meanwhile, no one gives a frell about some +20 article. Personally, I'd prefer this plot to look more like a straight flat line, but that's not realistically possible, so oh well.
So far, we've had plots that showed how various stats change from old to new articles. Let's do something different now. Let's see how those stats correlate with each other.
Let's start with relationship between word count and rating:
For skips under 500 words, the rule seems to be "shorter is better". But once you hit 500, adding more words is likely to increase the rating. Also the highest rated skips are those below 100 words, because Scantron. The only mainlist skip with fewer than 100 words is 1159 at +69.
For tales, the wordcount doesn't seem to matter until the 4000-word mark, at which point it can go either way, but all the highest rated are between 6000 and 8000 words.
Next, average rating based on number of revisions:
For this one, I cut off all the pages with more than 180 revisions. There were like 10 of them, and they were mucking up the whole plot, making the left part all squished together and unintelligible. And the left part is where all the good stuff is.
Up to 50 revisions, the more revisions your skip has, the higher its rating will be. This is honestly really surprising to me. Personally, I had always tried to keep the number of revisions in my skip to a minimum. Not to the point of not fixing any issues, but I had always tried to polish my draft to a point of perfection before posting it, and then fix as many issues in one revision as I could. I'm most proud of 1511 in that way, since it only has 3 revisions, and one of them is adding tags, and the other two weren't strictly necessary. So, yeah, this is unexpected, but interesting.
For tales, this doesn't really hold. After only 20 revisions, adding any more is more likely to make things worse.
Now, this is an interesting one. How would adding a picture to your skip affect its rating? Let's see:
Basically, adding one, two, or three pictures is not likely to improve things from not having any images at all. Then, adding any more dramatically improves the rating, with 7 images being the sweet spot.
Tales are not really affected by the number of images in them.
Now, something a bit different. So far we've looked at how different aspects of an article affect its rating. Now, let's see how the article's rating affects the number of comments on that article:
Pretty much what you'd expect here. The part that I find interesting is that tales receive fewer comments than skips with the same rating, especially in the < +100 range. This is sad, since those are the tales that need the most feedback.
Let's do something different again. So far, we've been working with three groups (pages, skips, and tales), and for each group we had its own colored line on each plot. Now, we will have a different kind of groups: each group will represent some subset of the userbase, based on the number of articles a user has created on the site. Specifically, the groups will look like this:
- Group 1 (black): Users who created 0 articles. 6736 users in this group.
- Group 2 (blue): Users who created 1 article. 332 users in this group.
- Group 3 (orange): Users who created 2 to 3 articles. 153 users in this group.
- Group 4 (green): Users who created between 4 and 8 articles. 111 users in this group.
- Group 5 (magenta): Users who created between 9 and 20 articles. 56 users in this group.
- Group 6 (red): Users who created more than 20 articles. 58 users in this group.
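The grouping above boils down to a lookup on the number of articles a user has created. A minimal sketch, assuming the boundaries exactly as listed (the names `UPPER_BOUNDS` and `group_for` are made up for illustration):

```python
from bisect import bisect_left

# Inclusive upper bound of the article count for groups 1-5.
# Anything above 20 falls into group 6.
UPPER_BOUNDS = [0, 1, 3, 8, 20]

def group_for(article_count):
    """Map a user's number of created articles to a group number (1-6)."""
    return bisect_left(UPPER_BOUNDS, article_count) + 1

for n in (0, 1, 3, 8, 20, 21):
    print(n, "->", group_for(n))
```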
Now let's see all the fun plots!
Average rating, depending on the age of the article, and a zoomed in version of the same:
Predictably, the non-contributing members contribute the most (hah) to the overall rating, because there are so many of them. However, this plot isn't very useful, since it doesn't let us directly compare the voting patterns of different groups. So let's look at this instead:
Now, this one, this one is interesting. It shows the ratio of the net rating for each group in each month to the total number of votes by this group. This value is independent of the group size, so we can compare different groups more easily. Unfortunately, it also negates any effect from neutral votes :(.
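To make the ratio concrete, here is a minimal sketch of the calculation. The input format is an assumption: each vote is a `(key, value)` pair where `value` is +1 or -1 and `key` identifies the bucket (for the plot that would be something like a group and a month). The function name is made up.

```python
from collections import defaultdict

def net_to_total_ratio(votes):
    """votes: iterable of (key, value) pairs, where value is +1 or -1."""
    net = defaultdict(int)
    total = defaultdict(int)
    for key, value in votes:
        net[key] += value   # upvotes minus downvotes
        total[key] += 1     # every vote counts once, whatever its sign
    return {key: net[key] / total[key] for key in total}

# Hypothetical toy votes, not real data.
ratios = net_to_total_ratio([
    ("red", 1), ("red", -1), ("red", 1), ("red", 1),
    ("black", 1), ("black", 1),
])
print(ratios)
```

A group that only upvotes sits at 1.0, a group that only downvotes sits at -1.0, and the number of voters in the group drops out entirely.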
Anyway, what we can see from it is that the more prolific authors are also grumpier and more likely to downvote. This is especially noticeable for the old, pre-2009 articles. After the middle of 2009, most of the lines on the plot are close together, with only the red one being noticeably lower. However, for the 2008 - early 2009 articles, there is a big divide between the non-contributing members and everyone else, with non-contributors giving consistently higher scores. This is actually something that I have long suspected, but now I have data to back it up. Also keep in mind that non-contributors amount to something like 90% of the total rating of each article, and you can see why some people would think that old articles are overrated.
Another interesting thing to note is the two big drops in the red line in 2014. Notably, these drops are not present to such degree in the other lines. While the difference between the red scores and any of the other ones is around 0.2 for most articles, for the articles created in 2014 it jumps to something like 0.4, which is huge. I personally don't know what is causing this, but I'd like to hear people's ideas.
Here's another neat one. How many articles have each group created in each month:
As you can see, the people in the 21+ group have been consistently busy since the middle of 2011. While there are people who wrote a lot of stuff long ago and haven't been writing since, they are not the rule.
Also the number of one-hit-wonders is pretty consistent through the years.
Are people who write more stuff more likely to create highly-rated articles? Let's find out:
What we're looking for is the point where each curve peaks. If a group is more likely to create a higher rated article than another group, then its curve should peak further to the right. As we can see, those who wrote 21+ articles are about as likely to create another hit as those who wrote one. People who wrote between 9 and 20 articles do even worse.
Similar plot, but for word counts instead of ratings (full and zoomed in variants):
While for articles below 5000 words there doesn't appear to be a significant correlation between the length of the article and the number of articles the author has written, the entirety of the right tail is dominated by the authors who wrote > 20 articles.
Counting words
Here are the rules for this section:
- Any symbols except for "'", "█", "_", and "-" are treated as word separators and are not included in the words.
- The "'" characters are stripped from the beginning and end of the words afterwards.
- All words are converted to lowercase.
- Cyrillic, Chinese, umlauted, and other non-latin characters are treated as a part of their corresponding words.
- [DATA EXPUNGED] and [DATA REDACTED] are converted to DATA_EXPUNGED and DATA_REDACTED respectively, so that they'd be counted as a single word.
- All Site designations are converted to the hyphenated version, e.g. "Site 123" is converted to "Site-123".
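The rules above can be sketched roughly like this. This is an illustration, not the script that produced the numbers: the regular expressions and the `tokenize` name are assumptions, and the Site rule here also hyphenates redacted designations like "Site ██", which the rules don't spell out.

```python
import re

# Word characters: Unicode letters, digits and "_" (via \w), plus "'", "█" and "-".
WORD_RE = re.compile(r"[\w'█-]+")
# Assumption: a Site designation is "Site" followed by a space and a digit or █.
SITE_RE = re.compile(r"\b(site) (?=[\d█])", re.IGNORECASE)

def tokenize(text):
    """Split text into lowercase words following the rules listed above."""
    text = re.sub(r"\[DATA EXPUNGED\]", "DATA_EXPUNGED", text, flags=re.IGNORECASE)
    text = re.sub(r"\[DATA REDACTED\]", "DATA_REDACTED", text, flags=re.IGNORECASE)
    text = SITE_RE.sub(r"\1-", text)
    # Strip apostrophes from both ends, lowercase, and drop anything left empty.
    words = (w.strip("'").lower() for w in WORD_RE.findall(text))
    return [w for w in words if w]

print(tokenize("Dr. ██████'s log at Site 19: [DATA EXPUNGED]"))
```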
With that, let's begin.
There are a total of 5627524 words on the site, comprising 117304 distinct words.
The ten most-often used words on the site are:
- the: 289510 occurrences.
- to: 161202 occurrences.
- of: 158997 occurrences.
- and: 130104 occurrences.
- a: 123418 occurrences.
- in: 82200 occurrences.
- it: 57563 occurrences.
- is: 55426 occurrences.
- i: 52181 occurrences.
- that: 49590 occurrences.
What, you expected anything different? :D
The ten most-often used nouns on the site are:
- one: 16285 occurrences.
- dr: 13739 occurrences.
- time: 11306 occurrences.
- subject: 10623 occurrences.
- containment: 10040 occurrences.
- foundation: 9393 occurrences.
- personnel: 7758 occurrences.
- two: 6625 occurrences.
- man: 5977 occurrences.
- object: 5703 occurrences.
Other interesting often-used words:
- -: 10744 occurrences.
- ██: 6468 occurrences.
- 1: 5602 occurrences.
- room: 5111 occurrences.
- agent: 4807 occurrences.
- item: 4193 occurrences.
- subjects: 4029 occurrences.
- human: 4026 occurrences.
- scp: 3973 occurrences.
- test: 3959 occurrences.
- class: 3920 occurrences.
- good: 3900 occurrences.
- area: 3774 occurrences.
- testing: 3752 occurrences.
- procedures: 3747 occurrences.
- description: 3696 occurrences.
- eyes: 3643 occurrences.
- site: 3443 occurrences.
- log: 3429 occurrences.
- special: 3420 occurrences.
- addendum: 3320 occurrences.
- world: 3276 occurrences.
- however: 3189 occurrences.
- security: 3182 occurrences.
- ████: 3137 occurrences.
- anomalous: 3093 occurrences.
There are a total of 53808 words on the wiki that are used exactly once. For those keeping count, that's 45.9% of all the unique words used on the site. Wow. Here's a random 10 of them:
- pratfalls
- therapsids
- 001-n
- 1416-2-5
- swe-
- fhwa
- tipped-over
- 447m
- soil-brown
- 1647-1
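Counts like the ones in this section (total words, distinct words, most-used words, words used exactly once) fall out of a single frequency table. A minimal sketch, assuming the text has already been split into a list of words; the `word_stats` name and the dictionary keys are made up:

```python
from collections import Counter

def word_stats(words, top_n=10):
    """Summarize a list of words: totals, top words, and words used once."""
    counts = Counter(words)
    once = [w for w, n in counts.items() if n == 1]
    return {
        "total": sum(counts.values()),
        "distinct": len(counts),
        "top": counts.most_common(top_n),
        "once": len(once),
        # Share of distinct words that occur exactly once.
        "once_share": len(once) / len(counts) if counts else 0.0,
    }

# Hypothetical toy input, not the real corpus.
stats = word_stats("the scp is the object the".split(), top_n=1)
print(stats)
```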
I wanted to include a list of words that occur at least once in the highest number of articles, as opposed to the total number of occurrences. However, after looking at it, it's pretty much the same list, so I won't be including it here.
Let's define the "rating" of a word in the following way: for each article the word appears in, its rating from this article is the rating of the article divided by the number of words in the article, multiplied by the number of occurrences of the word in the article; the total rating of the word is then the average of its ratings from all the articles it appears in. Given this metric, the top 20 highest-rated words are:
- scp-1171-1, with the rating of 18.55.
- ___, with the rating of 12.76.
- procrastinate, with the rating of 12.57.
- t1lead, with the rating of 11.61.
- scp-1322-a, with the rating of 9.50.
- scp-884-4, with the rating of 8.69.
- scp-1522-1, with the rating of 8.62.
- scp-1193-01, with the rating of 8.48.
- scp-1522-2, with the rating of 7.94.
- scp-1111-2, with the rating of 7.15.
- scp-50-ae-1, with the rating of 6.42.
- scp-1541-1, with the rating of 6.42.
- thing-i, with the rating of 6.33.
- scp-946-1, with the rating of 5.93.
- scp-1111-1, with the rating of 5.88.
- scp-1535-1, with the rating of 5.68.
- uploading, with the rating of 5.40.
- d-9884, with the rating of 5.23.
- kato, with the rating of 5.16.
- scp-2053-1, with the rating of 5.05.
Ha, this is hilarious if you ask me. But let's try this again, this time excluding words that are clearly tied to specific articles:
- ___, with the rating of 12.76.
- procrastinate, with the rating of 12.57.
- t1lead, with the rating of 11.61.
- thing-i, with the rating of 6.33.
- uploading, with the rating of 5.40.
- kato, with the rating of 5.16.
- islet, with the rating of 4.36.
- 48l, with the rating of 4.22.
- ████████████████████████████████████████████████████████████████████, with the rating of 4.17.
- type-s, with the rating of 3.86.
- becaus, with the rating of 3.45.
- boone, with the rating of 3.23.
- northrop, with the rating of 3.17.
- 51l, with the rating of 3.16.
- mulhausen, with the rating of 3.16.
- amiiiigoooos, with the rating of 3.16.
- frends, with the rating of 3.15.
- the, with the rating of 3.11.
- westington, with the rating of 2.92.
- loyd, with the rating of 2.83.
- 3rdsister, with the rating of 2.83.
- d-prey, with the rating of 2.81.
- beauremont, with the rating of 2.65.
- o██, with the rating of 2.63.
- cactiiii, with the rating of 2.63.
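For reference, the word "rating" metric defined earlier in this section can be sketched as follows. The input format (a list of `(rating, words)` pairs) and the function name are assumptions made for the example.

```python
from collections import Counter, defaultdict

def word_ratings(articles):
    """articles: iterable of (article_rating, list_of_words) pairs."""
    per_word = defaultdict(list)
    for rating, words in articles:
        if not words:
            continue  # an empty article contributes nothing
        for word, occurrences in Counter(words).items():
            # Rating of the article per word, times occurrences of this word.
            per_word[word].append(rating / len(words) * occurrences)
    # Average over the articles the word appears in.
    return {word: sum(vals) / len(vals) for word, vals in per_word.items()}

# Hypothetical toy articles, not real data.
ratings = word_ratings([
    (100, ["a", "b", "a", "c"]),
    (20, ["a", "d"]),
])
print(ratings)
```

This also shows why the top of the list is dominated by odd words: a word that makes up a large share of a single short, highly rated article gets a high score, and it only has that one article to average over.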
Let's look at the words containing the "█" character. There are a total of 1454 such words on the wiki. Here's the top 24:
- ██: 6468 occurrences.
- ████: 3137 occurrences.
- ██████: 2862 occurrences.
- █████: 2684 occurrences.
- ███████: 2033 occurrences.
- ███: 1499 occurrences.
- ████████: 1431 occurrences.
- █: 1260 occurrences.
- 20██: 1025 occurrences.
- █████████: 971 occurrences.
- ██████████: 680 occurrences.
- 19██: 556 occurrences.
- scp-███: 545 occurrences.
- site-██: 439 occurrences.
- o5-█: 331 occurrences.
- scp-████: 301 occurrences.
- ████████████: 261 occurrences.
- ███████████: 229 occurrences.
- ██-██-████: 221 occurrences.
- ██████████████: 120 occurrences.
- o5-██: 119 occurrences.
- 200█: 119 occurrences.
- 199█: 102 occurrences.
- ██████'s: 102 occurrences.
If we add up all the █ characters from all the words, we'll end up with 179874 characters. Assuming that the width of a single █ is 3mm, if we write all the █s on the wiki one after another, we'll end up with a line about 540 meters long. That means it would take a person around six or seven minutes at a normal walking pace to walk over all the redactions on the site.
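The line-length arithmetic is easy to check. In this sketch the character count and the 3mm width come from the text, while the walking speed of 5 km/h is an assumption of mine, as are the function names.

```python
BLOCK_COUNT = 179874      # total █ characters, from the text
BLOCK_WIDTH_MM = 3        # assumed width of one █, from the text
WALKING_SPEED_KMH = 5.0   # assumption: a typical walking pace

def line_length_km(count, width_mm):
    """Length of `count` characters laid end to end, in kilometers."""
    return count * width_mm / 1_000_000  # 1 km = 1,000,000 mm

def walking_minutes(length_km, speed_kmh):
    """Time needed to walk `length_km` at `speed_kmh`, in minutes."""
    return length_km / speed_kmh * 60

length_km = line_length_km(BLOCK_COUNT, BLOCK_WIDTH_MM)
minutes = walking_minutes(length_km, WALKING_SPEED_KMH)
print(f"{length_km:.3f} km, {minutes:.1f} minutes on foot")
```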
The other common redaction tools look like this:
- redacted: 2467 occurrences.
- data_expunged: 2264 occurrences.
- data_redacted: 118 occurrences.
There are 358 words on the wiki that start with 'site-'. The top 20:
- site-19: 602 occurrences.
- site-██: 439 occurrences.
- site-17: 312 occurrences.
- site-23: 117 occurrences.
- site-38: 102 occurrences.
- site-77: 66 occurrences.
- site-87: 50 occurrences.
- site-19's: 49 occurrences.
- site-73: 48 occurrences.
- site-76: 41 occurrences.
- site-93: 41 occurrences.
- site-77's: 41 occurrences.
- site-11: 36 occurrences.
- site-37: 36 occurrences.
- site-15: 35 occurrences.
- site-18: 33 occurrences.
- site-33: 31 occurrences.
- site-59: 31 occurrences.
- site-███: 30 occurrences.
- site-28: 27 occurrences.
There are 8975 words on the wiki that start with 'scp'. The top 20:
- scp: 3973 occurrences.
- scps: 1244 occurrences.
- scp-███: 545 occurrences.
- scp-682: 490 occurrences.
- scp-████: 301 occurrences.
- scp-261: 226 occurrences.
- scp-001: 197 occurrences.
- scp-173: 153 occurrences.
- scp-076-2: 139 occurrences.
- scp-2998: 136 occurrences.
- scp-083: 120 occurrences.
- scp's: 119 occurrences.
- scp-239: 114 occurrences.
- scp-914: 112 occurrences.
- scp-1247: 107 occurrences.
- scp-610: 106 occurrences.
- scp-241: 106 occurrences.
- scp-1893: 105 occurrences.
- scp-882: 104 occurrences.
- scp-093: 103 occurrences.
There are 265 occurrences of 'skip' on the site. There are 34 occurrences of 'scip' on the site.

The word length for the words on the site ranges between 1 and 1438 characters. There are a total of 326 words on the site longer than 100 characters. The average, mode, and standard deviation for word length are 6.17, 4, and 3.76, respectively.
Can't forget about the most controversial word choice on the site:
- amnestics: 492 occurrences.
- amnesiacs: 338 occurrences.
- amnestic: 198 occurrences.
- amnesiac: 167 occurrences.
Yay for team scientific accuracy.
Revisions, Edits, History things
As was said in the beginning of this post, there are 86621 revisions on the site, with an average of ~19 revisions per page. Let's get into more details.
- 39695, or 45.8% of all revisions are done by the author of the page.
- On average, each user makes 13.9 revisions on pages they didn't create.
- Each author, on average, makes 54.8 edits to the pages they created.
- The average difference between the time a revision is made and the creation of the article is 346 days.
- However, 36115, or 41.7% of all revisions are made within a week of the article's creation.
- The average length of the revision comment is ~19.3 characters.
- There are 39529 (45.6%) revisions on the wiki that do not have a comment.
Here's the plot showing the number of revisions made in each month:
Unlike a similar earlier plot, this is based on when each revision was made, not on the age of the article. The overall shape is very similar to other plots reflecting overall site activity, such as the one showing the number of created pages.
Number of revisions, grouped by day of the week and by hour of the day (in UTC):
The site's userbase is heavily biased towards particular timezones. The drop on Saturdays is merely 20% of the overall number, while the drop around 9-11 AM is a whopping 300%.
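Grouping revisions by day of the week and by hour is a small counting exercise. A minimal sketch, assuming each revision comes with a POSIX timestamp in seconds (the input format and the function name are assumptions):

```python
from collections import Counter
from datetime import datetime, timezone

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def revisions_by_day_and_hour(timestamps):
    """Count revisions per weekday and per hour of the day, both in UTC."""
    by_day, by_hour = Counter(), Counter()
    for ts in timestamps:
        moment = datetime.fromtimestamp(ts, tz=timezone.utc)
        by_day[DAY_NAMES[moment.weekday()]] += 1  # weekday(): Monday is 0
        by_hour[moment.hour] += 1
    return by_day, by_hour

# Hypothetical timestamps: 1970-01-01 00:00 and 10:00 UTC (a Thursday),
# and 1970-01-03 01:00 UTC (a Saturday).
by_day, by_hour = revisions_by_day_and_hour([0, 10 * 3600, 2 * 86400 + 3600])
print(by_day, by_hour)
```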
The average rating of the article, grouped by day and by hour:
Wow, this is interesting. The day of the week plot is an almost straight line until Saturday, and then it drops massively. Meanwhile, the second plot peaks right after the time when the lowest number of revisions is made. There seems to be a strong inverse correlation between a large number of edits being made on the wiki and high-rated articles being created.
Big Tables Full of User Stats
Here's the link with the google spreadsheet: http://goo.gl/bntclU
You can change sheets by clicking on the tabs at the bottom. You can sort various columns by right-clicking on the column's header and then clicking "Sort sheet A - Z".
Let me now explain what you're looking at. Each sheet in the above link contains various statistics for users/authors on the site. I'll go through each one and explain what it means.
author stats: general
This one lists the number of pages created by each author, as well as their cumulative and average ratings, word counts, and the number of the images appearing in the author's works.
user stats: general
This one is mostly about voting. The 'revisions' column lists the number of revisions each user has made on the site. The remaining columns show the number of upvotes and downvotes the user made, and various numbers deriving from them.
user stats: tags
This one shows which tags various users prefer most. The text in each cell is the name of the tag, and the net vote rating for this user on articles tagged with the tag. The columns should be obvious.
author stats: tags
Unlike the previous sheet, this one is not about how people vote on various tags, but about how much they use those tags in their writing. The numbers in the parentheses are how many pages the user has created with that tag, and what percentage of the total volume of their work it covers.
user stats: yearly votes
Net and Total ratings for each user, separated by the year the article was created. This should show which users prefer old articles to new ones, which users prefer new articles to old ones, which users only ever read new stuff, etc.
author stats: words
This one is an interesting one. Straight up listing the words each user uses most would be boring, since that would pretty much mirror the site-wide most used words, and will consist mostly of articles, particles, and the like.
So instead what we'll see is the words that are used disproportionally more by certain authors compared to their overall frequency on the site. The number in the parentheses then is the difference between the two.
For example, if there are a total of 100,000 words on the site, the word "yup" occurs 1000 times, and someone creates their first article with the entire text of it being "yup yup yup yup nope", then the delta value for the word "yup" for this author would be:
(4 / 5) * 100% - (1000 / 100000) * 100% = 80% - 1% = 79.00%
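The same calculation as a minimal sketch. The function name and the input format are made up for the example; the site-wide counts are passed in as a plain dictionary, and a word that never appears site-wide is treated as having a frequency of zero.

```python
from collections import Counter

def word_deltas(author_words, site_counts, site_total):
    """Percentage-point difference between author and site-wide word frequency."""
    author_counts = Counter(author_words)
    author_total = len(author_words)
    return {
        word: 100 * n / author_total - 100 * site_counts.get(word, 0) / site_total
        for word, n in author_counts.items()
    }

# The "yup" example from the text: 1000 occurrences among 100,000 site words.
deltas = word_deltas("yup yup yup yup nope".split(), {"yup": 1000}, 100000)
print(deltas["yup"])
```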
And this concludes this post of numbers and colourful pretty lines. I will probably add some more stuff later on, and update the numbers in the spreadsheets every once in a while. Meanwhile, everyone is welcome to discuss their thoughts on various numbers, ask for clarifications, or request additional stats/plots/sheets.