Algorithm
There's a lot of content there, so I can't read through every post. Thus, I needed an automated way to gather it. Luckily, there's a Python package called PRAW that makes it easier to mine data on reddit. For the mining, then, I used the following algorithm:
- Get all posts for a day
- Filter those posts down to [OC] posts, as those are the most likely to specify the tools used
- Filter the comments on those posts down to the ones made by the post's author, since those are the ones that might specify the language
- Search those comments for a list of the most likely languages and track which ones are found
- Repeat for the next day in the list
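The steps above can be sketched in Python as follows. This is a minimal offline sketch, not the code from the linked repository: fetching the posts (via PRAW or otherwise) is left out, and the `Post`/`Comment` classes, the `LANGUAGES` list, and the whole-word regex matching are assumptions for illustration. PRAW's submission objects do expose `title`, `author`, `score`, `created_utc`, and `comments`, and its comment objects expose `author` and `body`, so the fields map over once authors are compared by name.

```python
import re
from dataclasses import dataclass, field
from datetime import date, datetime, timezone

# Hypothetical candidate list; the list actually used is not reproduced here.
LANGUAGES = ["Excel", "Python", "R", "MATLAB", "Tableau", "D3"]


def _pattern(name):
    # Single-letter names like "R" are matched case-sensitively so that a
    # lowercase "r" (as in "r/dataisbeautiful") doesn't count as a hit.
    flags = 0 if len(name) == 1 else re.IGNORECASE
    return re.compile(r"\b" + re.escape(name) + r"\b", flags)


PATTERNS = {name: _pattern(name) for name in LANGUAGES}


@dataclass
class Comment:
    author: str  # author's name; PRAW gives a Redditor object (or None if deleted)
    body: str


@dataclass
class Post:
    title: str
    author: str
    score: int
    created_utc: float
    comments: list[Comment] = field(default_factory=list)


def languages_in(text):
    """Names from LANGUAGES that appear as whole words in text."""
    return {name for name, pattern in PATTERNS.items() if pattern.search(text)}


def scan_day(posts):
    """Apply the [OC] filter and the author-comment filter to one day's posts."""
    rows = []
    for post in posts:
        if "[oc]" not in post.title.lower():
            continue  # only [OC] posts are likely to name their tools
        found = set()
        for comment in post.comments:
            if comment.author == post.author:  # only the OP's own comments
                found |= languages_in(comment.body)
        rows.append({
            "date": datetime.fromtimestamp(post.created_utc, tz=timezone.utc).date(),
            "score": post.score,
            "languages": found,  # empty set when no language was identified
        })
    return rows


def scan_days(posts_by_day: dict[date, list[Post]]):
    """Repeat the daily scan for every day, in date order."""
    rows = []
    for day in sorted(posts_by_day):
        rows.extend(scan_day(posts_by_day[day]))
    return rows
```

Keeping [OC] posts with an empty language set makes both totals easy to report: the number of rows is the [OC] count, and the rows with a non-empty set are the posts with an identifiable language.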
I also did a few other things (tracked the score of each submission, its date, etc.) for additional analyses. This algorithm isn't perfect: it can miss cases where a language was specified but doesn't trip my filters (e.g., a typo would break it), and many posts don't specify a language in the first place. It should still work as a pretty good estimator, though.
Most used language
First, we need to know how many posts identified the language(s) used. I ran this against the last two years of data and found 7845 posts flagged [OC]; the algorithm above identified the language(s) used in 4189 of them (about 53%).
Next, we can do a simple comparison of how often each language was used among the posts that specified any language, which is where the results at the beginning of the post come from (% = 100*(posts specifying this language/posts specifying any language)):
Note that the numbers above add up to >100% because some posts specified multiple languages (511 of the 4189 posts with identifiable languages).
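The percentage definition above can be written as a small function. This sketch assumes each [OC] post has already been reduced to the set of languages found in it (an empty set when none was found); a post that names two languages counts toward both, which is why the values can sum to more than 100%.

```python
from collections import Counter


def language_shares(language_sets):
    """Percent of language-identified posts that mention each language."""
    # Posts with no identifiable language are excluded from the denominator.
    identified = [langs for langs in language_sets if langs]
    # A multi-language post adds one count to every language it names.
    counts = Counter(lang for langs in identified for lang in langs)
    return {lang: 100 * n / len(identified) for lang, n in counts.items()}
```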
And that's the original goal. It's clear that Excel wins by a landslide. I guess that makes sense, because almost everyone can use Excel and it's really quick to get plots out of it. Python beating MATLAB by so much surprised me at first, but it makes sense in retrospect since MATLAB isn't free and has fewer users (it's just really great for working with data).
Most valuable language
To make it interesting, I wanted to see whether any language predicted more success on reddit. I tried this a few different ways. A simple one is to get the average score per post for each language:
That looks odd. We can't assume post scores are normally distributed, though, so another test is to use medians:
That's a huge disparity between the median and the average. How weird is the distribution? A histogram with logarithmic bins yields:
That is much clearer to me. One interesting thing is that it spikes in the 3,000 to 10,000 score range, which I'm guessing is roughly where a post makes it to the front page. An idea, then, is to look at the score distributions by language:
It's pretty clear from this that Excel is more bottom-heavy than some of the others. A huge number of posts with a score of 0 used it, and it has very few posts with extremely high scores, especially considering that it is the most popular language/tool here. MATLAB and Adobe tools appear to have the highest percentage of high-scoring posts, but they have so few samples that it's hard to know. Among the popular languages/tools, Python and R appear to do best.
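The per-language average, median, and log-binned histogram can be sketched with the standard library. Assumptions: each post is a `(score, set_of_languages)` pair, a multi-language post counts toward every language it names, and `bins_per_decade` is a hypothetical choice, since the bin edges behind the histograms aren't stated. Scores of 0 can't be placed on a logarithmic axis, so they are counted separately.

```python
import math
import statistics
from collections import defaultdict


def score_summary(rows):
    """Per-language (count, mean, median) of post scores.

    rows: iterable of (score, set_of_languages) pairs.
    """
    by_lang = defaultdict(list)
    for score, langs in rows:
        for lang in langs:
            by_lang[lang].append(score)
    return {
        lang: (len(scores), statistics.mean(scores), statistics.median(scores))
        for lang, scores in by_lang.items()
    }


def log_bin_counts(scores, bins_per_decade=4):
    """Count scores in logarithmic bins.

    Bin k covers [10**(k / bins_per_decade), 10**((k + 1) / bins_per_decade)).
    Scores <= 0 are returned as a separate count.
    """
    zeros = 0
    counts = defaultdict(int)
    for s in scores:
        if s <= 0:
            zeros += 1
        else:
            counts[math.floor(math.log10(s) * bins_per_decade)] += 1
    return zeros, dict(sorted(counts.items()))
```

With made-up scores of 1, 2, and 3000 for one language, the mean is 1001 while the median is 2: a few front-page posts drag the average far above a typical post, which is why the median and the log-binned histogram are more informative here.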
A final way to answer which languages/tools are most likely to yield a high score is to see what percentage of posts using each language/tool reach a score above 100:
This just reinforces the takeaways from the histograms (it's basically the same information in a different form), and I'll stop there...
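The score-above-100 percentage can be sketched the same way. This assumes `(score, set_of_languages)` pairs, counts a multi-language post toward each language it names, and reads "above 100" as strictly greater than 100.

```python
from collections import defaultdict


def share_above(rows, threshold=100):
    """Percent of each language's posts whose score exceeds threshold."""
    totals = defaultdict(int)
    high = defaultdict(int)
    for score, langs in rows:
        for lang in langs:
            totals[lang] += 1
            if score > threshold:
                high[lang] += 1
    return {lang: 100 * high[lang] / totals[lang] for lang in totals}
```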
Probable biases in this data
I would guess that the following occurred to some degree:
- some languages are probably more prone to typos... e.g., maybe a lot of people typed 'Tablaeu' instead of 'Tableau'... if that's the case, those languages would be undercounted by my crude algorithm
- a lot of OC posts don't specify the language(s) used, and there might be a bias there... I wouldn't be shocked, for example, if a larger percentage of those actually used Excel or Tableau than something like MATLAB
- a lot of people probably name something like 'plotly' as the tool used, which makes the actual language ambiguous even though it definitely wasn't Excel in that case
- I personally submit a lot of posts using MATLAB. I think roughly 10% of the MATLAB posts are mine, and I usually submit low-quality posts that get very few upvotes (I don't think I've ever broken a score of 100 on this subreddit). Thus, I have personally hurt MATLAB's performance.
I'll think about more robust ways to catch all of these and might do this again at some point in the future. As a note, I did the data gathering and plotting in Python, but since Excel doesn't have as many top posts, I redid all of the plots in Excel in hopes of breaking the trend.
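As one illustration of a more typo-tolerant matcher (not what the original code does), Python's standard-library `difflib.get_close_matches` can map a misspelling like 'Tablaeu' back to 'Tableau'. The candidate list, the 5-letter minimum, and the 0.85 cutoff are assumptions: 'table' scores about 0.83 against 'tableau', so a lower cutoff would start producing false positives, and short names like 'R' or 'D3' are left to exact matching.

```python
import difflib
import re

# Hypothetical candidate list; only names of 5+ letters are fuzzy-matched.
LANGUAGES = ["Excel", "Python", "MATLAB", "Tableau", "R", "D3"]
FUZZY_CUTOFF = 0.85  # assumed threshold; "table" vs "tableau" scores ~0.83


def fuzzy_languages(text, languages=LANGUAGES, cutoff=FUZZY_CUTOFF):
    """Return tool names that appear in text, tolerating small typos."""
    long_names = {name.lower(): name for name in languages if len(name) >= 5}
    found = set()
    for word in re.findall(r"[A-Za-z]+", text):
        if len(word) < 5:
            continue  # short words give too many spurious near-matches
        match = difflib.get_close_matches(word.lower(), list(long_names), n=1, cutoff=cutoff)
        if match:
            found.add(long_names[match[0]])
    return found
```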
My code for scanning the posts can be found here: https://github.com/rhamner/dataisbeautiful_languageFrequency