tag:blogger.com,1999:blog-15324198057018363862022-01-20T22:11:37.450-08:00Random ProblemsHere are solutions to some random problemstheboathttp://www.blogger.com/profile/01260139398901806725noreply@blogger.comBlogger145125tag:blogger.com,1999:blog-1532419805701836386.post-45270509865386997172021-11-10T23:10:00.008-08:002021-11-15T22:42:14.666-08:00Simple Tutorial for Hosting a CRUD Node App on AWS Elastic BeanstalkI had trouble finding (working) simple tutorials for running Node.js CRUD apps on AWS using Elastic Beanstalk so I wrote one from scratch and documented it.<div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-3NvjhOTIxvE/YYy8iQanNfI/AAAAAAAAIhU/OyeZiTd_HX80nLVGW7mLVQuEHoDpIlNOgCLcBGAsYHQ/s1804/initial%2Bworking%2Benvironment%2Bscreenshot.PNG" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="677" data-original-width="1804" height="auto" src="https://1.bp.blogspot.com/-3NvjhOTIxvE/YYy8iQanNfI/AAAAAAAAIhU/OyeZiTd_HX80nLVGW7mLVQuEHoDpIlNOgCLcBGAsYHQ/s1300/initial%2Bworking%2Benvironment%2Bscreenshot.PNG" width="0%" /></a></div><a name='more'></a><h4 style="text-align: left;">The app</h4><div>I've put two phases together. The <a href="https://github.com/rhamner/express_hello_world" target="_blank">first is a simple hello world</a> using Node.js, and the <a href="https://github.com/rhamner/simple_nodemysql_crud_app" target="_blank">second is a simple app</a> that lets you add a number to a MySQL database and read back all numbers that you've added. 
Below is what it looks like when it's working:</div><div><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-2gZXo7VY23c/YYy8YUfaqTI/AAAAAAAAIhM/HWZCpVCsIIQZhXRMrfmpgiTwuyCz2Gs2gCLcBGAsYHQ/s1500/working%2Bcrud%2Bpage.PNG" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="66" data-original-width="436" height="auto" src="https://1.bp.blogspot.com/-2gZXo7VY23c/YYy8YUfaqTI/AAAAAAAAIhM/HWZCpVCsIIQZhXRMrfmpgiTwuyCz2Gs2gCLcBGAsYHQ/s320/working%2Bcrud%2Bpage.PNG" width="35%" /></a></div><br /><div><br /></div><div>Please do not use this as a reference for writing Node cleanly or anything like that. This is structured to (hopefully) be a very minimal, easy-to-follow example of setting up Node.js and MySQL through this service, and it isn't intended to be a serious app.<br /><br /><h4 style="text-align: left;">Setting up your AWS environment</h4></div><div>Set up an environment using the <a href="https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/GettingStarted.html" target="_blank">AWS documentation</a>. 
When writing this, the steps were:</div><div><ul style="text-align: left;"><li>create an environment</li></ul><ul style="text-align: left;"><li>choose 'Web server environment'</li></ul><ul style="text-align: left;"><li>configure environment</li><ul><li>give it a name</li><li>choose node js for type</li><li>use defaults for node and linux</li><li>choose 'upload your code' at the bottom</li><li>upload a zipped folder with the code from one of the examples I listed above; it should look like this (i.e., files directly in the zipped folder and not a folder containing them)</li></ul></ul><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-xw4vRABoN7Y/YYy8czfQ4TI/AAAAAAAAIhQ/W8F-2hbvZ78pva-7D8yTKnwferqQYw2TACLcBGAsYHQ/s1172/zipped%2Bfolder.PNG" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="281" data-original-width="1172" height="auto" src="https://1.bp.blogspot.com/-xw4vRABoN7Y/YYy8czfQ4TI/AAAAAAAAIhQ/W8F-2hbvZ78pva-7D8yTKnwferqQYw2TACLcBGAsYHQ/s1500/zipped%2Bfolder.PNG" width="70%" /></a></div><div><br /></div><ul style="text-align: left;"><ul><li>configure more options</li><ul><li>create a database and choose mysql</li></ul></ul></ul><ul style="text-align: left;"><li>create</li></ul><ul style="text-align: left;"><li>wait 10 minutes or so</li></ul><div>These might change though so the AWS documentation is likely a good place to start for initial environment creation.</div></div><div><br /></div><div>When complete, it should look something like this:</div><div><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-3NvjhOTIxvE/YYy8iQanNfI/AAAAAAAAIhU/OyeZiTd_HX80nLVGW7mLVQuEHoDpIlNOgCLcBGAsYHQ/s1804/initial%2Bworking%2Benvironment%2Bscreenshot.PNG" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="677" data-original-width="1804" height="auto" 
src="https://1.bp.blogspot.com/-3NvjhOTIxvE/YYy8iQanNfI/AAAAAAAAIhU/OyeZiTd_HX80nLVGW7mLVQuEHoDpIlNOgCLcBGAsYHQ/s1300/initial%2Bworking%2Benvironment%2Bscreenshot.PNG" width="70%" /></a></div><div><br /></div><div><br /></div><div><h4 style="text-align: left;">Quick exploration of the environment</h4><div>Two things that you might want to do right away are manage versions of your app, and look at logs. <br /><br />To manage versions, simply go to 'Application versions' on the left and you can deploy, delete, etc. any of them:</div><div><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-RurjlL1g_bE/YYy8mvrUVoI/AAAAAAAAIhY/YSClAZrFr8czHo5YmzS8-1CzMu90DCWXgCLcBGAsYHQ/s1246/application%2Bversions%2Bimage.PNG" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="261" data-original-width="1246" height="67" src="https://1.bp.blogspot.com/-RurjlL1g_bE/YYy8mvrUVoI/AAAAAAAAIhY/YSClAZrFr8czHo5YmzS8-1CzMu90DCWXgCLcBGAsYHQ/s1300/application%2Bversions%2Bimage.PNG" width="70%" /></a></div><div><br /><br />To look at logs, simply go to 'Logs' on the left and you can pull them.</div><div><br /></div><h4 style="text-align: left;">Quick overview of the hello world app</h4><div><i><br /></i></div><div><i>package file:</i></div><div><br /></div><div>The package file for this is very simple. 
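A minimal sketch of what that package.json could look like (reconstructed from this description, not copied from the repo; the name field is my own placeholder):

```json
{
  "name": "express-hello-world",
  "version": "1.0.0",
  "dependencies": {
    "express": "3.1.0"
  },
  "scripts": {
    "start": "node app.js"
  }
}
```
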
It just defines express v3.1.0 as a dependency and tells the environment that 'node app.js' is the starting script to run.</div><div><br /></div><div><i>app:</i></div><div><i><br /></i></div><div>Things worth noting here:</div><div><br /></div><div><ul style="text-align: left;"><li>AWS manages the ports for you; 'const port = process.env.PORT || 3000;' says 'use port 3000 unless the host has configured a port for you'</li></ul><ul style="text-align: left;"><li>'const dir = `${__dirname}/public/`;' grabs the public folder for the app and puts it in dir</li></ul><ul style="text-align: left;"><li>'app.get("/", (req, res) => { res.sendfile(dir + "index.html"); });' serves up the index.html file in the public folder when the default url is visited</li></ul></div><div>That's it.</div><div><br /></div><h4 style="text-align: left;">What the CRUD app adds to the hello world example</h4><div><i><br /></i></div><div><i>package file:</i></div><div><i><br /></i></div><div>This app needs to use mysql and there's a convenient async package for smoothing out async usage in node that I used here.</div><div><br /></div><div><i>app:</i></div><div><i><br /></i></div><div>There are a few new things here:<br /><ul style="text-align: left;"><li>database configuration</li><ul><li>AWS manages the db so generate a connection using the environment variables for it</li><li>in order</li><ul><li>create the db if it doesn't exist</li><li>set it as the db to use</li><li>create the 'numbers' table if it doesn't exist</li></ul><li>log error or success message depending on how that sequence of calls went</li></ul></ul><ul style="text-align: left;"><li>route configuration</li><ul><li>the app has two endpoints</li><ul><li>history: return all numbers entered so far</li><li>new: add a new number</li></ul></ul></ul></div></div><div><i>route:</i></div><div><i><br /></i></div><div>The new endpoints are just SQL queries and those are implemented in 'numbers.js' in the routes folder.</div><div><br 
/></div><div><i>index.html</i></div><div><i><br /></i></div><div>This is a basic numeric input and button. Page load gets the history of numbers entered. Clicking the button adds the number in the input then gets the history.</div><div><br /></div><div>Basic summary then is:<br /><ul style="text-align: left;"><li>navigating to the page returns index.html</li></ul><ul style="text-align: left;"><li>index.html auto-calls the history and displays a formatted response</li><ul><li>calling history actually calls the 'getNumbers' method in numbers.js</li></ul></ul><ul style="text-align: left;"><li>submitting a new number calls new and then history</li><ul><li>calling new actually calls the 'addNumber' method in numbers.js</li></ul></ul><h4 style="text-align: left;">Summary</h4></div><div>That's all that's required to host a simple Node.js + MySQL app on AWS Elastic Beanstalk. Hopefully nothing has changed significantly enough to break this tutorial by the time you read it; a couple of other tutorials I read through had broken that way, which is what prompted me to write this one. Feel free to comment if you hit issues.</div><div><br /></div><div><br /></div>theboathttp://www.blogger.com/profile/01260139398901806725noreply@blogger.com2tag:blogger.com,1999:blog-1532419805701836386.post-16993490142652990452021-10-24T21:43:00.003-07:002021-10-24T21:43:23.572-07:00How Do You Subtract Binary Numbers?What is the value of something like 101001 - 1101?<img height="auto" src="https://upload.wikimedia.org/wikipedia/commons/thumb/f/f5/Oudjat.SVG/240px-Oudjat.SVG.png" width="0%" /><a name='more'></a><p>It's probably easiest to understand this by first going through subtraction of normal (base 10) numbers. What is 119 - 35? 
You do the following:<br /></p><ul style="text-align: left;"><li>start with 1's digit; 9 - 5 = 4</li><li>move to 10's digit; 1 - 3 = -2; negative numbers make this hard, so 'borrow' 10 from the 100's digit; now the 10's digit is 11, so 11 - 3 = 8</li><li>100's digit lost 1 in the top number in the borrowing, so it's now 0; bottom number's 100's digit is 0 also</li></ul><div>So you have 0 in the 100's place, 8 in the 10's, and 4 in the 1's, so 84 is the result.</div><div><br /></div><div>Subtracting binary numbers works identically, except instead of borrowing 10 and using powers of 10 (1's, 10's, 100's, etc.) as places, you use powers of 2 (1's, 2's, 4's, etc.).</div><div><br /></div><div>Starting with an easy one: 10 - 1 (in binary):</div><div><ul style="text-align: left;"><li>start with 1's digit; 0 - 1 = -1; negative numbers make this hard, so 'borrow' 2 from the 2's digit; now the 1's digit is 2, so 2 - 1 = 1</li><li>move to 2's digit; it lost 1 in the borrowing, so it's now 0; bottom number's 2's digit is 0 also</li></ul><div>So you have 0 in the 2's place, and 1 in the 1's place, so the answer is 1. Double-checking, 10 in binary is 2, and 1 is 1, so 10 - 1 in binary is the same as 2 - 1 in normal (base 10) which is obviously 1.</div></div><div><br /></div><div>That's it. It works the exact same way as base 10 subtraction except that instead of borrowing 10, you borrow 2.</div><div><br /></div><div>Now for the original problem: 101001 - 1101</div><div><ul style="text-align: left;"><li>1's digit is 1 - 1 = 0</li><li>2's is 0 - 0 = 0</li><li>4's is 0 - 1; borrow from the 8's to get 2 - 1 = 1</li><li>8's had the borrow on the top, so it's now 0 - 1; borrow from the 16's to get 2 - 1 = 1</li><li>16's had the borrow on the top, so now it's -1 - 0; borrow from the 32's to get 1 - 0 = 1</li><li>32's had the borrow on the top, so now it's 0 - 0 = 0</li></ul><div>So the result is just 11100. Converting that to base 10, that's 28. 
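The result can be double-checked in Python, which parses base-2 numbers directly:

```python
# Verify 101001 - 1101 (binary) using Python's base-2 parsing
a = int("101001", 2)  # 41 in base 10
b = int("1101", 2)    # 13 in base 10
result = a - b
print(result)       # 28
print(bin(result))  # 0b11100
```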
Checking by converting the original problem, that was 41 - 13, and the answer is also 28.</div></div><p></p>theboathttp://www.blogger.com/profile/01260139398901806725noreply@blogger.com0tag:blogger.com,1999:blog-1532419805701836386.post-48949351483822666052021-10-14T22:23:00.009-07:002021-10-14T22:24:34.482-07:00How Do I Determine My Raise Given Inflation?If you get a 10% raise, and inflation is 6%, did you actually get a raise?<div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-6jD5n5ZJkcI/YWkQNZhT6uI/AAAAAAAAIcg/9Lz2R7kzYVAOH_8yFS2vf-04Hu66X068QCLcBGAsYHQ/s896/raise%2Bequation.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="157" data-original-width="896" height="auto" src="https://1.bp.blogspot.com/-6jD5n5ZJkcI/YWkQNZhT6uI/AAAAAAAAIcg/9Lz2R7kzYVAOH_8yFS2vf-04Hu66X068QCLcBGAsYHQ/s1600/raise%2Bequation.PNG" width="0%" /></a></div><a name='more'></a><br /><p></p><p>To get it out of the way, your actual raise is given by:<br /><br /></p><script async="" id="MathJax-script" src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script><style>.MathJax { font-size: 1.3em !important; }</style><div style="font-size: 24px;"> \[{\text{actual raise = }\frac{\text{new salary}}{\text{old salary * (1 + inflation rate)}}} - 1\]</div><br /><div>Where does this come from? It's maybe easiest to think of this in terms of units. Say you make $50,000 now, and you made $40,000 last year. You make 20% more right? Not exactly. What the $ there really represents is some purchasing power. Inflation is a drop in purchasing power, so what you really need to do is convert the $ before and after to the same unit. To determine the value of $ in the current year in terms of the $ in the previous year, you just divide it by 1 + inflation rate. 
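In code, that conversion looks like the following (a tiny Python sketch; the function name is mine):

```python
def actual_raise(new_salary, old_salary, inflation_rate):
    # Convert this year's dollars into last year's purchasing power,
    # then compare against the old salary.
    return new_salary / (old_salary * (1 + inflation_rate)) - 1

# $40,000 -> $50,000 with 6% inflation: a ~17.9% real raise
print(round(actual_raise(50_000, 40_000, 0.06), 3))  # 0.179
# A 10% raise with 6% inflation: a ~3.8% real raise
print(round(actual_raise(1.10, 1.00, 0.06), 3))      # 0.038
```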
That gives you the equation above.</div><div><br /></div><div>Plugging in the numbers in the initial question then, the actual raise is:<br /><br /><br /></div><div style="font-size: 24px;"> \[{\frac{\text{new salary}}{\text{old salary * (1 + inflation rate)}}} - 1\]</div><br /><div style="font-size: 24px;"> \[{\frac{\text{old salary * (1 + 0.10)}}{\text{old salary * (1 + 0.06)}}} - 1\]</div><br /><div>Which is just 0.038, so the actual raise is 3.8%.</div><div><br /></div><div>It is very important to understand your raise in terms of local inflation. If you get a 5% raise but your area gets 10% more expensive, you actually got a paycut (4.5% paycut given those numbers).</div>theboathttp://www.blogger.com/profile/01260139398901806725noreply@blogger.com0tag:blogger.com,1999:blog-1532419805701836386.post-4511167942799153352021-10-02T22:32:00.003-07:002021-10-04T22:01:50.933-07:00Can You Confirm Performance Improvements With Noisy Software Benchmarks?Say you run 20 tests before and after a code change meant to speed up the code, but there's a lot of noise in your benchmarks. 
Some simple statistical tests can help you determine if you actually have an improvement in that noise.<div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-KlEDPMOY9-Q/YVk-7jvvSwI/AAAAAAAAIbA/WfAFueWg66EO-mhoFzZ6ugYhfjLMmmeugCLcBGAsYHQ/s1346/benchmark%2Bsample%2Btimes.PNG" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="510" data-original-width="1346" height="auto" src="https://1.bp.blogspot.com/-KlEDPMOY9-Q/YVk-7jvvSwI/AAAAAAAAIbA/WfAFueWg66EO-mhoFzZ6ugYhfjLMmmeugCLcBGAsYHQ/s1600/benchmark%2Bsample%2Btimes.PNG" width="0%" /></a></div><a name='more'></a><h4 style="text-align: left;">Sample Data</h4><div style="text-align: left;">Imagine your 20 runs before and after look like this:</div><div style="text-align: left;"><br /></div><div><table style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr style="background-color: #e6a117;"><th>Before (ms)</th><th>After (ms)</th></tr><tr><td>241</td><td>272</td></tr><tr><td>224</td><td>211</td></tr><tr><td>202</td><td>226</td></tr><tr><td>243</td><td>234</td></tr><tr><td>246</td><td>205</td></tr><tr><td>229</td><td>279</td></tr><tr><td>209</td><td>208</td></tr><tr><td>231</td><td>212</td></tr><tr><td>258</td><td>218</td></tr><tr><td>287</td><td>198</td></tr><tr><td>270</td><td>215</td></tr><tr><td>262</td><td>244</td></tr><tr><td>227</td><td>215</td></tr><tr><td>200</td><td>175</td></tr><tr><td>291</td><td>220</td></tr><tr><td>290</td><td>218</td></tr><tr><td>184</td><td>218</td></tr><tr><td>319</td><td>247</td></tr><tr><td>250</td><td>245</td></tr><tr><td>229</td><td>199</td></tr></tbody></table><br />In case you prefer histograms:</div><div><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-KlEDPMOY9-Q/YVk-7jvvSwI/AAAAAAAAIbA/WfAFueWg66EO-mhoFzZ6ugYhfjLMmmeugCLcBGAsYHQ/s1346/benchmark%2Bsample%2Btimes.PNG" style="margin-left: 1em; margin-right: 1em;"><img 
border="0" data-original-height="510" data-original-width="1346" height="auto" src="https://1.bp.blogspot.com/-KlEDPMOY9-Q/YVk-7jvvSwI/AAAAAAAAIbA/WfAFueWg66EO-mhoFzZ6ugYhfjLMmmeugCLcBGAsYHQ/s1600/benchmark%2Bsample%2Btimes.PNG" width="90%" /></a></div><br /><div><br /><br /></div><div>The 'after' numbers look like they're maybe smaller. If you take the average you get 245 ms before and 223 ms after. Is that really better though or are you just seeing noise?</div><div><br /></div><h4 style="text-align: left;">T-Test</h4><div>Assuming your benchmarking noise is roughly normally distributed, you can use a <a href="https://en.wikipedia.org/wiki/Student%27s_t-test" rel="" target="_blank">T-Test</a>. If you have never seen a T-test, a really rough description is that it will take two groups of numbers, and tell you if the means of the two groups are significantly different (i.e., the difference between them probably isn't just noise). </div><div><br /></div><div>What does 'probably' mean here? You get a p value out of T-Tests, which is (roughly) the probability of seeing a difference at least this large if the two groups were actually the same. E.g., a p value of 0.05 would mean roughly 'there's a 5% chance that the ~20 ms difference here is just noise'. </div><div><br /></div><div>You can do this in Excel, Google Sheets, any of the many websites that do it, etc. 
I tend to use Python for this sort of stuff so a simple overview of how to do it in Python is:</div><div><ul style="text-align: left;"><li>import stats from scipy</li><li>call the <a href="https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html" target="_blank">ttest_ind</a> method in it with the before numbers as the first arg and the after numbers as the second</li><li>the t value returned should be positive (since before should be higher than after), and compare the p value against 2 * your target probability</li></ul><div>For the numbers in the example here, I get a p value of 0.03, which is less than the common target of 0.05; since the test is two-sided, this is effectively a one-sided probability of 1.5% (p value of 0.015), which would generally mean 'significant difference'. Note that 'significant' here doesn't mean important...just unlikely to be noise. The difference in means is still the primary metric here. </div><div><br /></div><div>To summarize this then, you could say that the update significantly altered the benchmark time, and the difference in means is ~20 ms (or a ~10% performance improvement).</div><div><br /></div><div><i>Why divide by 2?</i></div><div><i><br /></i></div><div>This is an artefact of the method you use. In this case, the method I gave for testing this tests both sides of the assumption (i.e., tests both before > after and before < after). We only care about the before > after side though. This method actually handles this for you in current versions but I have an older version installed and wanted to show the more generic approach here.</div><div><br /></div><div><i>Why ttest_ind?</i></div><div><i><br /></i></div><div>There are a lot of variants of T-Tests you can run. It's worth reading through them but I won't rewrite tons of info on them here. The ttest_ind I used is for independent samples of data. 
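The steps above, sketched with the sample data from the table (this assumes scipy is installed):

```python
from scipy import stats

before = [241, 224, 202, 243, 246, 229, 209, 231, 258, 287,
          270, 262, 227, 200, 291, 290, 184, 319, 250, 229]
after = [272, 211, 226, 234, 205, 279, 208, 212, 218, 198,
         215, 244, 215, 175, 220, 218, 218, 247, 245, 199]

# Two-sided independent-samples T-Test
t_stat, p_value = stats.ttest_ind(before, after)
print(t_stat > 0)  # True: the 'before' mean is higher
print(p_value)     # ~0.03 for this data
```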
You might argue that a paired one is better here since 'making code faster' is sort of changing one aspect of a thing and testing it again, but ttest_ind works well in general usage.</div><div><br /></div><h4 style="text-align: left;">Mann-Whitney</h4><div>What if you have outliers and/or do not have a normal distribution of noise in your benchmarks? For a concrete example, what if the first number in the 'after' column is 600 instead of 272? T-Tests are not valid in these situations. Running it blindly returns a p of 0.4 which would indicate not significantly different, all from that single bad outlier.</div><div><br /></div><div>You can auto-exclude best and worst n times. You can manually scrub data. That sounds really manual though and we want to automate things. You can also use another type of test. One that's useful here is the <a href="https://en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U_test" target="_blank">Mann-Whitney U test</a>.</div><div><br /></div><div>The results are similar to a T-Test but the test itself is looking for something slightly different. Roughly, this test tells you how likely it is that the results are such that a random value chosen from after is just as likely to be greater than a random value chosen from before as vice-versa. Since it doesn't care about the magnitudes (only the orders), it is fine for outliers and non-normally distributed data.</div><div><br /></div><div>Same basic flow in Python:</div><div><ul><li>import stats from scipy</li><li>call the <a href="https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.mannwhitneyu.html" target="_blank">mannwhitneyu</a> method in it with the before numbers as the first arg and the after numbers as the second; also pass in 'two-sided' as the alternative to be consistent with the T-Test above if you want</li><li>the p value should be 2*target probability</li></ul><div>With the numbers here, I get a p value of 0.04, so dividing by 2, 0.02. 
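Sketched the same way, with the first 'after' value replaced by the 600 ms outlier (again assuming scipy is installed):

```python
from scipy import stats

before = [241, 224, 202, 243, 246, 229, 209, 231, 258, 287,
          270, 262, 227, 200, 291, 290, 184, 319, 250, 229]
# Same 'after' data as before, but with the outlier swapped in
after = [600, 211, 226, 234, 205, 279, 208, 212, 218, 198,
         215, 244, 215, 175, 220, 218, 218, 247, 245, 199]

# Rank-based test; robust to the outlier's magnitude
u_stat, p_value = stats.mannwhitneyu(before, after, alternative="two-sided")
print(p_value)  # stays small (~0.04) despite the outlier
```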
This test was not tripped up by the outlier.</div></div><div><br /></div></div><div><br /></div>theboathttp://www.blogger.com/profile/01260139398901806725noreply@blogger.com0tag:blogger.com,1999:blog-1532419805701836386.post-26210373466399322642021-09-13T22:18:00.014-07:002021-09-13T23:35:59.987-07:00Why Do We Multiply the Way We Do?We could just repeatedly add the numbers but we don't. Is the algorithm we use actually faster?<div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-QUPyNJPO7OU/YUAxugpCidI/AAAAAAAAIZM/sT2-efUMlckFqN9yOFuXvVwTOJZm_2MaQCLcBGAsYHQ/s1353/multiplication%2Balgorithm%2Bcompare.PNG" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="772" data-original-width="1353" height="auto" src="https://1.bp.blogspot.com/-QUPyNJPO7OU/YUAxugpCidI/AAAAAAAAIZM/sT2-efUMlckFqN9yOFuXvVwTOJZm_2MaQCLcBGAsYHQ/s1600/multiplication%2Balgorithm%2Bcompare.PNG" width="0%" /></a></div><a name='more'></a><p>What I'm talking about here is multiplying 219 x 87 like the following:</p><p></p><ul style="text-align: left;"><li>7x9 to get 63</li><li>7x1 to get 7 and add a 0 to get 70</li><li>7x2 to get 14 and add two 0's to get 1400</li><li>8x9 to get 72 and add a 0 to get 720</li><li>8x1 to get 8 and add two 0's to get 800</li><li>8x2 to get 16 and add three 0's to get 16000</li><li>add all those together to get 19,053</li></ul><div>That's 6 simple multiplications and 6 additions. If we just added 219 to itself 87 times, that's 87 operations so clearly more steps with one big assumption: </div><div><br /></div><div style="text-align: center;"><i>you've memorized m x n for all integers m and n from 2 to 10. </i></div><div><br /></div><div>This is why we all had to learn times tables. 
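A toy Python sketch of the two operation counts (the function names are mine):

```python
def longhand_ops(a, b):
    # grade-school algorithm: one memorized single-digit
    # multiplication per pair of digits
    return len(str(a)) * len(str(b))

def repeated_add_ops(a, b):
    # naive algorithm: add one number to itself
    # 'smaller number' times
    return min(a, b)

print(longhand_ops(219, 87))      # 6 memorized multiplications
print(repeated_add_ops(219, 87))  # 87 additions
```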
How does this generalize as an algorithm?</div><div><br /></div><div>Repeated addition takes roughly one operation per unit of the smaller of the two numbers, so it is just O(n) where n is the smaller number.</div><div><br /></div><div>The algorithm we actually use is a bit harder. It scales as a x b where a and b are the number of digits in the two numbers. How does 'number of digits' scale? That's O(log n) where n is the number. Since it scales as the product of those, that algorithm scales as O(log m * log n) where m and n are the two numbers. </div><div><br /></div><div>What about the memorized simple multiplications? I have no idea how our memory access scales, but I'm going to just guess it's a constant time operation for simple multiplication so O(1) which doesn't contribute.</div><div><br /></div><div>For an example with actual calculations, here is the cost of multiplying each number up to 99 by 99 using each algorithm:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-QUPyNJPO7OU/YUAxugpCidI/AAAAAAAAIZM/sT2-efUMlckFqN9yOFuXvVwTOJZm_2MaQCLcBGAsYHQ/s1353/multiplication%2Balgorithm%2Bcompare.PNG" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="772" data-original-width="1353" height="auto" src="https://1.bp.blogspot.com/-QUPyNJPO7OU/YUAxugpCidI/AAAAAAAAIZM/sT2-efUMlckFqN9yOFuXvVwTOJZm_2MaQCLcBGAsYHQ/s1600/multiplication%2Balgorithm%2Bcompare.PNG" width="90%" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><br /></div><div><br /></div>It might not be obvious that O(log m * log n) is faster than O(n) but with actual numbers in the plot there it becomes pretty clear.</div><div><br /></div><div>It's cool to me that a basic math thing we all learn when we're little kids effectively uses a <a href="https://en.wikipedia.org/wiki/Dynamic_programming" target="_blank">dynamic programming</a> algorithm (memorize all m x n for m and n up to 10; convert every 
multiplication problem into a combination of m x n problems that you already solved).<br /><br /></div><p></p>theboathttp://www.blogger.com/profile/01260139398901806725noreply@blogger.com0tag:blogger.com,1999:blog-1532419805701836386.post-68750193908089284782021-08-31T21:43:00.000-07:002021-08-31T21:43:01.891-07:00Exploring Senior Software Engineer Salary Data in levels.fyi<a href="https://www.levels.fyi/">levels.fyi</a> is a great resource for software salary info and it's easily mineable. I was curious how salaries in what are sometimes considered medium cost-of-living cities compare.<div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-W4Egjoa1P0c/YS8Cf6kM-bI/AAAAAAAAIXY/Skc74bmHi14z4odUGE4Yul9nDtVVQStIQCLcBGAsYHQ/s1074/senior%2Bsw%2Bby%2Bcity.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="673" data-original-width="1074" height="auto" src="https://1.bp.blogspot.com/-W4Egjoa1P0c/YS8Cf6kM-bI/AAAAAAAAIXY/Skc74bmHi14z4odUGE4Yul9nDtVVQStIQCLcBGAsYHQ/s1600/senior%2Bsw%2Bby%2Bcity.png" width="0%" /></a></div><a name='more'></a><p>Software careers often have levels (hence the site name). Typically there's entry with 0-2.5 years, next at 2-5 years, then career level at 5-10 years. Some go above that (principal, chief, etc.). The one I'll play with here is the 5-10 year one. 
5-10 year is often called 'senior software engineer'.</p><p>Here are the rough pay distributions in levels for that experience range in mid-priced cities (this is total compensation and not base salary):</p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-W4Egjoa1P0c/YS8Cf6kM-bI/AAAAAAAAIXY/Skc74bmHi14z4odUGE4Yul9nDtVVQStIQCLcBGAsYHQ/s1074/senior%2Bsw%2Bby%2Bcity.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="673" data-original-width="1074" height="auto" src="https://1.bp.blogspot.com/-W4Egjoa1P0c/YS8Cf6kM-bI/AAAAAAAAIXY/Skc74bmHi14z4odUGE4Yul9nDtVVQStIQCLcBGAsYHQ/s1600/senior%2Bsw%2Bby%2Bcity.png" width="95%" /></a></div><p>A comparison is hard because it's not clear that each city represents the same thing. For example, many are state capitals so if 90% of the jobs are state ones then you'd expect them to be lower. Here's a list of the top-3 included employers for each city in that plot to hopefully provide more context:</p><p></p><ul style="text-align: left;"><li>Pittsburgh: Google, Uber, Argo AI<br /><br /></li><li>Chicago: Paypal, Expedia, Accenture<br /><br /></li><li>Denver: Amazon, Deloitte, Gusto<br /><br /></li><li>Austin: Apple, IBM, Amazon<br /><br /></li><li>Detroit: Amazon, General Motors, Quicken Loans<br /><br /></li><li>Atlanta: VMWare, Salesforce, Microsoft<br /><br /></li><li>Raleigh: IBM, Cisco, Microsoft<br /><br /></li><li>Nashville: Amazon, Asurion, HCA Healthcare<br /><br /></li><li>Phoenix: American Express, Intel, Amazon</li></ul><p></p><p>Amazon is everywhere apparently...</p><p>These numbers aren't perfect obviously. Many people do work for the state for example and they don't seem to be providing salaries here, so I'd wager that levels.fyi is biased towards higher-paying companies. 
Fun data though.</p><p><br /></p>theboathttp://www.blogger.com/profile/01260139398901806725noreply@blogger.com0tag:blogger.com,1999:blog-1532419805701836386.post-72554932798682197152021-08-01T21:26:00.002-07:002021-08-01T21:26:53.364-07:00How to Add a Vertical Scrollbar to PlotlyPlotly doesn't have the built-in ability to scroll vertically with a fixed x axis unfortunately, but you can mimic that fairly easily...<div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-Cbc0MUCU-Tg/YQdzbFfYNGI/AAAAAAAAIRU/quNF02sAXak4YbGwwJk-nOVWE28ckUPUQCLcBGAsYHQ/s817/scrolling%2Bplotly.PNG" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="360" data-original-width="817" height="auto" src="https://1.bp.blogspot.com/-Cbc0MUCU-Tg/YQdzbFfYNGI/AAAAAAAAIRU/quNF02sAXak4YbGwwJk-nOVWE28ckUPUQCLcBGAsYHQ/s1600/scrolling%2Bplotly.PNG" width="0" /></a></div><a name='more'></a><p>First, here's the demo:</p><p class="codepen" data-height="450" data-slug-hash="dyWeQJp" data-user="rhamner" style="align-items: center; border: 2px solid; box-sizing: border-box; display: flex; height: 450px; justify-content: center; margin: 1em 0px; padding: 1em;"> <span>See the Pen <a href="https://codepen.io/rhamner/pen/dyWeQJp"> vertical scroll plotly</a> by Robert Hamner (<a href="https://codepen.io/rhamner">@rhamner</a>) on <a href="https://codepen.io">CodePen</a>.</span></p><script async="" src="https://cpwebassets.codepen.io/assets/embed/ei.js"></script><p><br /></p><p>The basic model here is to stack two plots directly on top of each other, where the top one is in a scrollable div and the bottom one is not.</p><p></p><ul style="text-align: left;"><li>Make two divs</li><ul><li>plot div</li><ul><li>scrollable</li><li>width = plot width + scroll width</li></ul><li>xaxis div</li><ul><li>not scrollable</li><li>width = plot width</li></ul></ul><li>Make two plots</li><ul><li>plot</li><ul><li>goes in plot div</li><li>y-axis zeroline is hidden</li><li>bottom 
margin is 0</li></ul><li>xaxis</li><ul><li>0 top margin</li><li>hide the modebar</li></ul></ul><li>Make the plot xaxis ranges equal</li></ul><div>You can then get as complicated as you need to here. I added really crude layout event linking to the demo...I'm hitting a weird double-click bug (should autoscale but isn't) but this works pretty easily/cleanly as the basic concept.</div><div><br /></div><br /><div><br /></div><p></p>theboathttp://www.blogger.com/profile/01260139398901806725noreply@blogger.com1tag:blogger.com,1999:blog-1532419805701836386.post-65129511760470397342021-06-30T21:52:00.004-07:002021-06-30T21:54:43.698-07:00If 10 Vaccinated and 10 Unvaccinated People Die, Can We Still Say Vaccines Work?You will almost certainly be seeing headlines about vaccinated people dying and might even see that more vaccinated than unvaccinated die. <a href="https://www.wsj.com/articles/covid-19-killed-26-indonesian-doctors-in-juneat-least-10-had-taken-chinas-sinovac-vaccine-11624769885">Here's one from the week that I wrote this post</a>. Why do we still say vaccines work if this is happening?<div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-nQcC4Laplpc/YN1JdXl-LpI/AAAAAAAAIKU/zJo261ZsHWIUQpsmKENkM-Nk5I2HQ5HZACLcBGAsYHQ/s1258/distribution.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="663" data-original-width="1258" src="https://1.bp.blogspot.com/-nQcC4Laplpc/YN1JdXl-LpI/AAAAAAAAIKU/zJo261ZsHWIUQpsmKENkM-Nk5I2HQ5HZACLcBGAsYHQ/s1600/distribution.png" width="0%" /></a></div><a name='more'></a>Imagine as an example that you see '10 vaccinated and 10 unvaccinated doctors died from COVID-19 today'. Your brain probably thinks 'well...the vaccine didn't work I guess.' We see those numbers, then just assume that the populations were similar. They're all doctors right?<p>Digging more, say that it turns out that 90% of the doctors were vaccinated. 
To make it easy, assume that there are 1,000 total doctors. 90% vaccinated means there were 900 vaccinated and 100 unvaccinated. If 10 died from each group, that means:<br /></p><ul style="text-align: left;"><li>10 / 100, or 10% of unvaccinated doctors died</li><li>10 / 900, or 1.1% of vaccinated doctors died</li></ul><div>Unvaccinated doctors were 9 times as likely to die as vaccinated ones. Another way of phrasing that is that the vaccine's efficacy was:<br /><br /><div style="text-align: center;"><b>vaccine efficacy</b> = 1 - (vaccinated risk/unvaccinated risk) = 1 - (0.011/0.1) = <b>89%</b></div></div><div style="text-align: center;"><b><br /></b></div><div style="text-align: left;">This is how you have to think about things like this. Vaccines, masks, seat belts, helmets, etc. aren't 100% effective. Use the calculation above whenever you see headlines like this and want to know the actual story. </div><div style="text-align: left;"><br /></div><div style="text-align: left;">You can even have more vaccinated deaths than unvaccinated. Imagine that, for the 89% efficacy vaccine above, 99% of the population is vaccinated. For 10,000 doctors in that example, you'd expect to have 10% of the 100 unvaccinated die and 1.1% of the vaccinated 9900 die, so that's 10 unvaccinated deaths and about 110 vaccinated deaths. 
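A few lines of Python make the arithmetic above reusable (the function and its name are my own sketch; the group sizes, baseline risk, and efficacy are the example's numbers):

```python
def expected_deaths(total, pct_vaccinated, baseline_risk, efficacy):
    """Expected deaths per group, given a baseline (unvaccinated) risk
    and a vaccine that multiplies that risk by (1 - efficacy)."""
    vaccinated = total * pct_vaccinated
    unvaccinated = total - vaccinated
    vaccinated_risk = baseline_risk * (1 - efficacy)
    return unvaccinated * baseline_risk, vaccinated * vaccinated_risk

# 1,000 doctors, 90% vaccinated, 10% baseline risk, 89% efficacy
print(expected_deaths(1000, 0.90, 0.10, 0.89))   # roughly 10 deaths in each group
# 10,000 doctors, 99% vaccinated: more vaccinated deaths despite an effective vaccine
print(expected_deaths(10000, 0.99, 0.10, 0.89))
```
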
<b>A highly effective vaccine can still have more vaccinated people die than unvaccinated ones.</b></div><div style="text-align: left;"><br /></div><div style="text-align: left;">In case a visual helps, here is the initial example's distribution as a colored grid (red = dead and green = alive):</div><div style="text-align: left;"><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-nQcC4Laplpc/YN1JdXl-LpI/AAAAAAAAIKU/zJo261ZsHWIUQpsmKENkM-Nk5I2HQ5HZACLcBGAsYHQ/s1258/distribution.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="663" data-original-width="1258" src="https://1.bp.blogspot.com/-nQcC4Laplpc/YN1JdXl-LpI/AAAAAAAAIKU/zJo261ZsHWIUQpsmKENkM-Nk5I2HQ5HZACLcBGAsYHQ/s1600/distribution.png" width="80%" /></a></div><br /><div style="text-align: left;"><br /></div><p></p>theboathttp://www.blogger.com/profile/01260139398901806725noreply@blogger.com1tag:blogger.com,1999:blog-1532419805701836386.post-71486582668205143522021-06-13T21:56:00.005-07:002021-06-14T07:51:13.203-07:00Wheel Options Strategy SimulationsThe 'Wheel' is an options strategy that combines cash-secured puts with covered calls. I sometimes have trouble really grasping options strategies in my head, so simulating some scenarios gives me a better feel.<div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-ApkhSBA5hE8/YMbf7ow1-zI/AAAAAAAAIFc/3t4S6KCOg8U6DyQ5sHG_23htg3HJoc9QgCLcBGAsYHQ/s917/simulations.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="643" data-original-width="917" src="https://1.bp.blogspot.com/-ApkhSBA5hE8/YMbf7ow1-zI/AAAAAAAAIFc/3t4S6KCOg8U6DyQ5sHG_23htg3HJoc9QgCLcBGAsYHQ/s1600/simulations.png" width="0%" /></a></div><a name='more'></a><h4 style="text-align: left;">Basic strategy</h4>To keep it simple, I will just deal with 'at the money' (ATM) options here. 
The basic strategy then is:<br /><ol style="text-align: left;"><li>Sell a cash-secured put to start<br /><br /></li><li>If the stock goes up, the put expires so you sell another put<br /><br /></li><li>If the stock goes down, the put is exercised, you're assigned the shares, so you can sell a covered call<br /><br /></li><li>If the stock goes down from there, the call expires so you sell another call<br /><br /></li><li>If the stock goes up from there, the call is exercised, you sell the shares, so you sell another put<br /><br /></li><li>repeat...</li></ol><div>You can see that when you are assigned shares, you just sell a call, and when a call is exercised, you just use the cash to sell a put. This repeats indefinitely. Isn't this just free money? Sort of...what you're trading off here is a bit hard to see immediately. This is where playing with some numbers can make it easier to understand what's happening.</div><div><br /></div><h4 style="text-align: left;">Simple examples</h4><div>To get a better idea of how this works, let's look at 4 simple examples:<br /><ol style="text-align: left;"><li>stock doesn't change much<br /><br /></li><li>stock drops ~15% in a year<br /><br /></li><li>stock gains ~15% in a year<br /><br /></li><li>stock crashes ~15% and rebounds in a year</li></ol><div>In each scenario, I'll add a bit of noise and assume that selling a put yields $1.25/month, selling a call yields $1/month, and these are all monthly expirations and ATM strikes with a starting value of $100.</div></div><div><br /></div><div>Imagine in the first one the price for the first 5 months is 100, 102, 101, 97, 101. 
What does the wheel strategy look like?<br /><ul style="text-align: left;"><li>sell put for $1.25 with a $100 strike; gain $1.25 from the sale and lose nothing in stock</li></ul><ul style="text-align: left;"><li>price hits $102; that's above the $100 strike so it expires; sell another $1.25 put with a $102 strike and lose nothing in stock</li></ul><ul style="text-align: left;"><li>price hits $101; that's below the $102 strike so pay $102 for the shares and sell a $1 call with a $101 strike</li></ul><ul style="text-align: left;"><li>price hits $97; that's below the $101 strike so it expires; sell another $1 call with a $97 strike</li></ul><ul style="text-align: left;"><li>price hits $101; that's above the $97 strike so sell at $97 and sell another $1.25 put with a $101 strike</li></ul><div>Overall, the wheel earned $5.75 from selling options, but lost $5 in the stock (bought at $102 and sold at $97). That stock loss is the most obvious loss here but there's another more subtle one. Look at the first put again. The gain from selling the option was $1.25, but the stock itself gained $2 then. The gain was effectively capped at $1.25. The same is not true for the loss. When the stock fell, the entire loss was absorbed. 
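The bookkeeping in the walkthrough above can be sketched as a tiny simulation. This function is my own sketch of the example's rules (always ATM, and assignment/exercise whenever an option finishes in the money), not a trading tool:

```python
def wheel(prices, put_premium=1.25, call_premium=1.00):
    """Run the wheel over a monthly price path, starting by selling an ATM put.

    Returns (premiums collected, stock P&L). Every new option is sold
    at the money at the latest price.
    """
    premiums = put_premium      # the initial ATM put
    stock_pnl = 0.0
    holding = False             # do we currently own the shares?
    strike = prices[0]
    for price in prices[1:]:
        if not holding:
            if price < strike:            # put exercised: buy at the strike
                stock_pnl -= strike
                holding = True
                premiums += call_premium  # sell an ATM covered call
            else:                         # put expired: sell another ATM put
                premiums += put_premium
        else:
            if price > strike:            # call exercised: shares sold at strike
                stock_pnl += strike
                holding = False
                premiums += put_premium   # back to selling an ATM put
            else:                         # call expired: sell another ATM call
                premiums += call_premium
        strike = price                    # next option is ATM at the new price
    return premiums, stock_pnl

print(wheel([100, 102, 101, 97, 101]))  # → (5.75, -5.0)
```

On the example path this gives $5.75 of collected premium against a $5 loss on the shares (bought at $102, called away at $97).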
Capping gains while having to absorb losses is a primary tradeoff here (some other ones are poor tax performance, low-liquidity, and potentially missing dividends).</div></div><div><br /></div><div>Now that that's understood, it's helpful to me to see this graphically, so here are sample runs of the examples from above:</div><div><br /><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-_HgM81YeDRg/YMbeZli8AOI/AAAAAAAAIFE/883L9T_B780lbXOkj4mpUNRcsMne-fBDgCLcBGAsYHQ/s898/flat%2Bstock.PNG" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="458" data-original-width="898" src="https://1.bp.blogspot.com/-_HgM81YeDRg/YMbeZli8AOI/AAAAAAAAIFE/883L9T_B780lbXOkj4mpUNRcsMne-fBDgCLcBGAsYHQ/s1600/flat%2Bstock.PNG" width="75%" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-LYLH2XYCFGs/YMbeesunGtI/AAAAAAAAIFI/5-HbWJB6-hMkiciIvkxB91bV45h5Q39egCLcBGAsYHQ/s899/declining%2Bstock.PNG" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="460" data-original-width="899" src="https://1.bp.blogspot.com/-LYLH2XYCFGs/YMbeesunGtI/AAAAAAAAIFI/5-HbWJB6-hMkiciIvkxB91bV45h5Q39egCLcBGAsYHQ/s1600/declining%2Bstock.PNG" width="75%" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-ZSlEpLiM5LA/YMbehokTh3I/AAAAAAAAIFM/R7angX0QZhEKE_5KgSJxESZa52nAluSSwCLcBGAsYHQ/s898/growing%2Bstock.PNG" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="461" data-original-width="898" src="https://1.bp.blogspot.com/-ZSlEpLiM5LA/YMbehokTh3I/AAAAAAAAIFM/R7angX0QZhEKE_5KgSJxESZa52nAluSSwCLcBGAsYHQ/s1600/growing%2Bstock.PNG" width="75%" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a 
href="https://1.bp.blogspot.com/-rc92adMRhes/YMbekDgSbgI/AAAAAAAAIFQ/MZmy6tdr-TEBnSZqPllf6U4-wC9IxkiqgCLcBGAsYHQ/s897/crash%2Band%2Brebound.PNG" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="454" data-original-width="897" src="https://1.bp.blogspot.com/-rc92adMRhes/YMbekDgSbgI/AAAAAAAAIFQ/MZmy6tdr-TEBnSZqPllf6U4-wC9IxkiqgCLcBGAsYHQ/s1600/crash%2Band%2Brebound.PNG" width="75%" /></a></div><br /><div><br /></div><div>The general behavior here is that the wheel smooths out the plots a bit. Increases and decreases aren't quite as big. You can control how smooth it is by changing expiration dates and strike price offsets (e.g., selling calls with a strike price 10% above current price will allow for larger gains but give you less option premium, so the performance looks more like buy and hold). When a stock crashes, you'll probably do a bit better with the wheel. When a stock surges, you'll probably do a bit worse with the wheel.</div><div><br /></div><div>The above plots are single trials of the simulation. What does it look like if this is run thousands of times?</div><div><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-ApkhSBA5hE8/YMbf7ow1-zI/AAAAAAAAIFc/3t4S6KCOg8U6DyQ5sHG_23htg3HJoc9QgCLcBGAsYHQ/s917/simulations.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="643" data-original-width="917" src="https://1.bp.blogspot.com/-ApkhSBA5hE8/YMbf7ow1-zI/AAAAAAAAIFc/3t4S6KCOg8U6DyQ5sHG_23htg3HJoc9QgCLcBGAsYHQ/s1600/simulations.png" width="75%" /></a></div><br /><div>That's much clearer to me. In down periods, the wheel just minimizes your losses a bit (loss is stock loss, but you gain option premium). In good periods, the wheel caps your gains so you get a flattened distribution (max gain is the option premium).</div><div><br /></div><h4 style="text-align: left;">Summary</h4><div>Should you use this strategy? 
There's no perfect answer for that. In an extremely long bull market (like current), it's likely going to underperform. It does give you a bit of protection against drops and can do better in neutral markets. I personally don't like the thought of capping gains while not capping losses so this isn't a favorite of mine (see the fourth image with the crash and rebound to understand why that can be bad), but it's definitely viable if you want to smooth out your returns a bit.</div><div><br /></div><div><br /></div>theboathttp://www.blogger.com/profile/01260139398901806725noreply@blogger.com3tag:blogger.com,1999:blog-1532419805701836386.post-10687219980270690432021-05-21T20:15:00.004-07:002021-05-21T20:18:12.028-07:00Negative values with a log axis in PlotlyAlthough log10(<any number less than or equal to 0>) is not defined, there are situations where you want to visualize data as if it were. How can you get plotly to do that? Another way of asking is 'how can you mimic symlog functionality in plotly?'<div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-Rkm-hJMngXg/YKh2_h6M6WI/AAAAAAAAH-8/UubKGrQfXTEOdmunCHlRPofa3tl_tBAqQCLcBGAsYHQ/s1862/symlog.PNG" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="348" data-original-width="1862" src="https://1.bp.blogspot.com/-Rkm-hJMngXg/YKh2_h6M6WI/AAAAAAAAH-8/UubKGrQfXTEOdmunCHlRPofa3tl_tBAqQCLcBGAsYHQ/s1600/symlog.PNG" width="0%" /></a></div><a name='more'></a>First...a real example of when you'd want this. 
Imagine you do the following:<div><ul style="text-align: left;"><li>generate a 1 GHz tone</li><li>measure amplitude at +/- 10 kHz, +/- 100 kHz, +/- 1 MHz, ...</li><li>generate a 2 GHz tone</li><li>measure amplitude at +/- 10 kHz, +/- 100 kHz, +/- 1 MHz, ...</li><li>want to overlay those offset amplitude curves</li></ul><div>You could just plot vs absolute frequency to see one, but to overlay you need to center around a tone, and it just makes sense to show 'offset from tone' as the x axis. However, those steps imply a log scale.</div></div><div><br /></div><div>Below is a working example of exactly this situation in plotly.js. I've included the ideal here with both positive and negative on a log scale, and the normal linear plot so that the difference in parsing it quickly is obvious:</div><div><br /></div><p class="codepen" data-default-tab="js,result" data-height="500" data-pen-title="symlog approximation" data-slug-hash="KKWamLW" data-theme-id="light" data-user="rhamner" style="align-items: center; border: 2px solid; box-sizing: border-box; display: flex; height: 500px; justify-content: center; margin: 1em 0px; padding: 1em;"> <span>See the Pen <a href="https://codepen.io/rhamner/pen/KKWamLW"> symlog approximation</a> by Robert Hamner (<a href="https://codepen.io/rhamner">@rhamner</a>) on <a href="https://codepen.io">CodePen</a>.</span></p><script async="" src="https://cpwebassets.codepen.io/assets/embed/ei.js"></script><div><br /></div><div><br /></div><div>The basic algorithm is pretty simple:</div><div><ul style="text-align: left;"><li>Determine the max and min values and the value closest to zero; largest of max and abs(min) is upper bound...value closest to zero is lower<br /><br /></li><li>Split all traces into positive and negative (x values here since I just did this for x in the demo)<br /><br /></li><li>Create two x-axes: one for positive and one for negative</li><ul><li>give both the same bounds</li><li>reverse the negative x-axis</li><li>assign ticks 
with positive values but negative labels to the negative x-axis</li><li>put a small buffer between them to represent that zero is undefined<br /><br /></li></ul><li>Plot positive traces vs positive x-axis and negative traces vs negative x-axis, but make the negative x values positive<br /></li></ul><div>In that demo above you can just step through the javascript code and it should all be pretty clear.</div></div><div><br /></div><div>If you want a slight variant of this that matches <a href="https://matplotlib.org/stable/gallery/scales/symlog_demo.html">'symlog' in matplotlib</a>, just add a third, linear axis to connect these two instead of leaving a gap. I personally prefer the gap for this situation.</div><div><br /></div>theboathttp://www.blogger.com/profile/01260139398901806725noreply@blogger.com0tag:blogger.com,1999:blog-1532419805701836386.post-51085009306756570162021-05-02T21:45:00.008-07:002021-05-02T22:26:45.703-07:00Simple way to see code coverage in pythonSometimes you want to quickly see unit test coverage of your code. <a href="https://coverage.readthedocs.io/en/coverage-5.5/">Coverage.py</a> makes that really simple.<div class="separator" style="clear: both; text-align: center;"><a href="https://lh3.googleusercontent.com/-0BnmIqZgxr8/YI9-RMJp-rI/AAAAAAAAH7c/fXQHYip3X9Y9zKlDzrpK8lJYVBLbqyw1gCLcBGAsYHQ/image.png" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="264" data-original-width="599" height="auto" src="https://lh3.googleusercontent.com/-0BnmIqZgxr8/YI9-RMJp-rI/AAAAAAAAH7c/fXQHYip3X9Y9zKlDzrpK8lJYVBLbqyw1gCLcBGAsYHQ/image.png" width="0%" /></a></div><a name='more'></a>First, what do I mean by test coverage? 
Below is an example for a really simple usage:<div><br /></div><div><div class="separator" style="clear: both; text-align: center;"><div class="separator" style="clear: both; text-align: center;"><a href="https://lh3.googleusercontent.com/-0BnmIqZgxr8/YI9-RMJp-rI/AAAAAAAAH7c/fXQHYip3X9Y9zKlDzrpK8lJYVBLbqyw1gCLcBGAsYHQ/image.png" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="264" data-original-width="599" height="auto" src="https://lh3.googleusercontent.com/-0BnmIqZgxr8/YI9-RMJp-rI/AAAAAAAAH7c/fXQHYip3X9Y9zKlDzrpK8lJYVBLbqyw1gCLcBGAsYHQ/image.png" /></a></div><br /><br /></div><div>That tells me how much of the code is executed when I executed my tests (start with test_* here). For that example, I just have two files with two methods in each. The files are identical and look like this:</div><div><br /></div><div><div class="separator" style="clear: both; text-align: center;"><a href="https://lh3.googleusercontent.com/-MwqLmZPrhXY/YI993MTVABI/AAAAAAAAH7U/LVljKk55-zYJ6eclTe-kz69vC-egLQrUgCLcBGAsYHQ/image.png" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="368" data-original-width="354" height="240" src="https://lh3.googleusercontent.com/-MwqLmZPrhXY/YI993MTVABI/AAAAAAAAH7U/LVljKk55-zYJ6eclTe-kz69vC-egLQrUgCLcBGAsYHQ/image.png" width="231" /></a></div><br /></div><div><br /></div><div>I have one file with unit tests. I'm using <a href="https://docs.pytest.org/en/6.2.x/">pytest</a> for this but coverage works with other test frameworks. 
Here is the unit test file:</div><div><br /></div><div><div class="separator" style="clear: both; text-align: center;"><a href="https://lh3.googleusercontent.com/-yzV3V3iteRY/YI-AH0q4akI/AAAAAAAAH7s/aCAv0S0ExkYQcLUYcPHmqHbNpGPTc1ctQCLcBGAsYHQ/image.png" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="256" data-original-width="536" height="153" src="https://lh3.googleusercontent.com/-yzV3V3iteRY/YI-AH0q4akI/AAAAAAAAH7s/aCAv0S0ExkYQcLUYcPHmqHbNpGPTc1ctQCLcBGAsYHQ/image.png" width="320" /></a></div><br /></div><div><br /></div><div>With a file this simple, we can see some obvious test gaps like:<br /><ul style="text-align: left;"><li>it doesn't test all functions in all files</li><li>it doesn't test all paths in the functions (e.g., no 0 test in file1)</li></ul><div>You can imagine how hard it is to see that for any realistic code though. This is where coverage checks can help. Clicking into the coverage for file2 here we get:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://lh3.googleusercontent.com/-ZWBL3S-6njo/YI9-XW1XHYI/AAAAAAAAH7g/NJAA3aOraUIotlFzynQdeXYxPyw3aCtLgCLcBGAsYHQ/image.png" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="404" data-original-width="505" height="240" src="https://lh3.googleusercontent.com/-ZWBL3S-6njo/YI9-XW1XHYI/AAAAAAAAH7g/NJAA3aOraUIotlFzynQdeXYxPyw3aCtLgCLcBGAsYHQ/image.png" width="300" /></a></div></div><div><br /></div><div>Really simple to see that we missed the non-zero case in is_zero and that we didn't call the is_zero_wrapper function at all in any of the tested paths.</div><div><br /></div><div>It is important to note that 100% coverage doesn't mean your code is perfectly tested and that less than 100% coverage doesn't mean your codebase is garbage. 
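To make the screenshots concrete, here is a rough reconstruction of the kind of module and test pair being measured; the real code is in the repo linked at the end of the post, so treat the names and bodies here as guesses inferred from the screenshots:

```python
# file1.py / file2.py: two identical modules like this
def is_zero(value):
    """Return True when value is zero."""
    if value == 0:
        return True
    return False

def is_zero_wrapper(value):
    """Thin wrapper; if no test calls it, coverage flags the whole function."""
    return is_zero(value)

# test_example.py: a pytest-style test that only exercises one branch,
# so the non-zero path and is_zero_wrapper show up as uncovered
def test_is_zero():
    assert is_zero(0) is True
```
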
This is just one of many useful metrics for gauging test coverage and testing gaps.</div><div><br /></div><div>To set this up and run it:<br /><ol style="text-align: left;"><li>install coverage (e.g., 'pip install coverage')</li><li>install pytest (e.g., 'pip install pytest')</li><li>run 'coverage run -m pytest'</li><li>run 'coverage html'</li><li>open the index.html file in the htmlcov folder that it created</li></ol><div>That index.html file is what I have in the screenshot at the start.</div></div><div><br /></div><div>If you want to use my exact code to test this, it's available <a href="https://github.com/rhamner/coverage_example">here</a>.</div><div><br /></div></div></div><div><br /></div>theboathttp://www.blogger.com/profile/01260139398901806725noreply@blogger.com0tag:blogger.com,1999:blog-1532419805701836386.post-14912543741358950362021-04-06T21:11:00.009-07:002021-04-06T22:51:23.444-07:00Thinking in terms of probabilitiesWe suck at probability. A common trap we fall into is failing to realize this and thinking in terms of absolutes.<div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-G03fGZ7v-c0/YG0wLxaqBGI/AAAAAAAAH24/dhlu6HZI7moplJnQxrdX-JePpu1Z7hUyQCLcBGAsYHQ/s1058/wealth%2Bquintiles.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="449" data-original-width="1058" src="https://1.bp.blogspot.com/-G03fGZ7v-c0/YG0wLxaqBGI/AAAAAAAAH24/dhlu6HZI7moplJnQxrdX-JePpu1Z7hUyQCLcBGAsYHQ/s1600/wealth%2Bquintiles.png" width="0%" /></a></div><a name='more'></a>What's an example of this? One I've encountered many times is something like 'if you work hard you can stop being poor' or 'everyone decides their own wealth'. Is this true? This is what I mean...there is no absolute yes or no answer. 
Consider the following plot (<a href="https://www.stlouisfed.org/publications/regional-economist/july-2016/which-persists-more-from-generation-to-generation-income-or-wealth" target="_blank">source data</a>):<div><br /></div><div><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-G03fGZ7v-c0/YG0wLxaqBGI/AAAAAAAAH24/dhlu6HZI7moplJnQxrdX-JePpu1Z7hUyQCLcBGAsYHQ/s1058/wealth%2Bquintiles.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="449" data-original-width="1058" src="https://1.bp.blogspot.com/-G03fGZ7v-c0/YG0wLxaqBGI/AAAAAAAAH24/dhlu6HZI7moplJnQxrdX-JePpu1Z7hUyQCLcBGAsYHQ/s1600/wealth%2Bquintiles.png" width="95%" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><br /></div></div><div>The following statements are all true:</div><div><ul style="text-align: left;"><li>some kids from the poorest households end up wealthy</li><li>some kids from the wealthiest households end up poor</li><li>most poor kids stay poor</li><li>most rich kids stay rich</li><li>parental wealth is a good predictor of your wealth as an adult</li></ul><div>Many people will see someone claim that last bullet and jump to 'what about this guy that grew up poor and made it?' The plot clearly shows that's possible and doesn't negate the last bullet. Thinking in terms of what's likely is a better model for this.</div></div><div><br /></div><div>Another common example is the classic 'it's cold today so global warming isn't real'. If you don't think of the distributions of temperatures, this is an easy fallacy to fall victim to. 
Here are two plots of temperature distributions for Denver, Colorado summer highs in 1900 and 2000 (respectively):<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-xwLFXXJ3ECw/YG0wPj8Ue8I/AAAAAAAAH28/Abiu1IvQZmgQD813a6ziBghSmPv6iSQVgCLcBGAsYHQ/s1072/1900%2Bdenver%2Bsummers.PNG" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="496" data-original-width="1072" src="https://1.bp.blogspot.com/-xwLFXXJ3ECw/YG0wPj8Ue8I/AAAAAAAAH28/Abiu1IvQZmgQD813a6ziBghSmPv6iSQVgCLcBGAsYHQ/s1600/1900%2Bdenver%2Bsummers.PNG" width="95%" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-zhIHE602iZg/YG1IFFOujXI/AAAAAAAAH3M/gmEevi4eMFY8BiJYspVzPFTJHJz1mlVpwCLcBGAsYHQ/s1071/2000%2Bdenver%2Bsummers.PNG" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="498" data-original-width="1071" src="https://1.bp.blogspot.com/-zhIHE602iZg/YG1IFFOujXI/AAAAAAAAH3M/gmEevi4eMFY8BiJYspVzPFTJHJz1mlVpwCLcBGAsYHQ/s1600/2000%2Bdenver%2Bsummers.PNG" width="95%" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><br /></div><div class="separator" style="clear: both; text-align: center;"><br /></div><br />There are clearly cold (for summer) days in both. There is also a clear shift towards higher temperatures in 2000 vs 1900. That 'the distribution has shifted towards higher temperatures' is the best mental model for global warming in my opinion. If you want to see more of these <a href="https://cityprojections.com/summerHighHistograms.html">I pulled them from this page.</a></div><div><br /></div><div>This could go on forever but I hope the general idea is clear. 
Many things are distribution-based and can be understood much more easily if thought of in terms of 'how does this distribution shift/compare?'</div><div><br /></div><div>If you're interested in a great book on this general topic, <a href="https://www.amazon.com/gp/product/0307275175/ref=as_li_qf_asin_il_tl?ie=UTF8&tag=rhamner-20&creative=9325&linkCode=as2&creativeASIN=0307275175&linkId=b33c0c74e4c510ecadee926cad4195b0" target="_blank">I liked 'The Drunkard's Walk'.</a></div><div><br /></div><div><br /></div>theboathttp://www.blogger.com/profile/01260139398901806725noreply@blogger.com1tag:blogger.com,1999:blog-1532419805701836386.post-57591309907030021152021-02-07T22:11:00.002-08:002021-02-07T22:24:51.939-08:00If the square root of -1 is i, what is the cube root of -1?You probably learned at some point that the square root of -1 is i. What about the cubed root of it? There's the obvious answer of (-1)^3 = -1, but the answer isn't actually that simple.<div class="separator" style="clear: both; text-align: center;"><a href="https://lh3.googleusercontent.com/-tWhXh8XZhws/YCDP3jeyusI/AAAAAAAAHqQ/PUHtey2GXk4_osFLoZWsmJC1BuYHSzsYwCLcBGAsYHQ/image.png" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="1688" data-original-width="1200" height="449" src="https://lh3.googleusercontent.com/-tWhXh8XZhws/YCDP3jeyusI/AAAAAAAAHqQ/PUHtey2GXk4_osFLoZWsmJC1BuYHSzsYwCLcBGAsYHQ/w320-h449/image.png" width="0%" /></div><a name='more'></a><script src="https://polyfill.io/v3/polyfill.min.js?features=es6"></script> <script async="" id="MathJax-script" src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"><style>.MathJax { font-size: 1.3em !important; }</style> </script>To answer this, we'll need Euler's identity which is:<div style="font-size: 60px;"> \[e^{i\pi}=-1\]</div>Just take the cubed root of each side:<div style="font-size: 60px;"> \[e^{i\pi*\frac{1}{3}}=-1^{\frac{1}{3}}\]</div><div style="font-size: 60px;"> 
\[e^{\frac{i\pi}{3}}=-1^{\frac{1}{3}}\]</div>Now we just need the following definition:<div style="font-size: 60px;"> \[e^{ix}=cos(x)+i*sin(x)\]</div>Plugging in our value:<div style="font-size: 60px;"> \[e^{\frac{i\pi}{3}}=-1^{\frac{1}{3}}\]</div><div style="font-size: 60px;"> \[cos(\frac{\pi}{3}) + i*sin(\frac{\pi}{3})=-1^{\frac{1}{3}}\]</div><div style="font-size: 60px;"> \[\frac{1}{2} + i*\frac{\sqrt{3}}{2}=-1^{\frac{1}{3}}\]</div>And that's it...there's another cube root of -1.<div><br />What does that actually mean? Consider this coordinate system: </div><div><br /></div></a><div><a href="https://lh3.googleusercontent.com/-tWhXh8XZhws/YCDP3jeyusI/AAAAAAAAHqQ/PUHtey2GXk4_osFLoZWsmJC1BuYHSzsYwCLcBGAsYHQ/image.png" style="margin-left: 1em; margin-right: 1em;"></a><div class="separator" style="clear: both; text-align: center;"><a href="https://lh3.googleusercontent.com/-tWhXh8XZhws/YCDP3jeyusI/AAAAAAAAHqQ/PUHtey2GXk4_osFLoZWsmJC1BuYHSzsYwCLcBGAsYHQ/image.png" style="margin-left: 1em; margin-right: 1em;"></a><a href="https://lh3.googleusercontent.com/-tWhXh8XZhws/YCDP3jeyusI/AAAAAAAAHqQ/PUHtey2GXk4_osFLoZWsmJC1BuYHSzsYwCLcBGAsYHQ/image.png" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="1688" data-original-width="1200" height="449" src="https://lh3.googleusercontent.com/-tWhXh8XZhws/YCDP3jeyusI/AAAAAAAAHqQ/PUHtey2GXk4_osFLoZWsmJC1BuYHSzsYwCLcBGAsYHQ/w320-h449/image.png" width="320" /></a></div><br /><br /></div><div>With real numbers on the horizontal axis and imaginary numbers on the vertical axis, you can draw complex numbers as vectors. This has a cool property. We got pi/3 radians as our angle there. That's equal to 60 degrees, or one-sixth of a full rotation. Looking at that coordinate system, if r = 1:<br /><br /><ul style="text-align: left;"><li>0 degrees = 1</li><li>90 degrees = i</li><li>180 degrees = -1</li><li>270 degrees = -i</li><li>360 degrees = 1</li><li>450 degrees = i</li><li>...</li></ul><div>It rotates around. 
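This rotation picture is easy to check numerically. The short script below (my own sketch, using Python's cmath) builds the n-th roots of a number as evenly spaced rotations of its polar form:

```python
import cmath

def nth_roots(value, n):
    """All n complex n-th roots of value, as evenly spaced rotations."""
    r, theta = cmath.polar(complex(value))
    return [cmath.rect(r ** (1 / n), (theta + 2 * cmath.pi * k) / n)
            for k in range(n)]

for z in nth_roots(-1, 3):
    print(f"{z:.3f} cubed = {z**3:.3f}")
```

For -1 the three roots come out as 0.5 + 0.866i, -1, and 0.5 - 0.866i, which are exactly the rotations discussed here.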
Since an angle of pi/3 represents 60 degrees, cubing the value with r = 1 and angle = 60 degrees gives you the same thing as r = 1 and angle = 180 degrees, which is -1.</div></div><div><br /></div><div>Thinking through it a bit more, that's not unique. What if we used 300 degrees instead? Rotating by 300 degrees 3 times gives you 900 degrees which is just 2 revolutions + 180 degrees. Will that give you -1 also?</div><div><br /></div><div>60 degree answer cubed:<div style="font-size: 60px;"> \[(\frac{1}{2} + i*\frac{\sqrt{3}}{2})*(\frac{1}{2} + i*\frac{\sqrt{3}}{2})*(\frac{1}{2} + i*\frac{\sqrt{3}}{2})\]</div><div style="font-size: 60px;"> \[(\frac{1}{4} + i*\frac{\sqrt{3}}{2} - \frac{3}{4})*(\frac{1}{2} + i*\frac{\sqrt{3}}{2})\]</div><div style="font-size: 60px;"> \[\frac{1}{8} + i*\frac{\sqrt{3}}{8} + i*\frac{\sqrt{3}}{4} - \frac{3}{4} - \frac{3}{8} - i*\frac{3*\sqrt{3}}{8}\]</div>That adds up to -1 which is what we wanted.<br /><br />300 degree answer cubed:<div style="font-size: 60px;"> \[(\frac{1}{2} - i*\frac{\sqrt{3}}{2})*(\frac{1}{2} - i*\frac{\sqrt{3}}{2})*(\frac{1}{2} - i*\frac{\sqrt{3}}{2})\]</div><div style="font-size: 60px;"> \[(\frac{1}{4} - i*\frac{\sqrt{3}}{2} - \frac{3}{4})*(\frac{1}{2} - i*\frac{\sqrt{3}}{2})\]</div><div style="font-size: 60px;"> \[\frac{1}{8} - i*\frac{\sqrt{3}}{8} - i*\frac{\sqrt{3}}{4} - \frac{3}{4} - \frac{3}{8} + i*\frac{3*\sqrt{3}}{8}\]</div>That also adds up to -1 which is what we wanted. Finally, we have the (-1)^3 = -1 answer which is just the 180 degree one.<br /><br />Thus, we found three cube roots of -1: 0.5 + 0.866i, 0.5 - 0.866i, and -1.<br /><br />For the one we all learned...'square root of -1 is i'...is that really the only answer? Doing a similar exercise, you want to end up at m*360 + 180 degrees after n rotations where n is the root and m is an integer. Here, n = 2. That means 2*rotation = m*360 + 180, or rotation = 180*m + 90. Start with m = 0. rotation = 90 which means i is an answer which we know. Try m = 1. 
rotation = 270 which means -i is an answer. Trying that out...-i * -i = i^2 = -1. That works. Try m = 2. rotation = 450 which is just 90 + 1 full cycle, so we're repeating now. i and -i are our square roots of -1.</div><div><br /></div><div><br /></div><div><br /></div><div><br /></div></div>theboathttp://www.blogger.com/profile/01260139398901806725noreply@blogger.com1tag:blogger.com,1999:blog-1532419805701836386.post-79837416741239166202021-01-24T22:49:00.009-08:002021-01-25T15:48:05.898-08:00Regression Toward the Mean in the NFLI wanted to run some quick tests to see if <a href="https://en.wikipedia.org/wiki/Regression_toward_the_mean#:~:text=In%20statistics%2C%20regression%20toward%20the,or%20average%20on%20further%20measurements.&text=The%20answer%20was%20not%20%27on%20average%20directly%20above%27.">regression toward the mean</a> shows up clearly in NFL data.<a href="https://1.bp.blogspot.com/-QFQZegAeRdE/YA5n0yv7YAI/AAAAAAAAHnA/Fi2nHEFMGbMVHS6pVGppPb2hdL2xaE5mACLcBGAsYHQ/s1296/passing%2Byards%2Bscatter.png" style="margin-left: 1em; margin-right: 1em; text-align: center;"><img border="0" data-original-height="632" data-original-width="1296" src="https://1.bp.blogspot.com/-QFQZegAeRdE/YA5n0yv7YAI/AAAAAAAAHnA/Fi2nHEFMGbMVHS6pVGppPb2hdL2xaE5mACLcBGAsYHQ/s1600/passing%2Byards%2Bscatter.png" width="0%" /></a><a name='more'></a><h4 style="text-align: left;"><br /></h4><h4 style="text-align: left;">Background</h4><div>In case you aren't familiar, 'regression toward the mean' roughly means that if a random variable is an outlier, a future instance is likely to be closer to the mean. For a really simple model to make this easy to understand for something like NFL player performance, imagine that each player's performance is X% skill and Y% luck. If X is 100 and Y is 0, then previous years will nearly perfectly predict future years. If Y is 100 and X is 0, then there will be no relationship between performance from one year to the next. 
If X and Y are both between 0 and 100, there will be some relationship between performance from year to year but it won't be perfect. <div><br /></div><div>There are two easy ways for me to look at this phenomenon:<br /><ol style="text-align: left;"><li>plot one year's performance against the previous year's along with a line with a slope of 1 (X = 100%) and a best-fitting line<br /><br /></li><li>bin the data by previous year's performance and look at how each bin shifted in the next year</li></ol><div>What might we see? There are many possibilities, but here are a few examples:</div><div><ul style="text-align: left;"><li>"Players that performed well perform even better the next season": plot 1 will show a slope greater than 1 and plot 2 will show the bottom bin doing worse and the top bin doing better<br /><br /></li><li>"Performance is driven by skill so it's the same year-to-year": plot 1 will show a slope of 1 and plot 2 will show all bins at roughly zero<br /><br /></li><li>"Performance is a mix of skill and luck so top performers will move back towards average and poor performers will move up towards average (<b>this is the regression toward the mean case</b>)": plot 1 will show a slope between 0 and 1, and plot 2 will show the bottom bin doing better and the top bin doing worse<br /><br /></li><li>"It's all random/luck": plot 1 will show a slope of ~0 and plot 2 will show all bins at roughly 0<br /><br /></li><li>"Poor performers overcompensate and end up better than average next season": plot 1 will show a slope less than 0 and plot 2 will show the bottom bin rising past average and the top bin falling below it</li></ul></div><div><div>To test it out I ran with 5 different stats using data from all starters from 2000-2020. For example, for a 2010-2011 comparison, year 1 is 2010 and year 2 is 2011. You would expect the best performers in 2010 to do a bit worse in 2011, and the worst in 2010 to do a bit better in 2011. 
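The X% skill / Y% luck model is easy to simulate. This sketch is my own (an arbitrary 50/50 skill/luck split with made-up players, not the NFL data) and shows the top and bottom thirds pulling back toward the mean:

```python
import random

def two_seasons(n_players=10000, skill_weight=0.5, seed=0):
    """Each player's per-season performance = skill_weight * fixed skill
    + (1 - skill_weight) * fresh luck, both drawn from a standard normal."""
    rng = random.Random(seed)
    skill = [rng.gauss(0, 1) for _ in range(n_players)]
    def season():
        return [skill_weight * s + (1 - skill_weight) * rng.gauss(0, 1)
                for s in skill]
    return season(), season()

def bin_mean(indices, season):
    return sum(season[i] for i in indices) / len(indices)

year1, year2 = two_seasons()
order = sorted(range(len(year1)), key=lambda i: year1[i])
third = len(order) // 3
bottom, top = order[:third], order[-third:]
print(f"bottom third: {bin_mean(bottom, year1):+.2f} -> {bin_mean(bottom, year2):+.2f}")
print(f"top third:    {bin_mean(top, year1):+.2f} -> {bin_mean(top, year2):+.2f}")
```

With any mix of skill and luck, the bottom third's mean moves up toward zero in year 2 and the top third's moves down, with no overcompensation mechanism at all.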
In the bar plots, the 'bottom third' means the 33% of players that were worst in season 1 from the plot above.</div></div></div><div><br /></div><h4 style="text-align: left;">Results</h4><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-QFQZegAeRdE/YA5n0yv7YAI/AAAAAAAAHnA/Fi2nHEFMGbMVHS6pVGppPb2hdL2xaE5mACLcBGAsYHQ/s1296/passing%2Byards%2Bscatter.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="632" data-original-width="1296" src="https://1.bp.blogspot.com/-QFQZegAeRdE/YA5n0yv7YAI/AAAAAAAAHnA/Fi2nHEFMGbMVHS6pVGppPb2hdL2xaE5mACLcBGAsYHQ/s1600/passing%2Byards%2Bscatter.png" width="100%" /></a></div><div class="separator" style="clear: both; text-align: center;"><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-RQVgzFIad6M/YA5n0ix8dUI/AAAAAAAAHm8/YMQhfYd97Aon_HTnd6_9nOEbXOIU1j6FACLcBGAsYHQ/s966/passing%2Byards%2Bbar.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="599" data-original-width="966" src="https://1.bp.blogspot.com/-RQVgzFIad6M/YA5n0ix8dUI/AAAAAAAAHm8/YMQhfYd97Aon_HTnd6_9nOEbXOIU1j6FACLcBGAsYHQ/s1600/passing%2Byards%2Bbar.png" width="75%" /></a></div><div class="separator" style="clear: both; text-align: center;"><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-TOa0h2kC5ew/YA5n0KTPbHI/AAAAAAAAHmw/cpSWeswqN4Evk_9iZ9nVyTvSdhONjSLKgCLcBGAsYHQ/s1298/interceptions%2Bscatter.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="632" data-original-width="1298" src="https://1.bp.blogspot.com/-TOa0h2kC5ew/YA5n0KTPbHI/AAAAAAAAHmw/cpSWeswqN4Evk_9iZ9nVyTvSdhONjSLKgCLcBGAsYHQ/s1600/interceptions%2Bscatter.png" width="100%" /></a></div><div class="separator" style="clear: both; text-align: center;"><br /></div><div class="separator" style="clear: both; text-align: center;"><a 
href="https://1.bp.blogspot.com/-HzEaYLn2Vrc/YA5n0HAYRkI/AAAAAAAAHms/5Dezm_RoAAMBMeG8ze0s0iQoxcPVK8Z3wCLcBGAsYHQ/s964/interceptions%2Bbar.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="599" data-original-width="964" src="https://1.bp.blogspot.com/-HzEaYLn2Vrc/YA5n0HAYRkI/AAAAAAAAHms/5Dezm_RoAAMBMeG8ze0s0iQoxcPVK8Z3wCLcBGAsYHQ/s1600/interceptions%2Bbar.png" width="75%" /></a></div><div class="separator" style="clear: both; text-align: center;"><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-x3lwJWJJEi4/YA5n0pigllI/AAAAAAAAHm0/KqmfMB9PRBwVvykBO0Ol-Ub6buqv0a4UgCLcBGAsYHQ/s1289/passing%2Btds%2Bscatter.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="632" data-original-width="1289" src="https://1.bp.blogspot.com/-x3lwJWJJEi4/YA5n0pigllI/AAAAAAAAHm0/KqmfMB9PRBwVvykBO0Ol-Ub6buqv0a4UgCLcBGAsYHQ/s1600/passing%2Btds%2Bscatter.png" width="100%" /></a></div><div class="separator" style="clear: both; text-align: center;"><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-kWXIDfrzYFE/YA5n0OHxEiI/AAAAAAAAHm4/6LfT8cBXAWUZyDFgEyTLGKZOVDByGrS4wCLcBGAsYHQ/s957/passing%2Btds%2Bbar.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="599" data-original-width="957" src="https://1.bp.blogspot.com/-kWXIDfrzYFE/YA5n0OHxEiI/AAAAAAAAHm4/6LfT8cBXAWUZyDFgEyTLGKZOVDByGrS4wCLcBGAsYHQ/s1600/passing%2Btds%2Bbar.png" width="75%" /></a></div><div class="separator" style="clear: both; text-align: center;"><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-8UE3rj7dOuw/YA5n04KRWUI/AAAAAAAAHnI/R3aHBXjn_HE_DsuHpn9DBL1nADkUe1fAQCLcBGAsYHQ/s1298/rushing%2Btds%2Bscatter.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="632" data-original-width="1298" 
src="https://1.bp.blogspot.com/-8UE3rj7dOuw/YA5n04KRWUI/AAAAAAAAHnI/R3aHBXjn_HE_DsuHpn9DBL1nADkUe1fAQCLcBGAsYHQ/s1600/rushing%2Btds%2Bscatter.png" width="100%" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-Y4OCkEsxh8k/YA5n04fwDAI/AAAAAAAAHnE/LnB3xk4WUC04XVslvAuOeXEACPygajVbwCLcBGAsYHQ/s1019/rushing%2Btds%2Bbar.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="599" data-original-width="1019" src="https://1.bp.blogspot.com/-Y4OCkEsxh8k/YA5n04fwDAI/AAAAAAAAHnE/LnB3xk4WUC04XVslvAuOeXEACPygajVbwCLcBGAsYHQ/s1600/rushing%2Btds%2Bbar.png" width="75%" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-X2nbFmPJdQ4/YA5n00yoAPI/AAAAAAAAHnQ/IBW5Ly3Nl-sPbiKOuIKbVgyISLZEyuZAQCLcBGAsYHQ/s1293/rushing%2Byards%2Bscatter.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="632" data-original-width="1293" src="https://1.bp.blogspot.com/-X2nbFmPJdQ4/YA5n00yoAPI/AAAAAAAAHnQ/IBW5Ly3Nl-sPbiKOuIKbVgyISLZEyuZAQCLcBGAsYHQ/s1600/rushing%2Byards%2Bscatter.png" width="100%" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-BIInLBSZy2o/YA5n09Nkj8I/AAAAAAAAHnM/3mCuF1M-mXALmit_AlgiaLJ9xLvq3rviQCLcBGAsYHQ/s965/rushing%2Byards%2Bbar.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="599" data-original-width="965" src="https://1.bp.blogspot.com/-BIInLBSZy2o/YA5n09Nkj8I/AAAAAAAAHnM/3mCuF1M-mXALmit_AlgiaLJ9xLvq3rviQCLcBGAsYHQ/s1600/rushing%2Byards%2Bbar.png" width="75%" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><br /></div><div>and the data show regression toward the mean. 
Every stat I've tried (with a luck component obviously) followed the pattern above.</div><div><br /></div><div><br /></div></div>theboathttp://www.blogger.com/profile/01260139398901806725noreply@blogger.com4tag:blogger.com,1999:blog-1532419805701836386.post-35851241756957448582021-01-15T22:00:00.003-08:002021-01-15T22:00:17.004-08:00Fourier Series AnimationsIt always seemed magical to me that you can get a square wave from adding together sine waves, so I threw together some animations of Fourier series.<div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-vg3ZxH0-gw8/YAKAT-apG3I/AAAAAAAAHg0/1wofnIL257Y7HXxihcvjpCsUfIK_U-_MgCLcBGAsYHQ/s1329/fourier_square.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="542" data-original-width="1329" src="https://1.bp.blogspot.com/-vg3ZxH0-gw8/YAKAT-apG3I/AAAAAAAAHg0/1wofnIL257Y7HXxihcvjpCsUfIK_U-_MgCLcBGAsYHQ/s1600/fourier_square.gif" width="0%" /><a name='more'></a><h4 style="text-align: left;">Square Wave</h4><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-vg3ZxH0-gw8/YAKAT-apG3I/AAAAAAAAHg0/1wofnIL257Y7HXxihcvjpCsUfIK_U-_MgCLcBGAsYHQ/s1329/fourier_square.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="542" data-original-width="1329" src="https://1.bp.blogspot.com/-vg3ZxH0-gw8/YAKAT-apG3I/AAAAAAAAHg0/1wofnIL257Y7HXxihcvjpCsUfIK_U-_MgCLcBGAsYHQ/s1600/fourier_square.gif" width="100%" /></a></div><br /><div><br /></div><h4 style="text-align: left;">Pulse</h4><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-vpJy7aKFa54/YAKAXowtT7I/AAAAAAAAHg4/FR6azL15p50R9V0q7YvUj_PL04oBZsVNgCLcBGAsYHQ/s1329/fourier_pulse.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="542" data-original-width="1329" 
src="https://1.bp.blogspot.com/-vpJy7aKFa54/YAKAXowtT7I/AAAAAAAAHg4/FR6azL15p50R9V0q7YvUj_PL04oBZsVNgCLcBGAsYHQ/s1600/fourier_pulse.gif" width="100%" /></a></div><br /><div><br /></div><h4 style="text-align: left;">Parabola</h4><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-RRBsnZi7UAE/YAKAankvQkI/AAAAAAAAHg8/zI6IjeXc_owngmcFi6MuzY9tTNSCbqnCQCLcBGAsYHQ/s1329/fourier_parabolas.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="542" data-original-width="1329" src="https://1.bp.blogspot.com/-RRBsnZi7UAE/YAKAankvQkI/AAAAAAAAHg8/zI6IjeXc_owngmcFi6MuzY9tTNSCbqnCQCLcBGAsYHQ/s1600/fourier_parabolas.gif" width="100%" /></a></div><br /><div><br /></div><h4 style="text-align: left;">Pulse Variation</h4><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-TCn8OV2DgHE/YAKAfHfSf-I/AAAAAAAAHhA/9OO3-10UFUcK4rhwlk0NvMu031vKFaxRgCLcBGAsYHQ/s1329/fourier_sinpulse.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="542" data-original-width="1329" src="https://1.bp.blogspot.com/-TCn8OV2DgHE/YAKAfHfSf-I/AAAAAAAAHhA/9OO3-10UFUcK4rhwlk0NvMu031vKFaxRgCLcBGAsYHQ/s1600/fourier_sinpulse.gif" width="100%" /></a></div><br /><div><br /></div>theboathttp://www.blogger.com/profile/01260139398901806725noreply@blogger.com1tag:blogger.com,1999:blog-1532419805701836386.post-27595343247794166902021-01-02T22:12:00.003-08:002021-01-02T22:13:31.752-08:00How Should You Bet on a Biased Coin Toss?If you know a coin is biased to come up heads 75% of the time, what betting strategy should you use to bet on the outcome of a flip?<img src="https://1.bp.blogspot.com/-QXUfd2an15E/XOddy0mjsQI/AAAAAAAAFAk/GYcuc-m0xDUVMsaBpf4rmi3BufTWPZkpwCLcBGAs/s1600/United_States_Quarter.jpg" width=0%"><a name='more'></a></img><p>What might seem intuitive is to have a mixed strategy of 75% heads and 25% tails. 
Maybe something like 'flip an unbiased coin twice and bet tails if you get tails twice and heads for any other outcome'. What result will that give you?</p><p>There are four possibilities here:</p><p></p><ol style="text-align: left;"><li>Coin lands on heads and you bet heads (75%*75% = 56.25% of the time)</li><li>Coin lands on heads and you bet tails (75%*25% = 18.75% of the time)</li><li>Coin lands on tails and you bet heads (25%*75% = 18.75% of the time)</li><li>Coin lands on tails and you bet tails (25%*25% = 6.25% of the time)</li></ol><div>1 and 4 are winning situations, so you'll win 62.5% of the time this way (just sum the 1 and 4 win rates).</div><div><br /></div><div>You might immediately notice that 62.5% is less than 75%. What if you just always bet heads? Filling out the same list as above:</div><div></div><div><ol><li>Coin lands on heads and you bet heads (75%*100% = 75% of the time)</li><li>Coin lands on heads and you bet tails (75%*0% = 0% of the time)</li><li>Coin lands on tails and you bet heads (25%*100% = 25% of the time)</li><li>Coin lands on tails and you bet tails (25%*0% = 0% of the time)</li></ol><div>1 and 4 are winning situations, so you'll win 75% of the time this way. In this situation, the general win rate is:<br /><br />(coin bias*head bet percentage) + [(1 - coin bias)*(1 - head bet percentage)] = win rate</div></div><div><br /></div><div>We want to maximize this. Using b for 'coin bias' and h for 'head bet percentage':</div><div><br /></div><div>b*h + (1 - b)*(1 - h) = win rate</div><div><br /></div><div>b*h + 1 - h - b + b*h = win rate</div><div><br /></div><div>2*b*h + 1 - h - b = win rate</div><div><br /></div><div>h*(2*b - 1) + 1 - b = win rate</div><div><br /></div><div>At this point, we have an equation for a line. Win rate vs h is a line with a slope of 2*b - 1 and a y-intercept of 1 - b. Anytime 2*b - 1 is positive, this line will go up and to the right so h = 1 is the best bet (heads 100% of the time). 
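A quick numeric check of the two strategies from the worked example above, using the win-rate formula just derived (a sketch):

```javascript
// Win rate when the coin shows heads with probability b and you bet heads
// with probability h: you win when both are heads or both are tails.
const winRate = (b, h) => b * h + (1 - b) * (1 - h);

console.log(winRate(0.75, 0.75)); // 0.625 (the matching mixed strategy)
console.log(winRate(0.75, 1.0));  // 0.75  (always bet heads)
```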
Anytime 2*b - 1 is negative, h = 0 is the best bet (tails 100% of the time). 2*b - 1 is positive whenever b is greater than 0.5. Thus, the optimal strategy here is bet in the direction of the bias 100% of the time when you have a known, biased coin.</div><div><br /></div><div><br /></div><p></p>theboathttp://www.blogger.com/profile/01260139398901806725noreply@blogger.com8tag:blogger.com,1999:blog-1532419805701836386.post-55627131940206901142020-12-27T22:02:00.004-08:002020-12-27T22:04:38.093-08:00Making a CSS Flashlight Effect Using Conic-gradientsThis is just a quick tutorial of conic-gradients showing a flashlight effect with very little code.<div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-S8K75bL0ewc/X-l0zcdwY-I/AAAAAAAAHac/VHQ1ugh0T-IRHSjTrDNPLRThoh0MC-lpACLcBGAsYHQ/s1867/Capture.PNG" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="654" data-original-width="1867" src="https://1.bp.blogspot.com/-S8K75bL0ewc/X-l0zcdwY-I/AAAAAAAAHac/VHQ1ugh0T-IRHSjTrDNPLRThoh0MC-lpACLcBGAsYHQ/s1600/Capture.PNG" width="0%" /></a></div><a name='more'></a><p>The basic idea here is to use a <a href="https://developer.mozilla.org/en-US/docs/Web/CSS/conic-gradient()" target="_blank">conic-gradient</a> and do the following:<br /></p><ul style="text-align: left;"><li>set it to be the flashlight color and fairly transparent for the bright area (yellow from -25 to 25 degrees in the example here)</li><li>set it to be dark and fairly opaque for the dark area (black with 95% opacity from 25 to 335 degrees in the example here)</li><li>make the flashlight layer(s) fixed position and sit on top of the page</li><li>to keep it from starting as a point, offset it (vertical location of 110% in the example here puts it 10% below the bottom of the page)</li></ul><div>And that's it...it's actually really simple. 
Here is a working example on top of a dummy html page:</div><div><br /></div><iframe allowfullscreen="true" allowtransparency="true" frameborder="no" height="600" loading="lazy" scrolling="no" src="https://codepen.io/rhamner/embed/VwKzNLd?height=602&theme-id=light&default-tab=css,result" style="width: 100%;" title="Flashlight"> See the Pen <a href='https://codepen.io/rhamner/pen/VwKzNLd'>Flashlight</a> by Robert Hamner (<a href='https://codepen.io/rhamner'>@rhamner</a>) on <a href='https://codepen.io'>CodePen</a>. </iframe><div><br /></div><div><br /></div><div>It's really clean and requires no javascript. It's probably possible to make it cleaner. An obvious question you might have is 'can I make a flashlight that moves with the mouse?' and the answer is sure...simply set the gradient position to the cursor location (this requires javascript but is simple):</div><div><br /></div><iframe allowfullscreen="true" allowtransparency="true" frameborder="no" height="600" loading="lazy" scrolling="no" src="https://codepen.io/rhamner/embed/dypJGZZ?height=476&theme-id=light&default-tab=css,result" style="width: 100%;" title="Flashlight mouse"> See the Pen <a href='https://codepen.io/rhamner/pen/dypJGZZ'>Flashlight mouse</a> by Robert Hamner (<a href='https://codepen.io/rhamner'>@rhamner</a>) on <a href='https://codepen.io'>CodePen</a>. </iframe><div><br /></div><div>All that took was adding a listener to the page for mouse or touch movements, updating --X and --Y variables on those events, and setting the conic-gradient position to be var(--X) var(--Y). 
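A minimal sketch of that wiring (the helper name and the exact colors/angles here are illustrative approximations of the effect described above, not the actual CodePen code):

```javascript
// Build the CSS conic-gradient value for a flashlight centered at (x, y):
// a fairly transparent yellow beam covering -25deg to 25deg (written as
// 0-25 plus 335-360), and 95%-opacity black everywhere else.
function flashlightGradient(x, y) {
  return `conic-gradient(from 0deg at ${x} ${y}, ` +
         `rgba(255, 255, 0, 0.3) 0deg 25deg, ` +
         `rgba(0, 0, 0, 0.95) 25deg 335deg, ` +
         `rgba(255, 255, 0, 0.3) 335deg 360deg)`;
}

// In the browser, listen for mouse/touch movement and either update the
// --X/--Y custom properties or set the background directly, e.g.:
// document.addEventListener('mousemove', e => {
//   document.body.style.background =
//     flashlightGradient(e.clientX + 'px', e.clientY + 'px');
// });

console.log(flashlightGradient('50%', '110%'));
```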
Simple and looks pretty cool.</div><div><br /></div><div><br /></div><p></p>theboathttp://www.blogger.com/profile/01260139398901806725noreply@blogger.com1tag:blogger.com,1999:blog-1532419805701836386.post-60114563780967415932020-12-11T22:03:00.011-08:002021-10-03T23:24:31.961-07:00What Are the Most Impressive NFL Combine Performances Ever?If you combine the major tests and adjust for weight and height, which NFL player had the most impressive combine performance?<div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-qOWLlKhwTP4/X9Rc1t1e7BI/AAAAAAAAHYc/FTVOcMZqvGMiKsAKwxybmCaTs-m27e5NQCLcBGAsYHQ/s472/40%2Btimes.PNG" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="472" data-original-width="403" height="auto" src="https://1.bp.blogspot.com/-qOWLlKhwTP4/X9Rc1t1e7BI/AAAAAAAAHYc/FTVOcMZqvGMiKsAKwxybmCaTs-m27e5NQCLcBGAsYHQ/s1600/40%2Btimes.PNG" width="0%" /></a></div><a name='more'></a><h4 style="text-align: left;">Data</h4>Unfortunately, the modern combine hasn't existed for that long, so I only have data back to the year <a href="https://www.pro-football-reference.com/draft/2000-combine.htm" target="_blank">2000</a>. 
Still, that gives us a good-sized data set (~5000 players with at least some data).<p>To try to find 'best performance ever', I wanted to do two things:<br /></p><ol style="text-align: left;"><li>Adjust for weight and height...a 200 lb guy running a 4.5 forty is way less impressive than a 250 lb guy doing it.</li><li>Try to combine all metrics...a 200 lb guy running a 4.5 forty and getting 4 bench reps is way less impressive than a 200 lb guy running a 4.5 forty and getting 24 bench reps.</li></ol><div>The metrics that seem to be available for most people are:<br /><ul style="text-align: left;"><li>40 yard dash time</li><li>bench press reps (number of times they bench press 225 lbs)</li><li>broad jump</li><li>vertical jump</li></ul><div>So I used those.</div></div><div><br /></div><h4 style="text-align: left;">Calculation</h4><div>To calculate this, I used a three-step process:</div><div><ol style="text-align: left;"><li>Perform linear regression for each metric using weight and height as inputs ('metric = C1*weight + C2*height + C3').</li><li>Divide actual value by value predicted from the regression for each metric to get a score. E.g., if a player ran a 4.5 40 and the model predicted a 4.7 one for his weight and height, he'd get 4.5/4.7, or 0.957 for that metric.</li><li>Calculate an overall score that's a weighted root-sum-square (RSS) of the individual scores. 
The weights are 1, 1/5, 1/2, 1/2 for the four metrics in that order.</li></ol></div><div>It doesn't affect the calculation much, but throughout, I use weight as an input for everything but bench reps, and weight^2/3 as an input for bench reps.</div><div><br /></div><h4 style="text-align: left;">Results</h4><div>Using the calculation described above, these are the greatest combine performances (actual value to the left and predicted value in parentheses to the right):</div><div><br /></div><div><table style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr style="background-color: #e6a117;"><th>player</th><th>40 time (s)</th><th>bench (reps)</th><th>broad (inches)</th><th>vertical (inches)</th></tr><tr><td>Vernon Davis</td><td>4.38 (4.82)</td><td>33 (21)</td><td>128 (114)</td><td>42 (33)</td></tr><tr><td>Terna Nande</td><td>4.51 (4.70)</td><td>41 (20)</td><td>124 (115)</td><td>39 (34)</td></tr><tr><td>Vic Beasley</td><td>4.53 (4.78)</td><td>35 (20)</td><td>130 (115)</td><td>41 (33)</td></tr><tr><td>Mario Williams</td><td>4.70 (5.06)</td><td>35 (23)</td><td>120 (109)</td><td>40 (30)</td></tr><tr><td>Cornelius Washington</td><td>4.55 (4.89)</td><td>36 (22)</td><td>128 (112)</td><td>39 (32)</td></tr><tr><td>Myles Garrett</td><td>4.64 (4.93)</td><td>33 (23)</td><td>128 (111)</td><td>41 (31)</td></tr><tr><td>Nick Perry</td><td>4.55 (4.92)</td><td>35 (23)</td><td>124 (111)</td><td>38 (31)</td></tr><tr><td>Margus Hunt</td><td>4.62 (4.95)</td><td>38 (21)</td><td>121 (113)</td><td>34 (32)</td></tr><tr><td>D.K. 
Metcalf</td><td>4.33 (4.67)</td><td>27 (18)</td><td>134 (118)</td><td>40 (34)</td></tr><tr><td>Jerick McKinnon</td><td>4.41 (4.57)</td><td>32 (19)</td><td>132 (117)</td><td>40 (35)</td></tr><tr><td>Davis Tull</td><td>4.57 (4.78)</td><td>26 (21)</td><td>132 (114)</td><td>42 (33)</td></tr><tr><td>Jon Alston</td><td>4.50 (4.65)</td><td>30 (19)</td><td>132 (118)</td><td>40 (34)</td></tr><tr><td>Vernon Gholston</td><td>4.65 (4.89)</td><td>37 (23)</td><td>125 (112)</td><td>36 (32)</td></tr><tr><td>Sean Weatherspoon</td><td>4.62 (4.74)</td><td>34 (21)</td><td>123 (115)</td><td>40 (33)</td></tr><tr><td>Demario Davis</td><td>4.49 (4.71)</td><td>32 (19)</td><td>124 (116)</td><td>38 (34)</td></tr><tr><td>Scott Young</td><td>5.08 (5.15)</td><td>43 (27)</td><td>115 (104)</td><td>35 (29)</td></tr><tr><td>Michael Johnson</td><td>4.61 (4.89)</td><td>28 (20)</td><td>128 (114)</td><td>38 (32)</td></tr><tr><td>Alex Barnes</td><td>4.59 (4.66)</td><td>34 (20)</td><td>126 (117)</td><td>38 (34)</td></tr><tr><td>Benjamin Watson</td><td>4.50 (4.85)</td><td>34 (22)</td><td>123 (113)</td><td>36 (32)</td></tr><tr><td>Virgil Green</td><td>4.54 (4.79)</td><td>23 (20)</td><td>130 (115)</td><td>42 (33)</td></tr></tbody></table></div><div></div><div><br /></div><div>#1 there did not surprise me. Vernon Davis's 40 time is pretty well known as an insane combine performance.</div><div><br /></div><div>The first really odd one in that list is actually #2, Terna Nande. He had an extremely short NFL career with a single tackle in his entire career. However, at just 230 pounds he pulled off 41 reps on the bench, and all of his other performances were above average. No other non-lineman in history has gotten more than 40 reps. 
The rest of the top few had or are currently having pretty good NFL careers.</div><div><br /></div><div>Since the 40 time is the one that seems most discussed, here is the same analysis if you use only the 40 time to rank:<br /><br /></div><table style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr style="background-color: #e6a117;"><th>player</th><th>weight (lbs)</th><th>40 time (s)</th></tr><tr><td>Montez Sweat</td><td>260</td><td>4.41 (4.86)</td></tr><tr><td>Vernon Davis</td><td>254</td><td>4.38 (4.82)</td></tr><tr><td>Bryan Thomas</td><td>266</td><td>4.47 (4.89)</td></tr><tr><td>Dontari Poe</td><td>346</td><td>4.89 (5.35)</td></tr><tr><td>Dwight Freeney</td><td>266</td><td>4.48 (4.89)</td></tr><tr><td>Tank Johnson</td><td>304</td><td>4.69 (5.11)</td></tr><tr><td>Calvin Johnson</td><td>239</td><td>4.35 (4.74)</td></tr><tr><td>Dontay Moch</td><td>248</td><td>4.40 (4.79)</td></tr><tr><td>Matt Jones</td><td>242</td><td>4.37 (4.75)</td></tr><tr><td>Bruce Campbell</td><td>314</td><td>4.75 (5.16)</td></tr><tr><td>Taylor Mays</td><td>230</td><td>4.31 (4.69)</td></tr><tr><td>Terron Armstead</td><td>306</td><td>4.71 (5.12)</td></tr><tr><td>James Hanna</td><td>252</td><td>4.43 (4.81)</td></tr><tr><td>Martez Wilson</td><td>250</td><td>4.42 (4.80)</td></tr><tr><td>T.J. Duckett</td><td>254</td><td>4.45 (4.82)</td></tr><tr><td>Bruce Irvin</td><td>245</td><td>4.41 (4.77)</td></tr><tr><td>Rashan Gary</td><td>277</td><td>4.58 (4.95)</td></tr><tr><td>Connor Barwin</td><td>256</td><td>4.47 (4.83)</td></tr><tr><td>Nick Perry</td><td>271</td><td>4.55 (4.92)</td></tr><tr><td>Lane Johnson</td><td>303</td><td>4.72 (5.10)</td></tr></tbody></table><div><br /></div><div>It's interesting looking through both of these that the really legendary players aren't at the top. Many of them are good players, but Calvin Johnson and J.J Watt are the only ones near the top in either table that will definitely go down as all-time greats. Aaron Donald, Derrick Henry, etc. 
had above average combine performances but some that did clearly better went on to worse careers.</div><div><br /></div><div>I was curious about that and decided to go in the other direction. What great players had bad combine performances? To do that, I took all all-pro players and matched with names from the combine, and the worst were Max Unger, Tyrann Mathieu, and Tarik Cohen. All under-performed estimates in every metric here. The worst performance from an all-time great here was Adrian Peterson. He was roughly average, but I would have guessed his 40 time was way better (4.68 s).</div><div><br /></div><div><br /></div>theboathttp://www.blogger.com/profile/01260139398901806725noreply@blogger.com1tag:blogger.com,1999:blog-1532419805701836386.post-76553892138031591622020-12-04T22:13:00.004-08:002020-12-04T22:14:41.646-08:00Split Violin Plots in plotly.jsSplit violins are a cool way to compare distributions, and plotly makes them simple.<div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/--AM97WeNiLs/X8skxgbD7eI/AAAAAAAAHXQ/Sv06F1LgK-wogyDaUYZ9AmYaqRDvTMJQwCLcBGAsYHQ/s1459/split%2Bviolins.PNG" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="373" data-original-width="1459" src="https://1.bp.blogspot.com/--AM97WeNiLs/X8skxgbD7eI/AAAAAAAAHXQ/Sv06F1LgK-wogyDaUYZ9AmYaqRDvTMJQwCLcBGAsYHQ/s1600/split%2Bviolins.PNG" width="0%" /></a></div><a name='more'></a>There isn't much to explain here. I've embedded an example below showing how to use it. 
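A minimal sketch of the trace configuration (the sample data here are made up; `side` is the key attribute):

```javascript
// Two violin traces sharing one x position: side: 'negative' draws the
// left half and side: 'positive' draws the right half, so overlaying
// them produces a split violin.
const left = {
  type: 'violin', y: [1, 2, 2, 3, 3, 3, 4, 5],
  side: 'negative', name: 'group A', x0: 'metric'
};
const right = {
  type: 'violin', y: [2, 3, 3, 4, 4, 4, 5, 6],
  side: 'positive', name: 'group B', x0: 'metric'
};

// In the browser (with plotly.js loaded and a <div id="plot">):
// Plotly.newPlot('plot', [left, right], { violinmode: 'overlay' });
```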
You just make a normal violin plot, but specify one trace as the negative side and another as the positive side, and plotly handles the rest.<br /><br /><iframe allowfullscreen="true" allowtransparency="true" frameborder="no" height="600px" loading="lazy" scrolling="no" src="https://codepen.io/rhamner/embed/qBaZdpe?height=265&theme-id=light&default-tab=js,result" style="width: 100%;" title="split violins in plotly js"> See the Pen <a href='https://codepen.io/rhamner/pen/qBaZdpe'>split violins in plotly js</a> by Robert Hamner (<a href='https://codepen.io/rhamner'>@rhamner</a>) on <a href='https://codepen.io'>CodePen</a>. </iframe><br /><br />If you've ever wanted to plot multiple distributions side-by-side, this is an easy option.<br />theboathttp://www.blogger.com/profile/01260139398901806725noreply@blogger.com0tag:blogger.com,1999:blog-1532419805701836386.post-20952557958848481262020-11-17T22:14:00.023-08:002020-11-18T21:13:43.569-08:00How Long Until My Investments Start Making Money?Say you invest some fixed amount of money every year. 
How long does it take for the investments to grow faster than the amount you're putting into them?<div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/--W4XtFaZb2U/X7S_I6ToifI/AAAAAAAAHSY/R6kUx5TLIWg7bfyxd9hy_7tPBI_-7jYsACLcBGAsYHQ/s1356/plot.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="449" data-original-width="1356" height="auto" src="https://1.bp.blogspot.com/--W4XtFaZb2U/X7S_I6ToifI/AAAAAAAAHSY/R6kUx5TLIWg7bfyxd9hy_7tPBI_-7jYsACLcBGAsYHQ/s16000/plot.png" width="0%" /></a></div><br /><a name='more'></a><h4 style="text-align: left;">Basic math problem</h4> <script src="https://polyfill.io/v3/polyfill.min.js?features=es6"></script> <script async="" id="MathJax-script" src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"><style>.MathJax { font-size: 1.3em !important; }</style> </script><div>You invest $X per year in an account that yields R in gains. How long does it take for the gain in a year to be greater than $X?</div><div><br /></div><div>Another way of asking this is 'when does R times the future value of investing X each year exceed X?'</div><div><br /></div><div>The future value of a regular yearly investment where N = number of years, X = yearly investment, and R = growth rate of the investment is:</div><div><br /></div><div><br /></div><div style="font-size: 40px;">\[FV = \frac{X*((1+R)^N - 1)}{R}\]</div><div style="font-size: 28px;"><br /></div><div>What we're looking for is the number of years it takes for R times that to exceed X. 
That is, we want to solve:</div><div><br /></div><div style="font-size: 40px;">\[\frac{R*X*((1+R)^N - 1)}{R} > X\]</div><div><br /></div><div>Noticing that the R's cancel in numerator and denominator and dividing both sides by X, you get:</div><div style="font-size: 28px;"><br /></div><div style="font-size: 40px;">\[((1+R)^N - 1) > 1\]</div><div style="font-size: 28px;"><br /></div><div>Adding 1 to both sides:</div><div style="font-size: 28px;"><br /></div><div style="font-size: 40px;">\[((1+R)^N) > 2\]</div><div style="font-size: 28px;"><br /></div><div>Taking the log of both sides:</div><div style="font-size: 28px;"><br /></div><div style="font-size: 40px;">\[log((1+R)^N) > log(2)\]</div><div style="font-size: 28px;"><br /></div><div>Pulling the exponent out front and then dividing by log(1+R):</div><div style="font-size: 28px;"><br /></div><div style="font-size: 40px;">\[N*log(1+R) > log(2)\]</div><div style="font-size: 28px;"><br /></div><div style="font-size: 40px;">\[N > \frac{log(2)}{log(1+R)}\]</div><div style="font-size: 28px;"><br /></div><div>This is kind of cool. You might not recognize it right away, but that says '<a href="https://www.somesolvedproblems.com/2016/04/why-does-dividing-72-by-your-interest.html" target="_blank">N is greater than the doubling time of the investment</a>'. Thanks to the rule-of-72 approximation, that means the investment's growth overtakes the new money you put in after roughly '72 divided by the annual interest rate in percent' years.</div><div><br /></div><div>For a quick concrete example of what this means...say you invest $10,000 per year into an account yielding 6%. 
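Plugging numbers into the final inequality is easy to sketch (the rates chosen are arbitrary examples):

```javascript
// Years until the annual gain on a recurring investment exceeds the yearly
// contribution: N > log(2) / log(1 + R), i.e. the investment's doubling time.
const yearsToCross = r => Math.log(2) / Math.log(1 + r);
// The rule-of-72 shortcut for the same quantity (r as a fraction).
const ruleOf72 = r => 72 / (100 * r);

for (const r of [0.04, 0.06, 0.08]) {
  // e.g. at 6%: ~11.9 years exact vs. 12.0 from the rule of 72
  console.log(r, yearsToCross(r).toFixed(1), ruleOf72(r).toFixed(1));
}
```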
The time it takes for the 6% yield each year to exceed $10,000 is log(2)/log(1.06) which is ~12 years.</div><div style="font-size: 28px;"><br /></div><h4 style="text-align: left;">Simple plot</h4><div>Here's a simple interactive plot showing the breakdown between money invested and money from gains for an annual $10,000 investment using the interest rate that you enter below:</div><div><br /></div><div><br /></div>Interest rate (%) <input id="rate" onchange="updatePlot(this.value)" value="6" /><div id="plot" style="height: 400px; width: 100%;"></div> <script src="https://cdn.plot.ly/plotly-latest.min.js"></script><script>function updatePlot(rate) { rate = parseFloat(rate); let time = []; let input = []; let yield = []; let value = 0; for (let i = 0; i < 25; i++) { time.push(i); value = (value*(1 + (rate/100))) + 10000; input.push(i*10000); yield.push(value - i*10000); } Plotly.react('plot', [ { x: time, y: input, stackgroup: 'one', name: 'total invested' }, { x: time, y: yield, stackgroup: 'one', name: 'total gain' } ], { font: { size: 16 }, yaxis: { title: 'balance ($)' }, xaxis: { title: 'years' }, title: 'Comparing contributions from amount invested and investment yields' }, { responsive: true }); } updatePlot(6);</script>theboathttp://www.blogger.com/profile/01260139398901806725noreply@blogger.com1tag:blogger.com,1999:blog-1532419805701836386.post-53313873600885337662020-11-01T22:30:00.012-08:002020-11-03T14:55:37.762-08:00How Do American Betting Odds Convert to Percent Chance?If you've looked at betting odds, you've probably seen something like +140 and -175. 
What % chance does that imply for each participant?<div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-sBxt3nM6of8/X5-nSHueBKI/AAAAAAAAHP0/Eqzun0EJKYIzEbZn-0Y4JaG0zuDU2Ee8QCLcBGAsYHQ/s800/800px-Las_Vegas_sportsbook.jpg" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="533" data-original-width="800" src="https://1.bp.blogspot.com/-sBxt3nM6of8/X5-nSHueBKI/AAAAAAAAHP0/Eqzun0EJKYIzEbZn-0Y4JaG0zuDU2Ee8QCLcBGAsYHQ/s1600/800px-Las_Vegas_sportsbook.jpg" width="0" /></a></div><a name='more'></a><h4 style="text-align: left;">Definition</h4>First, what do those numbers mean? A -175 means 'you win $100 for each $175 that you bet' and a +140 means 'you win $140 for each $100 that you bet'.<div><br /><h4 style="text-align: left;">Example</h4>Now, consider a matchup that's 60% chance for A and 40% chance for B. What does that convert to?<p>Assuming no cost to bet, if there were 10 matches and you bet $100 on A each time, you'd put in $1000 and expect to get out $1000. Since A wins 60% of the time, you'd get 6 payouts and they would sum to $1000 (since you lose the bet on the 40% where A loses). Each bet would pay out $167, and subtracting off the initial $100 means a profit of $67. Thus, a $100 bet on A yields a profit of $67 when A wins which means that to get a profit of $100 you'd bet 100/.67, or $150. From the definition above, that means that a 60% chance of winning is a line of -150.<br /></p><p>Doing the same with the 40% one, you'd get 4 payouts that sum to $1000, so $250 per payout and a profit of $150. You bet $100, and profit $150 on a win, so the line is +150.</p><p>Thus, a 60/40 matchup corresponds to a line of -150/+150. 
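The percent-to-line conversion in the worked example above can be sketched as follows (the rounding is my own addition, since real lines are quoted as whole numbers):

```javascript
// Convert a win probability to an American moneyline, assuming no margin.
// Favorites (p > 0.5) get negative lines; underdogs get positive ones.
function toAmericanLine(p) {
  const fairPayout = 1 / p - 1;               // profit per $1 bet at fair odds
  return Math.round(p > 0.5 ? -100 / fairPayout   // e.g. 0.6 -> -150
                            :  100 * fairPayout); // e.g. 0.4 -> +150
}

console.log(toAmericanLine(0.6)); // -150
console.log(toAmericanLine(0.4)); // 150
```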
The general equation for the logic above is:<br /></p><ul style="text-align: left;"><li><b>favorite: </b>American line = - 100/[( 1/percent - 1)]</li><li><b>underdog: </b>American line = 100*[( 1/percent - 1)]</li></ul><div>Going the other direction:</div><p></p><ul><li><b>favorite: </b>percent = 1/[(100/-American) + 1]</li><li><b>underdog: </b>percent = 1/[(American/100) + 1]</li></ul><div></div><p></p><h4 style="text-align: left;">Real Life</h4><div>It's not quite this easy. The person offering the bets (bookie) needs to make money. Imagine in the above that the person offering the bet wants to make $10 for every $100 bet. How does that change things?</div><div><br /></div><div>Consider bets on A. You make 10 bets on A. A wins 60% of the time, so you should get $1000 back like before, except that you pay $10 per bet so you get $900 back. 6 payouts that give $900 means $150 per payout, and subtracting the initial investment means $50 profit. That means you'd bet 100/0.50, or $200, for each $100 of profit, which means the line is -200.</div><div><br /></div><div>For B...4 payouts get $900, so $225 per payout, which is $125 profit per win after subtracting the initial investment. $125 profit on a $100 bet means +125 is the line.</div><div><br /></div><div>How can you factor out this $10 cost (margin)?</div><div><br /></div><div>You get a line of -200/+125 to start and want to see what the margin is on this. It's actually easy from what we did earlier. Simply convert these lines to the percentage versions, and add them together. Taking these specific numbers:</div><div><ul style="text-align: left;"><li>-200 => 66.67%</li><li>+125 => 44.44%</li><li>sum = 111.11%</li></ul><div>That is, for every $111.11 that is bet, the bookie keeps $11.11, i.e. $10 of every $100 bet (or in the earlier terms, each $100 bet is really a $90 bet plus a $10 fee).</div></div><div><br /></div><div>Finally...how do you get the implied chance of each option winning from odds that have the margin factored in like these?
Simply divide each percentage by the sum.</div><div><ul style="text-align: left;"><li>favorite: 66.67% / 1.1111 = 60%</li><li>underdog: 44.44% / 1.1111 = 40%</li></ul><div>And we recovered the original odds.</div></div><div><br /></div><div><a href="https://docs.google.com/spreadsheets/d/1heJ4vmXLJVEiGkFBJskJk6NlgSITJOwtzfb-G3PkVMo/edit?usp=sharing">You can play with this in a spreadsheet here if you want.</a></div><div><br /></div><div><br /></div><p></p></div>theboathttp://www.blogger.com/profile/01260139398901806725noreply@blogger.com13tag:blogger.com,1999:blog-1532419805701836386.post-81017374302570149512020-10-17T22:58:00.011-07:002020-10-18T21:24:52.597-07:00How Does Deal or No Deal Determine the Offers?If you've ever watched 'Deal or No Deal', you've likely wondered how the 'banker' determines his offers.<div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-ed-FUIx0__Q/X4vYb_FwxlI/AAAAAAAAHMU/t2QPK9lA0V002UhASIujXMLlra59SXlEgCLcBGAsYHQ/s1260/offer%2Bfrom%2Brandom%2Bforest.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="635" data-original-width="1260" src="https://1.bp.blogspot.com/-ed-FUIx0__Q/X4vYb_FwxlI/AAAAAAAAHMU/t2QPK9lA0V002UhASIujXMLlra59SXlEgCLcBGAsYHQ/s1600/offer%2Bfrom%2Brandom%2Bforest.png" width="0%" /></a></div><a name='more'></a><div>To play with this, I got data for 1 season (2006) of the US show from a <a href="https://www.aeaweb.org/articles?id=10.1257/aer.98.1.38">paper</a> on player behavior. For some obvious questions you might have...<br /><br /></div><h4 style="text-align: left;">Are the offers consistent?</h4><div>An immediate question I wanted to answer is 'does a player with a better board get a better offer?' The answer is 'usually', but there were many instances where this was not the case. 
Some specific examples...</div><div><ul style="text-align: left;"><li>round 8, player 1 had $50, $200, and $1,000,000 cases remaining and was offered $267,000</li><li>round 8, player 2 had $400, $1,000, and $1,000,000 cases remaining and was offered $215,000</li></ul><div>Player 2 clearly had the better board but received a much lower offer.</div></div><div><ul style="text-align: left;"><li>round 8, player 1 had $0.01, $400,000, and $750,000 cases remaining and was offered $375,000</li><li>round 8, player 2 had $25, $500,000, and $750,000 cases remaining and was offered $359,000</li></ul><div>Again, player 2 clearly had the better board but received a lower offer.</div></div><div><br /></div><div>Interestingly, if I restrict it to offers made in rounds 7 or 8 that were worth more than $100,000, 9 of 26 fit this pattern (player received an offer better than someone who had a better board). I'll cover another topic briefly then come back to this for speculation.</div><div><br /></div><h4 style="text-align: left;">Does the bank ever offer more than the board is worth?</h4><div>What is the board 'worth'? The most obvious answer is just the expectation value of the cases on the board. If you have 3 cases with $100, $500, and $1000 in them, the board's expectation value is ($100 + $500 + $1000)/3, or ~$533. You'd think it never makes sense for the banker to offer more than $533 for that board right?</div><div><br /></div><div>Turns out, for this season, 16 of the 62 round 7 and round 8 offers were for more than the board was worth using the definition above. Why would this ever make sense?</div><div><br /></div><h4 style="text-align: left;">Speculating so far</h4><div>There could just be a random noise generator weighting each offer to keep things interesting. There are some legit things that might explain the two questions above though that I can't answer confidently without more data. 
An idea that came to mind is basically...people get sad watching someone lose horribly, and viewers might lose interest if that happens too often. </div><div><br /></div><div>In one instance of this pattern, a player had the following cases: $5, $75, and $400,000. That board is 'worth' ~$133,000, but the banker offered $137,000. It's sad if people in that situation commonly end up with $75, so the banker might offer more than it's worth just to keep that from happening.</div><div><br /></div><div>A sample safer board that got an offer below the board's worth had cases $200, $50,000, and $75,000 remaining. The offer was $35,000 for a board that's 'worth' ~$42,000. If the next case opened were $50,000, the player would receive an offer of $37,500, so there's no catastrophe. Thus, I think there's likely a random component added to each offer that can be tuned up when they want to encourage the player to accept the offer.</div><div><br /></div><div>This might not be what's happening, but I can't think of a better reason for why the banker sometimes offers more than the expectation value of the board.</div><div><br /></div><div><h4 style="text-align: left;">Modeling the offers</h4><div>Now to answer the title question...can we reverse-engineer the model? Because of the above, I think it's impossible to get exactly. Further, I have no idea what type of model they're using. It could be a giant decision tree. It could be some random multiplier on the expectation value. It could be regression on, say, 5 parameters.
I can get pretty close to the results though using the available information.</div><div><br /></div><div>To get it out of the way, here's what you get if you simply use the board's 'worth' from above:<br /><br /><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-oh2HbOuLfRY/X4vYJot0O3I/AAAAAAAAHMI/AdDUu8Obs-Y1pfDwl78M5IIpAYR2LJ19QCLcBGAsYHQ/s1260/offer%2Bis%2Bexpectation%2Bvalue.PNG" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="635" data-original-width="1260" src="https://1.bp.blogspot.com/-oh2HbOuLfRY/X4vYJot0O3I/AAAAAAAAHMI/AdDUu8Obs-Y1pfDwl78M5IIpAYR2LJ19QCLcBGAsYHQ/s1600/offer%2Bis%2Bexpectation%2Bvalue.PNG" width="90%" /></a></div><br /><div><br /></div><div>That's too simple to be exciting, so it's wrong.</div><div><br /></div><div>The most intuitive simple model to me thinking about this problem is:<br /><br />"Offer = constant * expectation value of board" where 'constant' is fixed based on the round</div><div><br /></div><div>Fitting the data to that, I get the following constants for each round (round # is list #):<br /><ol style="text-align: left;"><li>0.11</li><li>0.24</li><li>0.38</li><li>0.49</li><li>0.6</li><li>0.72</li><li>0.85</li><li>0.85</li><li>0.99</li></ol><div>This makes sense. In round 1, they don't want you to stop and there is a huge spread of outcomes left so the offer is so low that no one would accept it (11% of the board's value). By round 9, you have 2 cases left, so they just offer the average value of those two cases. 
In-between they build drama and steadily make the offers more attractive.</div></div><div><br /></div><div>How well does that predict the actual offers?</div><div><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-KhPCh7hLI1s/X4vYWHmv_GI/AAAAAAAAHMM/PIevBLVJscMJinCOxSV46PEUcsL_tQKsACLcBGAsYHQ/s1260/offer%2Bis%2Bweighted%2Bby%2Bround.PNG" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="635" data-original-width="1260" src="https://1.bp.blogspot.com/-KhPCh7hLI1s/X4vYWHmv_GI/AAAAAAAAHMM/PIevBLVJscMJinCOxSV46PEUcsL_tQKsACLcBGAsYHQ/s1600/offer%2Bis%2Bweighted%2Bby%2Bround.PNG" width="90%" /></a></div><br /><div><br /></div><div>That actually has an r^2 of ~95%, so we likely won't do much better with any sort of linear regression.<br /><br />Another approach is to try something like a random forest regressor. Using the round-weighted average, the standard deviation of remaining case values, the largest remaining case, the smallest remaining case, and the round number as features, I get:</div><div><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-ed-FUIx0__Q/X4vYb_FwxlI/AAAAAAAAHMU/t2QPK9lA0V002UhASIujXMLlra59SXlEgCLcBGAsYHQ/s1260/offer%2Bfrom%2Brandom%2Bforest.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="635" data-original-width="1260" src="https://1.bp.blogspot.com/-ed-FUIx0__Q/X4vYb_FwxlI/AAAAAAAAHMU/t2QPK9lA0V002UhASIujXMLlra59SXlEgCLcBGAsYHQ/s1600/offer%2Bfrom%2Brandom%2Bforest.png" width="90%" /></a></div><br /><div><br />That's a bit better, but it's likely overfit and I don't have enough data to split into large training and test sets.
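For concreteness, the simple round-constant model can be written out directly. A sketch in JavaScript, using the per-round constants fitted above (function names are mine):

```javascript
// Offer model from above: offer ≈ round constant * expectation value of the board.
// ROUND_CONSTANTS[r - 1] is the fitted constant for round r listed earlier.
const ROUND_CONSTANTS = [0.11, 0.24, 0.38, 0.49, 0.6, 0.72, 0.85, 0.85, 0.99];

// Expectation value of the board: the plain average of the remaining case values.
function boardValue(cases) {
  return cases.reduce((sum, c) => sum + c, 0) / cases.length;
}

function predictOffer(round, cases) {
  // round is 1-indexed, matching the list above.
  return ROUND_CONSTANTS[round - 1] * boardValue(cases);
}
```

For the first example board above (round 8 with $50, $200, and $1,000,000 left), `predictOffer(8, [50, 200, 1000000])` predicts roughly $283,000, against the actual $267,000 offer.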
The simple weighted expectation value above works much better than I'd expected, so I think that's a decent model for this.</div><div><br /></div><div>One cool thing about the random forest regression is that you can get the importance of each feature. Those importances are:<br /><ul style="text-align: left;"><li>round-weighted expectation value: 0.96</li><li>standard deviation of remaining cases: 0.02</li></ul><div>and all the rest are less than 0.01. Running linear regression with spread included (so the model is 'offer = C1*round-weighted average + C2*standard deviation of remaining cases') yields basically the same as just the round-weighted average:</div></div><div><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-dAtd6EOrhs0/X4vYkxV4dvI/AAAAAAAAHMY/5C1r3ibs85EYy5TONjFSj6N4JT3oKAyagCLcBGAsYHQ/s1260/offer%2Bis%2Bregression%2Busing%2Bweighted%2Bby%2Bround%2Band%2Bspread.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="635" data-original-width="1260" src="https://1.bp.blogspot.com/-dAtd6EOrhs0/X4vYkxV4dvI/AAAAAAAAHMY/5C1r3ibs85EYy5TONjFSj6N4JT3oKAyagCLcBGAsYHQ/s1600/offer%2Bis%2Bregression%2Busing%2Bweighted%2Bby%2Bround%2Band%2Bspread.png" width="90%" /></a></div><br /><div><br /></div><h4 style="text-align: left;">Conclusion</h4><div>It looks like a simple model of 'offer some % of the average of the remaining cases where that % depends on current round' works well enough, and in reality they likely add some noise and probably alter it a bit as needed to keep interest/ratings up.</div><div><br /></div><div><br /></div><br /></div>theboathttp://www.blogger.com/profile/01260139398901806725noreply@blogger.com3tag:blogger.com,1999:blog-1532419805701836386.post-17507847238905734292020-10-09T23:09:00.004-07:002020-10-10T22:07:36.753-07:00What Are the Most Common Scores in the NFL?My guess going into this is that both teams score around 30, a field goal is the most common
separation, and it's some combination of 7 and 3. Taking a guess then, I think 27 - 24 is going to be most common.<div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-CKP9TsbOYgA/X4FMlR99e8I/AAAAAAAAHJQ/6gHuPOf4ce47HT_xRBTYmGc-A63zp7raQCLcBGAsYHQ/s1206/most%2Bcommon%2Bgame%2Bscores.PNG" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="672" data-original-width="1206" src="https://1.bp.blogspot.com/-CKP9TsbOYgA/X4FMlR99e8I/AAAAAAAAHJQ/6gHuPOf4ce47HT_xRBTYmGc-A63zp7raQCLcBGAsYHQ/s1600/most%2Bcommon%2Bgame%2Bscores.PNG" width="0%" /></a></div><a name='more'></a>For the initial analysis, I'll take what I think is the most literal definition of 'most common score', which is the score that is most common for a single team across all games. For data, I'm using the 2000-2019 NFL seasons. Doing that, I get the following plot:<div><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-DR_S63GZcIQ/X4FKe6nnRiI/AAAAAAAAHJA/rJURcmzN_iY3e5I7NpyzpVuDqBIHtgUNwCLcBGAsYHQ/s1209/most%2Bcommon%2Bteam%2Bscores.PNG" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="632" data-original-width="1209" src="https://1.bp.blogspot.com/-DR_S63GZcIQ/X4FKe6nnRiI/AAAAAAAAHJA/rJURcmzN_iY3e5I7NpyzpVuDqBIHtgUNwCLcBGAsYHQ/s1600/most%2Bcommon%2Bteam%2Bscores.PNG" width="100%" /></a></div><br /><div>So there are some obvious results here. No one ever scores only 1 point in the NFL because you can't. Scoring only 2 is super-rare since that means getting just a safety. Scoring 4 is even less common since that's only possible with 2 safeties. Getting into what's common...you'd expect x*7 + y*3 to be most common since touchdown (td) + extra point is 7 and field goal (fg) is 3.
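That reachability reasoning is easy to verify with a coin-change-style count. A sketch in JavaScript; the play values cover the common scoring plays (safety, fg, td with a missed extra point, td + extra point, td + 2-point conversion) and ignore rarities like the 1-point safety:

```javascript
// Count the unordered combinations of common scoring plays that reach a total.
const PLAYS = [2, 3, 6, 7, 8]; // safety, fg, td, td + xp, td + 2pt

function waysToScore(total) {
  // Standard coin-change count: ways[s] = # of combinations of PLAYS summing to s.
  const ways = new Array(total + 1).fill(0);
  ways[0] = 1;
  for (const p of PLAYS) {
    for (let s = p; s <= total; s++) ways[s] += ways[s - p];
  }
  return ways[total];
}
```

`waysToScore(1)` is 0 and `waysToScore(4)` is 1 (two safeties), matching the observations above, while common totals like 20 have many combinations.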
Looking at the most common ones, the top 5 are:<br /><br /><ul style="text-align: left;"><li>20 - can be done 2 common ways: 2 tds with extra points + 2 fgs, 3 tds with 1 missed extra point</li><li>17 - 2 tds with extra points + 1 fg</li><li>24 - 3 tds with extra points + 1 fg</li><li>27 - can be done 2 common ways: 3 tds with extra points + 2 fgs, 4 tds with 1 missed extra point</li><li>10 - 1 td with extra point + 1 fg</li></ul><div>Just scanning through, the oddest one to me is that 16 and 21 occur at roughly the same rate. 16 is likely mostly 3 fgs + 1 td with extra point, but I would have guessed 21 (3 tds with extra points) was much more common.</div></div><div><br /></div><div>For another definition of 'most common score', I'll do the most common scores factoring in both teams in a game. That is, the most common final scores for NFL games. Doing that:<br /><br /><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-CKP9TsbOYgA/X4FMlR99e8I/AAAAAAAAHJQ/6gHuPOf4ce47HT_xRBTYmGc-A63zp7raQCLcBGAsYHQ/s1206/most%2Bcommon%2Bgame%2Bscores.PNG" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="672" data-original-width="1206" src="https://1.bp.blogspot.com/-CKP9TsbOYgA/X4FMlR99e8I/AAAAAAAAHJQ/6gHuPOf4ce47HT_xRBTYmGc-A63zp7raQCLcBGAsYHQ/s1600/most%2Bcommon%2Bgame%2Bscores.PNG" width="100%" /></a></div><br /><div><br /></div><div>Given the first plot, this shouldn't be surprising. Each of the top 7 game scores here includes one of the top 5 team scores from the previous plot. What is a bit surprising to me here is that 1-score games (difference less than 9) seem most common.
Looking at that in more detail, here is the distribution of margin of victory for all games:</div><div><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-Ms7X4F263Ng/X4FNJPS5zZI/AAAAAAAAHJg/9Tc1y1DVadUTzMWEtYBmy8U1myCfLPVGwCLcBGAsYHQ/s1238/most%2Bcommon%2Bmargin%2Bof%2Bvictory.PNG" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="606" data-original-width="1238" src="https://1.bp.blogspot.com/-Ms7X4F263Ng/X4FNJPS5zZI/AAAAAAAAHJg/9Tc1y1DVadUTzMWEtYBmy8U1myCfLPVGwCLcBGAsYHQ/s1600/most%2Bcommon%2Bmargin%2Bof%2Bvictory.PNG" width="100%" /></a></div><br /><div>No surprises from what we know from above...a 3 point difference is most common, and 7 points is next most common. On the '1-score game' note from above, doing some quick math, I get that ~50% of all NFL games in this period were decided by 1 score only (8 points or less).<br /><br />Also...in that last plot, you can see that the values on the right are larger than on the left. Since the right here means 'home team wins', this means that the home team wins more often. Adding it up, it turns out that over this 20-year period, the home team won ~58% of the time. Further, simply averaging the scores over this period, the home team averages ~2.5 more points per game than the away team. Summarizing that, you could say that <b>the home field advantage in the NFL is roughly 2.5 points and leads to the home team winning ~58% of the time.</b></div><div><b><br /></b></div><div>That's it.
Let me know if you want to see anything else with this data set.</div><div><br /></div><div><br /></div>theboathttp://www.blogger.com/profile/01260139398901806725noreply@blogger.com0tag:blogger.com,1999:blog-1532419805701836386.post-60252965114003652042020-09-27T21:55:00.005-07:002020-09-27T21:56:02.655-07:00Austin, TX Growth Measured by Tall Building ConstructionAustin has grown a lot lately...<div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-r1H1LKTRrBM/X3FogzygkfI/AAAAAAAAHHM/SMSTlRRVdsESH1SGpV0cS93huttEU9JWgCLcBGAsYHQ/s1471/austin%2Bvs%2Bkc%2Bvs%2Bbham.PNG" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="635" data-original-width="1471" src="https://1.bp.blogspot.com/-r1H1LKTRrBM/X3FogzygkfI/AAAAAAAAHHM/SMSTlRRVdsESH1SGpV0cS93huttEU9JWgCLcBGAsYHQ/s1600/austin%2Bvs%2Bkc%2Bvs%2Bbham.PNG" width="0%" /></a></div><a name='more'></a><p>There are cranes everywhere, and it feels like there are way more than in other mid-size cities. I thought tall building construction might be a cool proxy for growth, so I put together a few quick visualizations. 'Tall building' here is >200 feet.</p><p>First, here's the number of 'tall buildings' in Austin vs time:</p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-z2jDrBfxkao/X3FoYYvJ8YI/AAAAAAAAHHE/rR7YDX0ioocQiLSu6DVh3_7LA6K-6woTwCLcBGAsYHQ/s1471/austin.PNG" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="636" data-original-width="1471" src="https://1.bp.blogspot.com/-z2jDrBfxkao/X3FoYYvJ8YI/AAAAAAAAHHE/rR7YDX0ioocQiLSu6DVh3_7LA6K-6woTwCLcBGAsYHQ/s1600/austin.PNG" width="100%" /></a></div><p>That by itself doesn't tell you much.
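For reference, the cumulative counts behind these plots can be computed from a building list like the one below. A sketch in JavaScript; the function name, field names, and example data are mine (illustrative, not real buildings):

```javascript
// Cumulative number of 'tall buildings' (over a height cutoff) completed
// by the end of each year, for plotting count vs. time.
function cumulativeTall(buildings, startYear, endYear, minHeightFt = 200) {
  const series = [];
  for (let year = startYear; year <= endYear; year++) {
    // Count buildings over the cutoff that were finished by this year.
    const count = buildings.filter(
      (b) => b.heightFt > minHeightFt && b.completed <= year
    ).length;
    series.push({ year, count });
  }
  return series;
}
```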
Here it is compared with a roughly comparable city...Kansas City:</p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-DmlomEpxQK4/X3FodKk6qxI/AAAAAAAAHHI/eComfOgXYo0YrlhQjfF0wiam_fJCCIbGwCLcBGAsYHQ/s1472/austin%2Bvs%2Bkc.PNG" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="636" data-original-width="1472" src="https://1.bp.blogspot.com/-DmlomEpxQK4/X3FodKk6qxI/AAAAAAAAHHI/eComfOgXYo0YrlhQjfF0wiam_fJCCIbGwCLcBGAsYHQ/s1600/austin%2Bvs%2Bkc.PNG" width="100%" /></a></div><p>Austin's spike in the 2010s is really noticeable when plotted together. Just adding one more, here's the same with Birmingham, Alabama also included:<br /></p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-r1H1LKTRrBM/X3FogzygkfI/AAAAAAAAHHM/SMSTlRRVdsESH1SGpV0cS93huttEU9JWgCLcBGAsYHQ/s1471/austin%2Bvs%2Bkc%2Bvs%2Bbham.PNG" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="635" data-original-width="1471" src="https://1.bp.blogspot.com/-r1H1LKTRrBM/X3FogzygkfI/AAAAAAAAHHM/SMSTlRRVdsESH1SGpV0cS93huttEU9JWgCLcBGAsYHQ/s1600/austin%2Bvs%2Bkc%2Bvs%2Bbham.PNG" width="100%" /></a></div><br /><div>I wanted to include Shanghai to show a whole different scale of rapid growth, but I can't find any lists that have buildings under 500 feet tall for it.</div><div><br /></div><div><br /></div>theboathttp://www.blogger.com/profile/01260139398901806725noreply@blogger.com0tag:blogger.com,1999:blog-1532419805701836386.post-6279290909607070072020-09-23T20:22:00.009-07:002020-09-23T22:50:14.777-07:00Where Do Football Plays Occur?Simple question...where does each play start in the NFL?<div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-hITjSt1YYl0/X2wPpGgJ4XI/AAAAAAAAHGc/Dh0TLM__Wh89lbDJavosd-Ndq4U5o1axQCLcBGAsYHQ/s1666/yards%2Bto%2Bgoal%2Bat%2Bstart%2Bof%2Beach%2Blpay.png" style="margin-left: 1em;
margin-right: 1em;"><img border="0" data-original-height="1104" data-original-width="1666" src="https://1.bp.blogspot.com/-hITjSt1YYl0/X2wPpGgJ4XI/AAAAAAAAHGc/Dh0TLM__Wh89lbDJavosd-Ndq4U5o1axQCLcBGAsYHQ/s1666/yards%2Bto%2Bgoal%2Bat%2Bstart%2Bof%2Beach%2Blpay.png" width="0%" /></a></div><a name='more'></a>To determine this, I'm using <a href="https://github.com/ryurko/nflscrapR-data/tree/master/play_by_play_data/regular_season">this</a> play-by-play data set for 2018 and 2019. From there, I just counted how often each field position was the starting position of an offensive play (e.g., kickoffs are not included). The height of the bar is just the % of plays that started at that position (read right to left...starting on your own 25 means starting at the rightmost 25 in the plot):<br /><br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-hITjSt1YYl0/X2wPpGgJ4XI/AAAAAAAAHGc/Dh0TLM__Wh89lbDJavosd-Ndq4U5o1axQCLcBGAsYHQ/s1666/yards%2Bto%2Bgoal%2Bat%2Bstart%2Bof%2Beach%2Blpay.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1104" data-original-width="1666" src="https://1.bp.blogspot.com/-hITjSt1YYl0/X2wPpGgJ4XI/AAAAAAAAHGc/Dh0TLM__Wh89lbDJavosd-Ndq4U5o1axQCLcBGAsYHQ/s1666/yards%2Bto%2Bgoal%2Bat%2Bstart%2Bof%2Beach%2Blpay.png" width="100%" /></a></div><br /><div>No surprises here...most common is to start after a touchback (own 25), and it's really rare to start inside your own 10 (since kicks into that area often become touchbacks). There's also a big spike in plays at the goal line because goal-line stands happen slightly more often than plays from, say, the 6 yard line.<br /><br />It is a bit interesting that there is a slight bump at most multiples of 5 yards. I'm not really sure why. It could be an impact from many penalties being multiples of 5 yards. It could be something like ref spots just being slightly biased to 5 yard increments.
It could be that first downs come in 10-yard increments and drives start at the 25 more often than anywhere else, so the counts are biased toward those multiples. If someone has a better idea, please post in the comments because I'm curious now.</div><div><br /></div><div><br /></div>theboathttp://www.blogger.com/profile/01260139398901806725noreply@blogger.com0