<h1>Implementing a Simple Redirection Analytics with a touch of Data Science</h1>
<p>HackerEgg: Learning, Understanding and Implementing Computer Science. Published 2020-06-03. <a href="https://hackeregg.github.io/2020/06/03/Implementing-a-Simple-Redirection-Analytics">https://hackeregg.github.io/2020/06/03/Implementing-a-Simple-Redirection-Analytics</a></p>
<h2 id="introduction">Introduction</h2>
<p>We aim to build a simple redirection analytics system, similar to bit.ly or shorturl, using an open-source toolset, and then look at where data science can be applied to the data it collects.</p>
<p>Let’s take the example of a simple food review website. On the website, users can read reviews of different dishes. If a user likes the review of a particular dish and wants to order it, the website links to various food ordering platforms (such as Uber Eats, Zomato, Swiggy). The user clicks on one of the platforms and is redirected to the chosen platform’s page for that dish.</p>
<p>For the redirection, we simply embed the relevant platform URL as a link on our website.</p>
<p>Now we want to know how many users clicked on each link. We can do this by putting a URL shortener or redirection service [ <a href="https://bitly.com/">https://bitly.com/</a>, <a href="https://www.shorturl.at/">https://www.shorturl.at/</a> ] in front of the link. Doing this for every link can be manual and costly: most services give the much-needed basic statistics for free, but many of the more detailed redirection statistics are paid features.</p>
<blockquote>
<p>A real-world example of this feature is Twitter. When a user includes a URL in a tweet, it is automatically replaced by a short URL (t.co/…) and the user can see how many people clicked on that link. LinkedIn does this too.</p>
</blockquote>
<p>Here, we will show the simplest way to implement this feature in Flask.</p>
<p><strong>Simplest doesn’t mean best.</strong></p>
<p>We need to store the relevant links for every dish on every supported platform. We might store this in one big database table, but here we will use a CSV file in place of a database for storing the dishes and their respective URLs. The CSV has the following format:</p>
<table>
<thead>
<tr>
<th>Dishname</th>
<th>Restaurant</th>
<th>Platform</th>
<th>URL</th>
</tr>
</thead>
<tbody>
<tr>
<td>Pizza</td>
<td>Uncle John’s</td>
<td>Zomato</td>
<td>zomato.com/something</td>
</tr>
<tr>
<td>Pasta</td>
<td>Uncle Bob’s</td>
<td>Uber Eats</td>
<td>ubereats.com/something</td>
</tr>
</tbody>
</table>
<p>We will use pandas to read the CSV. (Note: we could also use the csv module from Python’s standard library, but we need some way to search for URLs, and pandas makes that search easy to implement.)</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="n">pd</span>

<span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">read_csv</span><span class="p">(</span><span class="s">"database.csv"</span><span class="p">)</span>
</code></pre></div></div>
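<p>For comparison, the standard-library route mentioned in the note could look roughly like the sketch below. It is only an illustration: the sample rows, the example URLs and the helper name <strong>find_url</strong> are made up, and the column names (dishname, restaurant, url_zomato, url_swiggy) are assumed from what the Flask handler in the next section queries, which is not the same layout as the table above.</p>

```python
import csv
import io

# Sample rows standing in for database.csv (illustrative values only).
SAMPLE_CSV = """dishname,restaurant,url_zomato,url_swiggy
Pizza,Uncle John's,https://zomato.example/pizza,https://swiggy.example/pizza
Pasta,Uncle Bob's,https://zomato.example/pasta,https://swiggy.example/pasta
"""


def find_url(rows, dish, restaurant, platform):
    """Linear search: return the platform URL for a dish, or None."""
    for row in rows:
        if row["dishname"] == dish and row["restaurant"] == restaurant:
            # A missing column (unsupported platform) also yields None.
            return row.get("url_" + platform)
    return None


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(find_url(rows, "Pizza", "Uncle John's", "zomato"))
```

<p>This works, but every lookup scans the rows one by one, which is the search code that pandas saves us from writing by hand.</p>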
<h2 id="redirection-implementation-in-flask">Redirection Implementation in Flask</h2>
<p>Now we have the redirection URLs. Let’s take a look at the following code for the redirection implementation.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">flask</span> <span class="kn">import</span> <span class="n">Flask</span><span class="p">,</span> <span class="n">redirect</span>

<span class="n">app</span> <span class="o">=</span> <span class="n">Flask</span><span class="p">(</span><span class="n">__name__</span><span class="p">)</span>

<span class="c1"># df is the DataFrame loaded from database.csv; this handler expects the
# columns dishname, restaurant, url_zomato and url_swiggy
</span><span class="o">@</span><span class="n">app</span><span class="p">.</span><span class="n">route</span><span class="p">(</span><span class="s">'/redirect/<restaurant>/<dish>/<platform>'</span><span class="p">,</span> <span class="n">methods</span> <span class="o">=</span> <span class="p">[</span><span class="s">'GET'</span><span class="p">])</span>
<span class="k">def</span> <span class="nf">redirect_stat</span><span class="p">(</span><span class="n">dish</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span><span class="n">restaurant</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span><span class="n">platform</span><span class="o">=</span><span class="bp">None</span><span class="p">):</span>
    <span class="n">url_redirect</span> <span class="o">=</span> <span class="s">'/'</span>
    <span class="k">if</span> <span class="n">dish</span> <span class="ow">and</span> <span class="n">restaurant</span> <span class="ow">and</span> <span class="n">platform</span><span class="p">:</span>
        <span class="n">result</span> <span class="o">=</span> <span class="n">df</span><span class="p">.</span><span class="n">query</span><span class="p">(</span><span class="s">f'dishname=="</span><span class="si">{</span><span class="n">dish</span><span class="si">}</span><span class="s">" & restaurant=="</span><span class="si">{</span><span class="n">restaurant</span><span class="si">}</span><span class="s">" '</span><span class="p">)</span>
        <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">result</span><span class="p">)</span><span class="o">></span><span class="mi">0</span><span class="p">:</span>
            <span class="k">if</span> <span class="n">platform</span><span class="o">==</span><span class="s">'zomato'</span> <span class="ow">or</span> <span class="n">platform</span><span class="o">==</span><span class="s">'swiggy'</span><span class="p">:</span>
                <span class="k">return</span> <span class="n">redirect</span><span class="p">(</span><span class="n">result</span><span class="p">[</span><span class="s">'url_'</span><span class="o">+</span><span class="n">platform</span><span class="p">].</span><span class="n">iloc</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">code</span><span class="o">=</span><span class="mi">302</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">redirect</span><span class="p">(</span><span class="n">url_redirect</span><span class="p">,</span><span class="n">code</span><span class="o">=</span><span class="mi">302</span><span class="p">)</span>
</code></pre></div></div>
<p>As we can see, we have defined a very simple URL routing scheme in Flask for redirection. If someone wants to order dish X from restaurant Y on platform Z, they just have to open “ourwebsite.com/redirect/Y/X/Z” in their browser (the route puts the restaurant before the dish) and we will redirect them to the relevant platform’s page for that dish. The code also does a few additional checks, making sure that all arguments are passed and that only supported platforms are allowed for redirection; any other request is sent back to “/”.</p>
<p>Now we need to replace all URLs on our website with this routing scheme. Depending on how the website is implemented, this can take anywhere from a couple of minutes to a week.</p>
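<p>When generating those links, names such as “Uncle John’s” contain spaces and apostrophes, so each path segment should be percent-encoded. A minimal sketch, assuming a hypothetical helper called <strong>redirect_path</strong> (not part of the original code) and using only the standard library:</p>

```python
from urllib.parse import quote


def redirect_path(restaurant, dish, platform):
    """Build '/redirect/<restaurant>/<dish>/<platform>' with escaped segments."""
    parts = (restaurant, dish, platform)
    # safe="" also escapes "/" so a name cannot introduce extra path segments.
    return "/redirect/" + "/".join(quote(part, safe="") for part in parts)


print(redirect_path("Uncle John's", "Pizza", "zomato"))
```

<p>The segment order follows the route defined above: restaurant first, then dish, then platform.</p>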
<h2 id="storing-data-to-database">Storing Data to Database</h2>
<p>We need to store the redirection entries. CSV? Yes, we can. But multiple threads writing to a single CSV file doesn’t end well in the long run, so we’ll use a real database. MongoDB? No, PostgreSQL. We will create a simple table for storing our information.</p>
<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">IF</span> <span class="k">NOT</span> <span class="k">EXISTS</span> <span class="k">public</span><span class="p">.</span><span class="n">REDIRECTS</span>
<span class="p">(</span><span class="n">IP</span> <span class="nb">TEXT</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
<span class="n">DISH_NAME</span> <span class="nb">TEXT</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
<span class="n">RESTAURANT</span> <span class="nb">TEXT</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
<span class="n">DATE_TIME</span> <span class="nb">TEXT</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
<span class="n">PLATFORM</span> <span class="nb">TEXT</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">);</span>
</code></pre></div></div>
<p>We can also create the table using Python.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># excerpt from a method of a database helper class: psycopg2 must be imported,
# and config() is a helper that returns the connection parameters
</span><span class="bp">self</span><span class="p">.</span><span class="n">conn</span> <span class="o">=</span> <span class="bp">None</span>
<span class="k">try</span><span class="p">:</span>
    <span class="c1"># read connection parameters
</span>    <span class="n">params</span> <span class="o">=</span> <span class="n">config</span><span class="p">(</span><span class="n">section</span> <span class="o">=</span> <span class="n">section</span><span class="p">)</span>
    <span class="c1"># connect to the PostgreSQL server
</span>    <span class="k">print</span><span class="p">(</span><span class="s">'Connecting to the PostgreSQL database...'</span><span class="p">)</span>
    <span class="bp">self</span><span class="p">.</span><span class="n">conn</span> <span class="o">=</span> <span class="n">psycopg2</span><span class="p">.</span><span class="n">connect</span><span class="p">(</span><span class="o">**</span><span class="n">params</span><span class="p">)</span>
    <span class="k">if</span> <span class="n">create_database</span><span class="p">:</span>
        <span class="n">cur</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">conn</span><span class="p">.</span><span class="n">cursor</span><span class="p">()</span>
        <span class="n">database_create_query</span> <span class="o">=</span> <span class="s">'''CREATE TABLE IF NOT EXISTS public.REDIRECTS
        (IP TEXT NOT NULL,
        DISH_NAME TEXT NOT NULL,
        RESTAURANT TEXT NOT NULL,
        DATE_TIME TEXT NOT NULL,
        PLATFORM TEXT NOT NULL); '''</span>
        <span class="n">cur</span><span class="p">.</span><span class="n">execute</span><span class="p">(</span><span class="n">database_create_query</span><span class="p">)</span>
        <span class="c1"># CREATE TABLE returns no rows, so there is nothing to fetch;
        # calling fetchall() here would raise and skip the commit
</span>        <span class="bp">self</span><span class="p">.</span><span class="n">conn</span><span class="p">.</span><span class="n">commit</span><span class="p">()</span>
        <span class="n">cur</span><span class="p">.</span><span class="n">close</span><span class="p">()</span>
    <span class="k">print</span> <span class="p">(</span><span class="s">'Connected'</span><span class="p">)</span>
<span class="k">except</span> <span class="p">(</span><span class="nb">Exception</span><span class="p">,</span> <span class="n">psycopg2</span><span class="p">.</span><span class="n">DatabaseError</span><span class="p">)</span> <span class="k">as</span> <span class="n">error</span><span class="p">:</span>
    <span class="k">print</span><span class="p">(</span><span class="n">error</span><span class="p">)</span>
</code></pre></div></div>
<p>Now we have a table in place. We just need to insert data and we are almost done. The following code adds the database insertion to the redirect handler.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># request comes from flask; redirectdb is our database helper object and
# current_time() is a small helper that returns the current timestamp
</span><span class="o">@</span><span class="n">app</span><span class="p">.</span><span class="n">route</span><span class="p">(</span><span class="s">'/redirect/<restaurant>/<dish>/<platform>'</span><span class="p">,</span> <span class="n">methods</span> <span class="o">=</span> <span class="p">[</span><span class="s">'GET'</span><span class="p">])</span>
<span class="k">def</span> <span class="nf">redirect_stat</span><span class="p">(</span><span class="n">dish</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span><span class="n">restaurant</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span><span class="n">platform</span><span class="o">=</span><span class="bp">None</span><span class="p">):</span>
    <span class="n">url_redirect</span> <span class="o">=</span> <span class="s">'/'</span>
    <span class="k">if</span> <span class="n">dish</span> <span class="ow">and</span> <span class="n">restaurant</span> <span class="ow">and</span> <span class="n">platform</span><span class="p">:</span>
        <span class="c1"># inserting entry into database
</span>        <span class="n">redirectdb</span><span class="p">.</span><span class="n">insert</span><span class="p">(</span><span class="n">ip</span><span class="o">=</span><span class="n">request</span><span class="p">.</span><span class="n">remote_addr</span><span class="p">,</span><span class="n">datetime</span><span class="o">=</span><span class="n">current_time</span><span class="p">(),</span>
                          <span class="n">dish</span><span class="o">=</span><span class="n">dish</span><span class="p">,</span> <span class="n">restaurant</span><span class="o">=</span><span class="n">restaurant</span><span class="p">,</span> <span class="n">platform</span><span class="o">=</span><span class="n">platform</span><span class="p">)</span>
        <span class="n">result</span> <span class="o">=</span> <span class="n">df</span><span class="p">.</span><span class="n">query</span><span class="p">(</span><span class="s">f'dishname=="</span><span class="si">{</span><span class="n">dish</span><span class="si">}</span><span class="s">" & restaurant=="</span><span class="si">{</span><span class="n">restaurant</span><span class="si">}</span><span class="s">" '</span><span class="p">)</span>
        <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">result</span><span class="p">)</span><span class="o">></span><span class="mi">0</span><span class="p">:</span>
            <span class="k">if</span> <span class="n">platform</span><span class="o">==</span><span class="s">'zomato'</span> <span class="ow">or</span> <span class="n">platform</span><span class="o">==</span><span class="s">'swiggy'</span><span class="p">:</span>
                <span class="k">return</span> <span class="n">redirect</span><span class="p">(</span><span class="n">result</span><span class="p">[</span><span class="s">'url_'</span><span class="o">+</span><span class="n">platform</span><span class="p">].</span><span class="n">iloc</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">code</span><span class="o">=</span><span class="mi">302</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">redirect</span><span class="p">(</span><span class="n">url_redirect</span><span class="p">,</span><span class="n">code</span><span class="o">=</span><span class="mi">302</span><span class="p">)</span>
</code></pre></div></div>
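<p>The post does not show what <strong>redirectdb.insert</strong> and <strong>current_time</strong> look like, so the sketch below is only a guess at their shape. To stay runnable without a database server it swaps PostgreSQL for the standard library’s sqlite3 with an in-memory database; the class name RedirectDB and the lowercase column names are assumptions.</p>

```python
import datetime as dt
import sqlite3


def current_time():
    """Hypothetical helper: current UTC time as an ISO-8601 string."""
    return dt.datetime.now(dt.timezone.utc).isoformat()


class RedirectDB:
    """Hypothetical stand-in for the post's redirectdb object (sqlite3, not PostgreSQL)."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            """CREATE TABLE IF NOT EXISTS redirects
               (ip TEXT NOT NULL,
                dish_name TEXT NOT NULL,
                restaurant TEXT NOT NULL,
                date_time TEXT NOT NULL,
                platform TEXT NOT NULL)"""
        )

    def insert(self, ip, datetime, dish, restaurant, platform):
        # Placeholders keep user-supplied values out of the SQL text.
        # sqlite3 uses "?" here; psycopg2 uses "%s" instead.
        self.conn.execute(
            "INSERT INTO redirects VALUES (?, ?, ?, ?, ?)",
            (ip, dish, restaurant, datetime, platform),
        )
        self.conn.commit()


redirectdb = RedirectDB()
redirectdb.insert(ip="203.0.113.7", datetime=current_time(),
                  dish="Pizza", restaurant="Uncle John's", platform="zomato")
count = redirectdb.conn.execute("SELECT COUNT(*) FROM redirects").fetchone()[0]
print(count)
```

<p>Passing values as query parameters rather than formatting them into the SQL string matters here, because the dish and restaurant names come straight from the URL.</p>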
<h2 id="analytics">Analytics</h2>
<p>At this point, we have data in Postgres. We can do a lot with just this one simple table. We might need some ETL depending on what we are trying to do with the data, but here are a few ideas for you to try out.</p>
<ol>
<li>
<p>How many total redirections from our website happened last month?</p>
<p>Note that we cannot accurately count the number of people who were redirected, because many people might be using VPNs. But we can roughly estimate it by counting unique IPs. The estimate will have some error in it, but we can still benefit from it.</p>
</li>
<li>Which platform is popular?</li>
<li>
<p>Which dishes are popular and which restaurants are popular?</p>
<p>If we have additional information about restaurants in our database, for example the location of every restaurant, we can even tell which restaurants are popular in city X.</p>
</li>
</ol>
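<p>The three questions above map onto simple aggregate queries. The sketch below is illustrative only: it runs against an in-memory sqlite3 table rather than PostgreSQL, the rows are invented sample data, and it assumes timestamps are stored as ISO-8601 text so that string comparison matches chronological order.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE redirects "
    "(ip TEXT, dish_name TEXT, restaurant TEXT, date_time TEXT, platform TEXT)"
)
# Invented sample rows, only here to make the queries runnable.
sample = [
    ("203.0.113.1", "Pizza", "Uncle John's", "2020-05-02T12:00:00", "zomato"),
    ("203.0.113.1", "Pasta", "Uncle Bob's", "2020-05-03T13:00:00", "swiggy"),
    ("203.0.113.2", "Pizza", "Uncle John's", "2020-05-10T19:30:00", "zomato"),
    ("203.0.113.3", "Pizza", "Uncle John's", "2020-06-01T09:00:00", "zomato"),
]
conn.executemany("INSERT INTO redirects VALUES (?, ?, ?, ?, ?)", sample)

# 1. Total redirects and unique IPs for one month (May 2020 here).
total, unique_ips = conn.execute(
    "SELECT COUNT(*), COUNT(DISTINCT ip) FROM redirects "
    "WHERE date_time >= ? AND date_time < ?",
    ("2020-05-01", "2020-06-01"),
).fetchone()

# 2. Redirect count per platform, most popular first.
by_platform = conn.execute(
    "SELECT platform, COUNT(*) AS n FROM redirects "
    "GROUP BY platform ORDER BY n DESC"
).fetchall()

# 3. Redirect count per dish and restaurant, most popular first.
by_dish = conn.execute(
    "SELECT dish_name, restaurant, COUNT(*) AS n FROM redirects "
    "GROUP BY dish_name, restaurant ORDER BY n DESC"
).fetchall()

print(total, unique_ips)
print(by_platform)
print(by_dish)
```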
<p>Here, I have used the word “popular” because we never know whether a redirected user actually placed an order on the platform. So we should refrain from stronger claims, such as calling a dish the most ordered.</p>
<blockquote>
<p>A few words on privacy: we implemented all this without using a single third-party tool. Our website users’ data stays in our hands. We can even drop the IP column from our table and only track general statistics. This protects our users’ privacy better than spending a few bucks on custom analytics tools that we don’t have full control of.</p>
</blockquote>
<p>At this point, you have an idea of how powerful one simple table can be. But if you’re coming from the machine learning or data science domain, there are a few more eccentric ideas. You should only try to implement these if you have more than X daily active users on your website. If not, it is better to invest the time in other important features.</p>
<ol>
<li>
<p>Based on the data, predict the number of redirections for a particular restaurant on the upcoming day.</p>
<p>Yes, you read that right. We can do this using time series analysis, given enough data.</p>
</li>
<li>Based on IP addresses, we can generate location-based statistics.</li>
<li>
<p>Detect bots’ IP addresses, if any exist on our platform, from redirection patterns.</p>
<p>Bots have distinctive patterns. Sometimes they try to emulate a person, but hardly any real user will make 50 URL redirections on our website in one day. It can happen in rare cases, so we can look at the general statistics from our records to decide what counts as bot-like behaviour.</p>
</li>
</ol>
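<p>As a starting point for the bot idea, a plain threshold on redirects per IP per day is often enough before reaching for any model. The sketch below is a rough illustration: the function name <strong>suspicious_ips</strong>, the input shape and the sample rows are assumptions, and the limit of 50 is simply the figure used in the text above.</p>

```python
from collections import Counter


def suspicious_ips(rows, max_per_day=50):
    """rows: iterable of (ip, iso_datetime). Return IPs above the daily limit."""
    # ts[:10] is the YYYY-MM-DD part of an ISO-8601 timestamp.
    per_ip_day = Counter((ip, ts[:10]) for ip, ts in rows)
    return sorted({ip for (ip, _day), n in per_ip_day.items() if n > max_per_day})


# Invented sample: one IP with 60 redirects in a day, one ordinary visitor.
rows = [("198.51.100.9", f"2020-06-01T10:{m:02d}:00") for m in range(60)]
rows += [("203.0.113.1", "2020-06-01T12:00:00"),
         ("203.0.113.1", "2020-06-02T12:00:00")]

flagged = suspicious_ips(rows)
print(flagged)
```

<p>In practice the limit would be chosen from the distribution of redirects per IP in our own records rather than fixed up front.</p>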
<h2 id="optimization">Optimization</h2>
<p>If you have seen a real-world production system, you might already have figured out how to optimize this approach. I will just mention a few points here.</p>
<ol>
<li>We can tighten the column types in the schema. Use a timestamp type for the date and time rather than text. For the IP address, you can follow this post on <a href="https://stackoverflow.com/questions/2542011/most-efficient-way-to-store-ip-address-in-mysql">Stack Overflow</a> (it is written for MySQL; PostgreSQL also has a native inet type).</li>
<li>We have a write-heavy system. Our database might struggle if X users are redirected every second. We can scale this because we don’t need strong consistency. Yay!
We can live with eventual consistency. Also, the redirect path never reads from this table; we only read from it occasionally for analytics. The simplest, must-have option is batching our writes. You can read more on optimizing a database for write-heavy workloads. This topic deserves a blog post of its own.</li>
<li>Our search for a URL in the CSV takes linear time [ O(N) ]. We can speed this up by using a hashmap. We can even use Redis or Memcached. If we have too many URLs, we will not be using a CSV but a real database. At large scale, indexes in the database plus a single in-memory cache instance (Redis/Memcached) are more than enough to scale to at least a million URLs.</li>
</ol>
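<p>Points 2 and 3 can be sketched in a few lines of plain Python. Everything named here (the BatchWriter class, the batch size, the sample row and its example URL) is hypothetical and only meant to show the shape of the idea, not a production design.</p>

```python
class BatchWriter:
    """Buffer rows and hand them to flush_fn in batches (hypothetical helper)."""

    def __init__(self, flush_fn, batch_size=100):
        self.flush_fn = flush_fn
        self.batch_size = batch_size
        self.buffer = []

    def add(self, row):
        self.buffer.append(row)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.buffer:
            # In a real system flush_fn would do one multi-row INSERT.
            self.flush_fn(self.buffer)
            self.buffer = []


# Point 3: index the CSV rows in a dict for O(1) average lookup.
rows = [
    {"dishname": "Pizza", "restaurant": "Uncle John's",
     "url_zomato": "https://zomato.example/pizza"},
]
index = {(r["restaurant"], r["dishname"]): r for r in rows}
url = index[("Uncle John's", "Pizza")]["url_zomato"]

# Point 2: collect redirect entries and write them in groups.
batches = []
writer = BatchWriter(batches.append, batch_size=2)
for i in range(5):
    writer.add(("203.0.113.1", f"dish-{i}"))
writer.flush()  # write whatever is left in the buffer

print(url)
print([len(b) for b in batches])
```

<p>The trade-off of batching is that rows still sitting in the buffer are lost if the process crashes, which is acceptable here precisely because we only need eventual consistency for analytics.</p>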
<p><strong>Written by Dipkumar Patel and Yash Panchal</strong></p>
<h1>Speeding up function calls with just one line in Python</h1>
<p>Published 2020-06-03. <a href="https://hackeregg.github.io/2020/06/03/Speeding-up-function-calls-with-just-one-line-in-Python">https://hackeregg.github.io/2020/06/03/Speeding-up-function-calls-with-just-one-line-in-Python</a></p>
<p>One line summary: Use <a href="https://docs.python.org/3/library/functools.html#functools.lru_cache">lru_cache decorator</a></p>
<h3 id="caching">Caching</h3>
<p>If we call an expensive function very frequently in a program, it’s best to save the result of a call and reuse it rather than calling the function every time. This will generally speed up the execution of the program.</p>
<blockquote>
<p>A function can be expensive in terms of computation (CPU usage) or latency (disk reads, fetching a resource from the network).</p>
</blockquote>
<p>Saving the results of function calls is generally referred to as caching (for function results, also known as memoization). The naive way to do caching is to store the result of every function call. But this doesn’t scale well with the number of parameters of the function and the range of each parameter.</p>
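<p>The naive approach can be sketched with a plain dictionary. The function names and the call counter below are made up for the illustration; the point is that the dictionary gains one entry per distinct argument and never shrinks.</p>

```python
cache = {}
stats = {"calls": 0}  # counts how often the real function runs


def slow_square(n):
    """Stand-in for an expensive function."""
    stats["calls"] += 1
    return n * n


def cached_square(n):
    # Store every result forever: simple, but memory grows without bound.
    if n not in cache:
        cache[n] = slow_square(n)
    return cache[n]


results = [cached_square(n) for n in (4, 4, 5, 4)]
print(results, stats["calls"], len(cache))
```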
<p>So we need a smart way to do caching with a fixed amount of memory. There are plenty of <a href="https://en.wikipedia.org/wiki/Cache_replacement_policies">caching strategies available</a>, depending on what type of information is available to us.</p>
<blockquote>
<p>Caching is heavily used in plenty of areas from low-level (hardware/CPU) to high level (network/CDNs).</p>
</blockquote>
<p>In most languages, we would choose a caching strategy and implement it using a few data structures (hashmap, priority queue). Depending on the language, it might take anywhere from a few minutes to a few hours to implement a generic solution for our needs.</p>
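<p>To give a feel for what such a hand-written strategy involves, here is a minimal least-recently-used cache. It is a sketch under assumptions: the class name LRUCache and its methods are invented for this example, and it uses collections.OrderedDict (a hashmap that remembers insertion order) instead of a separate hashmap and priority queue.</p>

```python
from collections import OrderedDict


class LRUCache:
    """Tiny fixed-size cache that evicts the least recently used key."""

    def __init__(self, maxsize):
        self.maxsize = maxsize
        self.data = OrderedDict()

    def get(self, key, default=None):
        if key not in self.data:
            return default
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.maxsize:
            self.data.popitem(last=False)  # drop the oldest entry


cache = LRUCache(maxsize=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" becomes the most recently used key
cache.put("c", 3)  # cache is full, so "b" is evicted
print(list(cache.data))
```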
<p>But Python’s standard library module <a href="https://docs.python.org/3/library/functools.html">functools</a> already comes with one caching strategy, <a href="https://docs.python.org/3/library/functools.html#functools.lru_cache">LRU (Least Recently Used)</a>. Thanks to <a href="https://wiki.python.org/moin/PythonDecorators">decorators</a> in Python, it only takes one line to integrate into an existing codebase.</p>
<h3 id="basic-recursive-implementation-of-fibonacci-numbers">Basic Recursive Implementation of <a href="https://en.wikipedia.org/wiki/Fibonacci_number">Fibonacci numbers</a></h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">time</span> <span class="k">as</span> <span class="n">tt</span>

<span class="k">def</span> <span class="nf">fib</span><span class="p">(</span><span class="n">n</span><span class="p">):</span>
    <span class="k">if</span> <span class="n">n</span> <span class="o"><=</span> <span class="mi">1</span><span class="p">:</span>
        <span class="k">return</span> <span class="n">n</span>
    <span class="k">return</span> <span class="n">fib</span><span class="p">(</span><span class="n">n</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="o">+</span> <span class="n">fib</span><span class="p">(</span><span class="n">n</span><span class="o">-</span><span class="mi">2</span><span class="p">)</span>

<span class="n">t1</span> <span class="o">=</span> <span class="n">tt</span><span class="p">.</span><span class="n">time</span><span class="p">()</span>
<span class="n">fib</span><span class="p">(</span><span class="mi">30</span><span class="p">)</span>
<span class="k">print</span> <span class="p">(</span><span class="s">f"Time taken: </span><span class="si">{</span><span class="n">tt</span><span class="p">.</span><span class="n">time</span><span class="p">()</span> <span class="o">-</span> <span class="n">t1</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>
<span class="c1"># Output :
# Time taken: 0.3209421634674072
</span></code></pre></div></div>
<h3 id="speeding-up-recursive-implementation-with-lru">Speeding Up Recursive Implementation with LRU</h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">time</span> <span class="k">as</span> <span class="n">tt</span>
<span class="kn">import</span> <span class="nn">functools</span>

<span class="c1"># saving all function calls
</span><span class="o">@</span><span class="n">functools</span><span class="p">.</span><span class="n">lru_cache</span><span class="p">(</span><span class="n">maxsize</span><span class="o">=</span><span class="mi">31</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">fib</span><span class="p">(</span><span class="n">n</span><span class="p">):</span>
    <span class="k">if</span> <span class="n">n</span> <span class="o"><=</span> <span class="mi">1</span><span class="p">:</span>
        <span class="k">return</span> <span class="n">n</span>
    <span class="k">return</span> <span class="n">fib</span><span class="p">(</span><span class="n">n</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="o">+</span> <span class="n">fib</span><span class="p">(</span><span class="n">n</span><span class="o">-</span><span class="mi">2</span><span class="p">)</span>

<span class="n">t1</span> <span class="o">=</span> <span class="n">tt</span><span class="p">.</span><span class="n">time</span><span class="p">()</span>
<span class="n">fib</span><span class="p">(</span><span class="mi">30</span><span class="p">)</span>
<span class="k">print</span> <span class="p">(</span><span class="s">f"Time taken: </span><span class="si">{</span><span class="n">tt</span><span class="p">.</span><span class="n">time</span><span class="p">()</span> <span class="o">-</span> <span class="n">t1</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>
<span class="k">print</span> <span class="p">(</span><span class="n">fib</span><span class="p">.</span><span class="n">cache_info</span><span class="p">())</span>
<span class="c1"># Output :
# Time taken: 1.7881393432617188e-05
# CacheInfo(hits=28, misses=31, maxsize=31, currsize=31)
</span></code></pre></div></div>
<p>In this example, we have saved all function calls. But we know that Fibonacci can also be implemented iteratively using <a href="https://en.wikipedia.org/wiki/Dynamic_programming">DP</a>, without any cache.</p>
<h3 id="iterative-implementation-of-fibonacci">Iterative implementation of Fibonacci</h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">time</span> <span class="k">as</span> <span class="n">tt</span>

<span class="k">def</span> <span class="nf">fib_iterative</span><span class="p">(</span><span class="n">n</span><span class="p">):</span>
    <span class="k">if</span> <span class="n">n</span> <span class="o"><=</span> <span class="mi">1</span><span class="p">:</span>
        <span class="k">return</span> <span class="n">n</span>
    <span class="n">f</span><span class="p">,</span> <span class="n">s</span> <span class="o">=</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span>
    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">n</span><span class="o">-</span><span class="mi">1</span><span class="p">):</span>
        <span class="n">t</span> <span class="o">=</span> <span class="n">f</span> <span class="o">+</span> <span class="n">s</span>
        <span class="n">f</span><span class="p">,</span> <span class="n">s</span> <span class="o">=</span> <span class="n">s</span><span class="p">,</span> <span class="n">t</span>
    <span class="k">return</span> <span class="n">t</span>

<span class="n">t1</span> <span class="o">=</span> <span class="n">tt</span><span class="p">.</span><span class="n">time</span><span class="p">()</span>
<span class="n">fib_iterative</span><span class="p">(</span><span class="mi">30</span><span class="p">)</span>
<span class="k">print</span> <span class="p">(</span><span class="s">f"Time taken: </span><span class="si">{</span><span class="n">tt</span><span class="p">.</span><span class="n">time</span><span class="p">()</span> <span class="o">-</span> <span class="n">t1</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>
<span class="c1"># Output:
# Time taken: 5.0067901611328125e-06
</span></code></pre></div></div>
<h3 id="different-cache-size">Different Cache size</h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">time</span> <span class="k">as</span> <span class="n">tt</span>
<span class="kn">import</span> <span class="nn">functools</span>

<span class="k">def</span> <span class="nf">lru_size</span><span class="p">(</span><span class="n">max_lru</span><span class="p">):</span>
    <span class="o">@</span><span class="n">functools</span><span class="p">.</span><span class="n">lru_cache</span><span class="p">(</span><span class="n">maxsize</span><span class="o">=</span><span class="n">max_lru</span><span class="p">,</span> <span class="n">typed</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
    <span class="k">def</span> <span class="nf">fib_lru</span><span class="p">(</span><span class="n">n</span><span class="p">):</span>
        <span class="k">if</span> <span class="n">n</span> <span class="o"><=</span> <span class="mi">1</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">n</span>
        <span class="k">return</span> <span class="n">fib_lru</span><span class="p">(</span><span class="n">n</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="o">+</span> <span class="n">fib_lru</span><span class="p">(</span><span class="n">n</span><span class="o">-</span><span class="mi">2</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">fib_lru</span>

<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">10</span><span class="p">,</span> <span class="mi">31</span><span class="p">]:</span>
    <span class="n">t1</span> <span class="o">=</span> <span class="n">tt</span><span class="p">.</span><span class="n">time</span><span class="p">()</span>
    <span class="n">fib</span> <span class="o">=</span> <span class="n">lru_size</span><span class="p">(</span><span class="n">i</span><span class="p">)</span>
    <span class="c1"># fib(30), as in the earlier examples; the miss counts below correspond to n = 30
</span>    <span class="n">fib</span><span class="p">(</span><span class="mi">30</span><span class="p">)</span>
    <span class="k">print</span> <span class="p">(</span><span class="s">f"LRU size: </span><span class="si">{</span><span class="n">i</span><span class="si">}</span><span class="s"> Time taken: </span><span class="si">{</span><span class="n">tt</span><span class="p">.</span><span class="n">time</span><span class="p">()</span> <span class="o">-</span> <span class="n">t1</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>
    <span class="k">print</span> <span class="p">(</span><span class="n">fib</span><span class="p">.</span><span class="n">cache_info</span><span class="p">())</span>
<span class="c1"># Output:
# LRU size: 1 Time taken: 0.6930997371673584
# CacheInfo(hits=0, misses=2692537, maxsize=1, currsize=1)
# LRU size: 2 Time taken: 0.012731075286865234
# CacheInfo(hits=8656, misses=41641, maxsize=2, currsize=2)
# LRU size: 5 Time taken: 5.817413330078125e-05
# CacheInfo(hits=28, misses=31, maxsize=5, currsize=5)
# LRU size: 10 Time taken: 3.9577484130859375e-05
# CacheInfo(hits=28, misses=31, maxsize=10, currsize=10)
# LRU size: 31 Time taken: 3.504753112792969e-05
# CacheInfo(hits=28, misses=31, maxsize=31, currsize=31)
</span></code></pre></div></div>
<p>As <strong>we can see, of the cache sizes tested, 5 is the smallest that brings the fib function down to the minimum of 31 misses</strong>. Increasing the cache size further does not give much additional speedup.</p>
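<p>The experiment above jumps from a cache size of 2 straight to 5. To fill in the sizes in between, a small helper can report just the miss count for any size. This is a sketch; the helper name <strong>misses_for</strong> is made up, and it reuses the same recursive function with n = 30.</p>

```python
import functools


def misses_for(maxsize, n=30):
    """Return the number of cache misses when computing fib(n) with this cache size."""
    @functools.lru_cache(maxsize=maxsize)
    def fib(k):
        if k <= 1:
            return k
        return fib(k - 1) + fib(k - 2)

    fib(n)
    return fib.cache_info().misses


for size in (2, 3, 4, 5):
    print(size, misses_for(size))
```

<p>The smallest size that reports 31 misses (one per value from 0 to 30) is the smallest cache that avoids any recomputation for this function.</p>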
<h3 id="important-note">Important Note</h3>
<p>I strongly suggest using the lru_cache decorator only on deterministic functions.</p>
<h4 id="deterministic-functions">Deterministic Functions</h4>
<blockquote>
<p>In computer science, a deterministic algorithm is an algorithm which, given a particular input, will always produce the same output, with the underlying machine always passing through the same sequence of states. Deterministic algorithms are by far the most studied and familiar kind of algorithm, as well as one of the most practical, since they can be run on real machines efficiently.</p>
<p>– Wikipedia</p>
</blockquote>
<p>Because,</p>
<blockquote>
<p>There are only two hard things in Computer Science: cache invalidation and naming things.</p>
<p>– Phil Karlton</p>
</blockquote>
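<p>To make the note on deterministic functions concrete, here is a minimal sketch of what goes wrong otherwise. The names <strong>exchange_rate</strong> and <strong>to_inr</strong> and the numbers are invented for the example: the function depends on a value outside its arguments, so the cache keeps serving the old result after that value changes.</p>

```python
import functools

exchange_rate = 75.0  # hypothetical external state the function depends on


@functools.lru_cache(maxsize=None)
def to_inr(usd):
    # Not deterministic in its argument alone: the result also
    # depends on the module-level exchange_rate.
    return usd * exchange_rate


first = to_inr(10)
exchange_rate = 80.0
stale = to_inr(10)    # served from the cache, still based on the old rate
to_inr.cache_clear()  # manual invalidation
fresh = to_inr(10)

print(first, stale, fresh)
```

<p>The only way to get the new value is to clear the cache by hand, which is exactly the cache invalidation problem the quote refers to.</p>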