<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0"><channel><atom:link rel="hub" href="http://tumblr.superfeedr.com/" xmlns:atom="http://www.w3.org/2005/Atom"/><description>Human Being | BI &amp; Social Media enthusiast | Voracious Reader | Amateur Photographer</description><title>The Beautiful Mind</title><generator>Tumblr (3.0; @nellaikanth)</generator><link>http://nellaikanth.tumblr.com/</link><item><title>How can I start preparing for GSoC 2014? As of now I am a novice coder.</title><description>&lt;p&gt;Answer by Rajath Shashidhara:&lt;/p&gt;&lt;blockquote&gt;In my opinion, &lt;br/&gt;being a GSoC&amp;#8217;er is about&lt;b&gt; grasping code quickly &lt;/b&gt;and &lt;b&gt;learning independently. &lt;/b&gt;&lt;br/&gt;Follow these steps:&lt;br/&gt;&lt;br/&gt;1. Install Linux. Master using the shell. Learn all the things on your own. Google stuff. &lt;br/&gt;2. Choose a programming language. Work on it. Understand its general structure, in the sense that if you are given a new API tomorrow and asked to code using it, you should be able to read the API documentation and use it in your code. &lt;br/&gt;3. Once you have mastered the art of learning code independently, choose an open source organization. Subscribe to their mailing list. Introduce yourself as a newbie to open source development. Don&amp;#8217;t be shy. My experience with the open source community is that they are very supportive and helpful. You will experience a few frustrating days. Don&amp;#8217;t give up. Set up the development environment. Then browse the bug list for bugs that catch your eye and that look like you can reproduce them. Browse the code, and look for something called a devguide, which gives the purpose of each variable or method. Then try to relate the code to the bug. Get help from the community at each step. They will guide you. At times this can get very frustrating; stay focused. Get used to bug fixing. 
If at any point you feel that you are not enjoying this challenge, you should probably rethink it and make a wise decision. This way you can get used to bug fixing. Version control is another important thing. You should learn about version control systems and use them. Once you have done this you have reached a higher level. Now you are an open source contributor. Congrats! Bond with the developer community. &lt;br/&gt;&lt;br/&gt;Choosing an organization must be done smartly. If you have done the above things for a considerable amount of time, you have established yourself as an open source developer. Then, getting into GSoC will not be difficult at all. Your community will be happy to accept you. :)&lt;br/&gt;&lt;br/&gt;Read my blog post, My first step into OpenSource Development, on &lt;span class="qlink_container"&gt;&lt;a href="http://rajaths589.wordpress.com" class="external_link" target="_blank" onmouseover='return require("qtext").tooltip(this, "wordpress.com")'&gt;My Explorations&lt;/a&gt;&lt;/span&gt; for the detailed experience.&lt;/blockquote&gt;&lt;span class="qlink_container"&gt;&lt;a href="http://www.quora.com/Google-Summer-of-Code-GSoC/How-can-I-start-preparing-for-GSoC-2014-As-of-now-I-am-a-novice-coder/answer/Rajath-Shashidhara"&gt;View Answer on Quora&lt;/a&gt;&lt;/span&gt;</description><link>http://nellaikanth.tumblr.com/post/51734884900</link><guid>http://nellaikanth.tumblr.com/post/51734884900</guid><pubDate>Thu, 30 May 2013 13:32:25 -0400</pubDate></item><item><title>How is asynchronous IO implemented in programming languages?</title><description>&lt;p&gt;Answer by Prakash Gamit:&lt;blockquote&gt;There are five I/O models.&lt;br/&gt;&lt;br/&gt;The diagrams are for network I/O, but they would be similar for disk I/O or any other I/O operation.&lt;br/&gt;&lt;br/&gt;&lt;b&gt;1&lt;/b&gt;. 
&lt;b&gt;Blocking I/O model&lt;/b&gt;&lt;br/&gt;Block if request cannot be completed immediately.&lt;br/&gt;&lt;div&gt;&lt;img class="qtext_image zoomable_in zoomable_in_feed" src="http://qph.is.quoracdn.net/main-qimg-59331e0f8330d71dd06aca1eaf12761c" master_src="http://qph.is.quoracdn.net/main-qimg-5ac7b97f556b92dc84efe1b97f71da97" master_w="871" master_h="520"/&gt;&lt;/div&gt;&lt;br/&gt;&lt;b&gt;2&lt;/b&gt;. &lt;b&gt;Nonblocking I/O model&lt;/b&gt;&lt;br/&gt;Do not block if request cannot be completed immediately, return error(&lt;div class="codeblock inline_codeblock"&gt;&lt;pre&gt;&lt;span class=""&gt;EWOULDBLOCK&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;) instead.&lt;br/&gt;Setting &lt;div class="codeblock inline_codeblock"&gt;&lt;pre&gt;&lt;span class=""&gt;O_NONBLOCK&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt; flag using &lt;div class="codeblock inline_codeblock"&gt;&lt;pre&gt;&lt;span class=""&gt;fcntl&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;&lt;br/&gt;&lt;div&gt;&lt;img class="qtext_image zoomable_in zoomable_in_feed" src="http://qph.is.quoracdn.net/main-qimg-2cf34210e4e7e80ab8997e751b6cb7f6" master_src="http://qph.is.quoracdn.net/main-qimg-6043fb09e7ce42274c5975464e904dd6" master_w="863" master_h="513"/&gt;&lt;/div&gt;This is often a waste of CPU time as it uses polling.&lt;br/&gt;&lt;br/&gt;&lt;b&gt;3&lt;/b&gt;. &lt;b&gt;I/O Multiplexing model&lt;/b&gt;&lt;br/&gt;Call &lt;div class="codeblock inline_codeblock"&gt;&lt;pre&gt;&lt;span class=""&gt;select / poll&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;&lt;i&gt; &lt;/i&gt;and block in one of these system calls, instead of blocking in the actual I/O system call.&lt;br/&gt;&lt;div&gt;&lt;img class="qtext_image zoomable_in zoomable_in_feed" src="http://qph.is.quoracdn.net/main-qimg-cae6ecfcd5088647bb13cdb6a28c94db" master_src="http://qph.is.quoracdn.net/main-qimg-e0c3c4813031605f6be776b8311bbe20" master_w="881" master_h="501"/&gt;&lt;/div&gt;&lt;br/&gt;&lt;br/&gt;&lt;b&gt;4&lt;/b&gt;. 
&lt;b&gt;Signal-driven I/O model&lt;/b&gt;&lt;br/&gt;Tell the kernel to notify the application with the &lt;div class="codeblock inline_codeblock"&gt;&lt;pre&gt;&lt;span class=""&gt;SIGIO&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt; signal when the descriptor is ready.&lt;br/&gt;&lt;div&gt;&lt;img class="qtext_image zoomable_in zoomable_in_feed" src="http://qph.is.quoracdn.net/main-qimg-1c70f89e831348d98f41271fd32a9c83" master_src="http://qph.is.quoracdn.net/main-qimg-a53ce7da0670e41b3213d67d45c963ef" master_w="849" master_h="507"/&gt;&lt;/div&gt;&lt;br/&gt;&lt;hr class="qtext_hr"&gt;&lt;br/&gt;&lt;b&gt;5&lt;/b&gt;. &lt;b&gt;Asynchronous I/O model&lt;/b&gt;&lt;br/&gt;Tell the kernel to start the operation and to notify the application when the entire operation (including the copy of data from the kernel to the application buffer) is complete.&lt;br/&gt;&lt;div&gt;&lt;img class="qtext_image zoomable_in zoomable_in_feed" src="http://qph.is.quoracdn.net/main-qimg-55b933e77c680c008204cf7799e6bc67" master_src="http://qph.is.quoracdn.net/main-qimg-12f933673e133866898a4a43a16c320e" master_w="836" master_h="507"/&gt;&lt;/div&gt;The following functions are available in C for asynchronous I/O on a Linux 3.5 system (they might be available in earlier versions also):&lt;br/&gt;&lt;br/&gt;&lt;b&gt;aio_read(3) &lt;/b&gt;    Enqueue a read request. This is the asynchronous analog of &lt;b&gt;read(2)&lt;/b&gt;.&lt;br/&gt;&lt;br/&gt;&lt;b&gt;aio_write(3) &lt;/b&gt;   Enqueue a write request. This is the asynchronous analog of &lt;b&gt;write(2)&lt;/b&gt;.&lt;br/&gt;&lt;br/&gt;&lt;b&gt;aio_fsync(3)&lt;/b&gt;    Enqueue a sync request for the I/O operations on a file descriptor. 
This is the asynchronous analog of &lt;b&gt;fsync(2)&lt;/b&gt; and &lt;b&gt;fdatasync(2)&lt;/b&gt;.&lt;br/&gt;&lt;br/&gt;&lt;b&gt;aio_error(3) &lt;/b&gt;   Obtain the error status of an enqueued I/O request.&lt;br/&gt;&lt;br/&gt;&lt;b&gt;aio_return(3) &lt;/b&gt;  Obtain the return status of a completed I/O request.&lt;br/&gt;&lt;br/&gt;&lt;b&gt;aio_suspend(3)&lt;/b&gt;  Suspend the caller until one or more of a specified set of I/O requests completes.&lt;br/&gt;&lt;br/&gt;&lt;b&gt;aio_cancel(3)&lt;/b&gt;   Attempt to cancel outstanding I/O requests on a specified file descriptor.&lt;br/&gt;&lt;br/&gt;&lt;b&gt;lio_listio(3)&lt;/b&gt;   Enqueue multiple I/O requests using a single function call.&lt;br/&gt;&lt;br/&gt;&lt;br/&gt;The &lt;b&gt;aiocb&lt;/b&gt; (&amp;#8220;asynchronous I/O control block&amp;#8221;) structure defines parameters that control an I/O operation. An argument of this type is employed with all of the functions listed above. This structure has the following form:&lt;br/&gt;&lt;br/&gt;&lt;table class="codeblocktable"&gt;&lt;tr&gt;&lt;td class="code"&gt;&lt;div class="codeblock"&gt;&lt;pre&gt;&lt;span class=""&gt;#include &amp;lt;aio.h&amp;gt;&lt;br/&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class="codeblock"&gt;&lt;pre&gt;&lt;span class=""&gt;&lt;br/&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class="codeblock"&gt;&lt;pre&gt;&lt;span class=""&gt;           struct aiocb {&lt;br/&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class="codeblock"&gt;&lt;pre&gt;&lt;span class=""&gt;               /* The order of these fields is implementation-dependent */&lt;br/&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class="codeblock"&gt;&lt;pre&gt;&lt;span class=""&gt;&lt;br/&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class="codeblock"&gt;&lt;pre&gt;&lt;span class=""&gt;               int             aio_fildes;     /* File descriptor */&lt;br/&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class="codeblock"&gt;&lt;pre&gt;&lt;span class=""&gt;               off_t           aio_offset;     /* File offset */&lt;br/&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class="codeblock"&gt;&lt;pre&gt;&lt;span class=""&gt;               volatile void  *aio_buf;        /* Location of buffer */&lt;br/&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class="codeblock"&gt;&lt;pre&gt;&lt;span class=""&gt;               size_t          aio_nbytes;     /* Length of transfer */&lt;br/&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class="codeblock"&gt;&lt;pre&gt;&lt;span class=""&gt;               int             aio_reqprio;    /* Request priority */&lt;br/&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class="codeblock"&gt;&lt;pre&gt;&lt;span class=""&gt;               struct sigevent aio_sigevent;   /* Notification method */&lt;br/&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class="codeblock"&gt;&lt;pre&gt;&lt;span class=""&gt;               int             aio_lio_opcode; /* Operation to be performed;&lt;br/&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div 
class="codeblock"&gt;&lt;pre&gt;&lt;span class=""&gt;                                                  lio_listio() only */&lt;br/&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class="codeblock"&gt;&lt;pre&gt;&lt;span class=""&gt;&lt;br/&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class="codeblock"&gt;&lt;pre&gt;&lt;span class=""&gt;               /* Various implementation-internal fields not shown */&lt;br/&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class="codeblock"&gt;&lt;pre&gt;&lt;span class=""&gt;           };&lt;br/&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class="codeblock"&gt;&lt;pre&gt;&lt;span class=""&gt;&lt;br/&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class="codeblock"&gt;&lt;pre&gt;&lt;span class=""&gt;           /* Operation codes for 'aio_lio_opcode': */&lt;br/&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class="codeblock"&gt;&lt;pre&gt;&lt;span class=""&gt;&lt;br/&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class="codeblock"&gt;&lt;pre&gt;&lt;span class=""&gt;           enum { LIO_READ, LIO_WRITE, LIO_NOP };&lt;br/&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;br/&gt;&lt;br/&gt;When one of the enqueueing functions above (such as aio_read) is called, the request is handed to a helper thread and the call returns to the user process immediately. This thread waits for the I/O operation to complete while the user process continues its execution. When the I/O operation completes, the user process is notified in the way specified by the aio_sigevent field of the aiocb struct, for example by delivering a signal.&lt;br/&gt;&lt;br/&gt;The current Linux POSIX AIO implementation is provided in userspace by &lt;b&gt;glibc&lt;/b&gt;. This has a number of limitations, most notably that maintaining multiple threads to perform I/O operations is expensive and scales poorly.&lt;br/&gt;&lt;br/&gt;&lt;br/&gt;Sources:&lt;br/&gt;1. Unix Network Programming, Volume 1, 3rd edition, W. Richard Stevens (Chapter 6)&lt;br/&gt;2. 
aio(7) (aio man page)&lt;/blockquote&gt;&lt;span class="qlink_container"&gt;&lt;a href="http://www.quora.com/Computer-Programming/How-is-asynchronous-IO-implemented-in-programming-languages/answer/Prakash-Gamit"&gt;View Answer on Quora&lt;/a&gt;&lt;/span&gt;&lt;/p&gt;</description><link>http://nellaikanth.tumblr.com/post/47309483977</link><guid>http://nellaikanth.tumblr.com/post/47309483977</guid><pubDate>Sat, 06 Apr 2013 18:00:36 -0400</pubDate></item><item><title>How do you explain basic terminology of Git in the easiest way?</title><description>&lt;p&gt;Answer by Daniel Kinzler:&lt;/p&gt;&lt;blockquote&gt;Lots of pretty diagrams here! Let me add mine to the mix :)&lt;br/&gt;&lt;br/&gt;&lt;div&gt;&lt;img class="qtext_image zoomable_in zoomable_in_feed" src="http://qph.is.quoracdn.net/main-qimg-d639852f4d22174be755bdac04a641d2" master_src="http://qph.is.quoracdn.net/main-qimg-3a15c9acb241e6e1ccc011b2f2350f58" master_w="500" master_h="409"/&gt;&lt;/div&gt;&lt;/blockquote&gt;&lt;span class="qlink_container"&gt;&lt;a href="http://www.quora.com/Git-revision-control/How-do-you-explain-basic-terminology-of-Git-in-the-easiest-way/answer/Daniel-Kinzler"&gt;View Answer on Quora&lt;/a&gt;&lt;/span&gt;</description><link>http://nellaikanth.tumblr.com/post/45982584415</link><guid>http://nellaikanth.tumblr.com/post/45982584415</guid><pubDate>Fri, 22 Mar 2013 06:14:08 -0400</pubDate></item><item><title>Where can I learn more about cool (novel?) 
data structures and algorithms?</title><description>&lt;p&gt;Answer by Aditya Jain:&lt;/p&gt;&lt;blockquote&gt;&lt;span class="qlink_container"&gt;&lt;a href="http://www.quora.com/Cosmin-Negruseri"&gt;Cosmin Negruseri&lt;/a&gt;&lt;/span&gt; has already given a good list that covers most of the books I will be mentioning. In answering this question, I am assuming that you are familiar with most of the material in Introduction to Algorithms by Cormen et al.&lt;br/&gt;&lt;br/&gt;From here on, depending on your specific needs and interests, you can pick up any of the following books:&lt;br/&gt;&lt;br/&gt;&lt;br/&gt;1. Advanced Data Structures&lt;br/&gt;&lt;span class="qlink_container"&gt;&lt;a href="http://www.amazon.com/Foundations-Multidimensional-Structures-Kaufmann-Computer/dp/0123694469/ref=sr_1_1?s=books&amp;amp;ie=UTF8&amp;amp;qid=1363563366&amp;amp;sr=1-1&amp;amp;keywords=foundations+of+multidimensional+and+metric+data+structures" class="external_link" target="_blank" onmouseover='return require("qtext").tooltip(this, "amazon.com")'&gt;Foundations of Multidimensional and Metric Data Structures (The Morgan Kaufmann Series in Computer Graphics): Hanan Samet: 9780123694461: Amazon.com: Books&lt;/a&gt;&lt;/span&gt;&lt;br/&gt;This book probably has every data structure you would ever need. The book is huge, so you might just want to read the relevant sections.&lt;br/&gt;&lt;br/&gt;2. Randomized Algorithms&lt;br/&gt;&lt;span class="qlink_container"&gt;&lt;a href="http://www.amazon.com/Randomized-Algorithms-Rajeev-Motwani/dp/0521474655/ref=sr_1_1?ie=UTF8&amp;amp;qid=1363563041&amp;amp;sr=8-1&amp;amp;keywords=randomized+algorithms" class="external_link" target="_blank" onmouseover='return require("qtext").tooltip(this, "amazon.com")'&gt;Randomized Algorithms: Rajeev Motwani, Prabhakar Raghavan: 9780521474658: Amazon.com: Books&lt;/a&gt;&lt;/span&gt;&lt;br/&gt;&lt;span class="qlink_container"&gt;&lt;a 
href="http://www.amazon.com/Probability-Computing-Randomized-Algorithms-Probabilistic/dp/0521835402/ref=sr_1_2?ie=UTF8&amp;amp;qid=1363563041&amp;amp;sr=8-2&amp;amp;keywords=randomized+algorithms" class="external_link" target="_blank" onmouseover='return require("qtext").tooltip(this, "amazon.com")'&gt;Probability and Computing: Randomized Algorithms and Probabilistic Analysis: Michael Mitzenmacher, Eli Upfal: 9780521835404: Amazon.com: Books&lt;/a&gt;&lt;/span&gt;&lt;br/&gt;&lt;br/&gt;3.Approximation Algorithms&lt;br/&gt;&lt;span class="qlink_container"&gt;&lt;a href="http://www.amazon.com/Approximation-Algorithms-Vijay-V-Vazirani/dp/3642084699/ref=sr_1_3?ie=UTF8&amp;amp;qid=1363563041&amp;amp;sr=8-3&amp;amp;keywords=randomized+algorithms" class="external_link" target="_blank" onmouseover='return require("qtext").tooltip(this, "amazon.com")'&gt;Approximation Algorithms: Vijay V. Vazirani: 9783642084690: Amazon.com: Books&lt;/a&gt;&lt;/span&gt;&lt;br/&gt;&lt;br/&gt;4.Computational Geometry&lt;br/&gt;&lt;span class="qlink_container"&gt;&lt;a href="http://www.amazon.com/Computational-Geometry-Applications-Mark-Berg/dp/3642096816/ref=sr_1_1?s=books&amp;amp;ie=UTF8&amp;amp;qid=1363563173&amp;amp;sr=1-1&amp;amp;keywords=computational+geometry" class="external_link" target="_blank" onmouseover='return require("qtext").tooltip(this, "amazon.com")'&gt;Amazon.com: Computational Geometry: Algorithms and Applications (9783642096815): Mark de Berg, Otfried Cheong, Marc van Kreveld, Mark Overmars: Books&lt;/a&gt;&lt;/span&gt;&lt;br/&gt;&lt;br/&gt; 5.String Algorithms&lt;br/&gt;&lt;span class="qlink_container"&gt;&lt;a href="http://www.amazon.com/Algorithms-Strings-Trees-Sequences-Computational/dp/0521585198/ref=sr_1_1?s=books&amp;amp;ie=UTF8&amp;amp;qid=1363563209&amp;amp;sr=1-1&amp;amp;keywords=algorithms+on+strings" class="external_link" target="_blank" onmouseover='return require("qtext").tooltip(this, "amazon.com")'&gt;Algorithms on Strings, Trees and Sequences: Computer 
Science and Computational Biology: Dan Gusfield: 9780521585194: Amazon.com: Books&lt;/a&gt;&lt;/span&gt;&lt;br/&gt;&lt;br/&gt;6.Algorithmic Game Theory&lt;br/&gt;&lt;span class="qlink_container"&gt;&lt;a href="http://www.amazon.com/Algorithmic-Game-Theory-Noam-Nisan/dp/0521872820/ref=sr_1_1?s=books&amp;amp;ie=UTF8&amp;amp;qid=1363563235&amp;amp;sr=1-1&amp;amp;keywords=algorithmic+game+theory" class="external_link" target="_blank" onmouseover='return require("qtext").tooltip(this, "amazon.com")'&gt;Algorithmic Game Theory: Noam Nisan, Tim Roughgarden, Eva Tardos, Vijay V. Vazirani: 9780521872829: Amazon.com: Books&lt;/a&gt;&lt;/span&gt;&lt;br/&gt;&lt;br/&gt;7. Distributed Algorithms&lt;br/&gt;&lt;span class="qlink_container"&gt;&lt;a href="http://www.amazon.com/Distributed-Algorithms-Kaufmann-Management-Systems/dp/1558603484/ref=pd_bxgy_b_img_z" class="external_link" target="_blank" onmouseover='return require("qtext").tooltip(this, "amazon.com")'&gt;Distributed Algorithms (The Morgan Kaufmann Series in Data Management Systems): Nancy A. Lynch: 9781558603486: Amazon.com: Books&lt;/a&gt;&lt;/span&gt;&lt;/blockquote&gt;&lt;span class="qlink_container"&gt;&lt;a href="http://www.quora.com/Algorithms/Where-can-I-learn-more-about-cool-novel-data-structures-and-algorithms/answer/Aditya-Jain-2"&gt;View Answer on Quora&lt;/a&gt;&lt;/span&gt;</description><link>http://nellaikanth.tumblr.com/post/45679931947</link><guid>http://nellaikanth.tumblr.com/post/45679931947</guid><pubDate>Mon, 18 Mar 2013 12:48:32 -0400</pubDate></item><item><title>Why isn't supervised machine learning more automated?</title><description>&lt;p&gt;Answer by Alex Clemmer:&lt;/p&gt;&lt;blockquote&gt;In order to address the stated question (&amp;#8220;Why isn&amp;#8217;t supervised machine learning more automated?&amp;#8221;), we have to begin to understand how complicated supervised learning really is. 
(We&amp;#8217;ll get to the other questions in the &amp;#8220;details&amp;#8221; section in a minute.)&lt;br/&gt;&lt;br/&gt;Here&amp;#8217;s how complicated it is.&lt;br/&gt;&lt;br/&gt;Say you work at a lab somewhere. Your boss, Professor George O&amp;#8217;Jungle, hands you a box marked &amp;#8220;data&amp;#8221;. He mumbles, &amp;#8220;use machine learning to classify this data&amp;#8221;, and then walks away. Here is a nifty picture:&lt;br/&gt;&lt;br/&gt;&lt;div&gt;&lt;img class="qtext_image zoomable_in zoomable_in_feed" src="http://qph.is.quoracdn.net/main-qimg-bf3068235e5149e1f356fb1e13bfc0b9" master_src="http://qph.is.quoracdn.net/main-qimg-d977f097ef43e36f01ede0db33b19884" master_w="3229" master_h="965"/&gt;&lt;/div&gt;&lt;i&gt;fig 1: George pictured here in skirt due to it being International Subvert Patriarchy Day&lt;/i&gt;&lt;br/&gt;&lt;br/&gt;&lt;br/&gt;You&amp;#8217;re pretty busy, and groan out loud:&lt;br/&gt;&lt;br/&gt;Q: &lt;b&gt;Is there some way to automate this learning task?&lt;/b&gt;&lt;br/&gt;A: &lt;b&gt;No, at least not at this point in time.&lt;/b&gt; You don&amp;#8217;t even know what&amp;#8217;s in the box. It could be hamsters or something equally ridiculous. Before we do machine learning, we must know what our observable phenomena are.&lt;br/&gt;&lt;br/&gt;Ok, so you decide to find out. You open the box. Inside are papers with numbers on them.&lt;br/&gt;&lt;div&gt;&lt;img class="qtext_image zoomable_in zoomable_in_feed" src="http://qph.is.quoracdn.net/main-qimg-bae5a8ba38f4eba756353a13b38d94a2" master_src="http://qph.is.quoracdn.net/main-qimg-3167ff3398f5484f7767eb90752dc007" master_w="3565" master_h="861"/&gt;&lt;/div&gt;Great, the empirical phenomena are something sane. If it were, like, bird migration patterns, we&amp;#8217;d have to turn real observations of them into useful learnable data. But here we have just a bunch of numbers. 
Computers are good at doing stuff with numbers, so it looks like we&amp;#8217;re getting something for free.&lt;br/&gt;&lt;br/&gt;So now you ask:&lt;br/&gt;&lt;br/&gt;Q: &lt;b&gt;Now can I automate this learning task?&lt;/b&gt;&lt;br/&gt;A: &lt;b&gt;Still no.&lt;/b&gt; Your data are just numbers on paper. In order to do machine learning, there must be a meaningful interpretation of our data. Interpreting the data is often called &amp;#8220;cleaning&amp;#8221; the data. Sometimes this step is trivial. For example, if your data are JPEG images, then you should probably just &amp;#8220;interpret&amp;#8221; each binary as a JPEG image. In other cases, it is less straightforward. For example, if your data are emails, then you&amp;#8217;ll want to remove the headers, HTML tags, images, and other vestigial data that could needlessly harm your algorithm. In other words, you are &amp;#8220;interpreting&amp;#8221; the data as a set of emails, where &amp;#8220;email&amp;#8221; is really just &amp;#8220;text in the body&amp;#8221; or something. Not doing this correctly can literally ruin your ability to do machine learning, so don&amp;#8217;t ignore it!&lt;br/&gt;&lt;br/&gt;Anyway, you now set off to find out how all these pages of numbers should be interpreted. You ask Professor George. He explains that they are images. He shows you how to use his expensive image-interpreting machine. 
You feed all the papers into this image-interpreting machine:&lt;br/&gt;&lt;div&gt;&lt;img class="qtext_image zoomable_in zoomable_in_feed" src="http://qph.is.quoracdn.net/main-qimg-ed2d50be6dee148b2af881240a82ef4f" master_src="http://qph.is.quoracdn.net/main-qimg-36455bafd7ababfd79bd6ad8f0371dda" master_w="3565" master_h="861"/&gt;&lt;/div&gt;[Cow image sources:&lt;span class="qlink_container"&gt;&lt;a href="http://commons.wikimedia.org/wiki/File:Cow_female_black_white.jpg" rel="nofollow" class="external_link" target="_blank" onmouseover='return require("qtext").tooltip(this, "wikimedia.org")'&gt; File:Cow female black white.jpg&lt;/a&gt;&lt;/span&gt;, &lt;span class="qlink_container"&gt;&lt;a href="http://commons.wikimedia.org/wiki/File:Cow-IMG_2050.JPG" rel="nofollow" class="external_link" target="_blank" onmouseover='return require("qtext").tooltip(this, "wikimedia.org")'&gt;File:Cow-IMG 2050.JPG&lt;/a&gt;&lt;/span&gt;]&lt;br/&gt;&lt;br/&gt;Now you have a nice stack of images. So once again, you come back to your original question:&lt;br/&gt;&lt;br/&gt;Q: &lt;b&gt;Ok, &lt;/b&gt;&lt;b&gt;&lt;i&gt;now&lt;/i&gt;&lt;/b&gt;&lt;b&gt; can I automate this learning task?&lt;/b&gt;&lt;br/&gt;A: &lt;b&gt;Sorry, still no.&lt;/b&gt; You may have a useful interpretation of your data, but you have not specified your hypothesis set. A &lt;i&gt;hypothesis&lt;/i&gt; basically maps things to some set of classes (&lt;i&gt;i.e.&lt;/i&gt;, it &amp;#8220;classifies&amp;#8221; things), and a &lt;i&gt;hypothesis set&lt;/i&gt; is the set of possible hypotheses. Examples of hypothesis sets are the SVM and the perceptron. Examples of a hypothesis are weight vectors that &amp;#8220;classify&amp;#8221; your data.  Note that hypothesis sets make assumptions about the data (for example, independence assumptions), so choosing the right model is often a balance between hypotheses that are (1) tractable to learn, and (2) expressive. 
In order to do machine learning, you need to know which hypothesis set you&amp;#8217;re using, what your learning algorithm is (&lt;i&gt;e.g.&lt;/i&gt;, gradient descent), and what classes you&amp;#8217;re mapping to.&lt;br/&gt;&lt;br/&gt;So once more you consult Professor O&amp;#8217;Jungle. He explains that your task is to separate pictures into piles of those that contain cows and those that do not. You can use whatever hypothesis set and learning algorithm you like. &amp;#8220;Great!&amp;#8221; you think. &amp;#8220;I choose the SVM, with whatever learning algorithm is fastest, and the rest will be easy.&amp;#8221; You sit back and grin.&lt;br/&gt;&lt;br/&gt;Q: &lt;b&gt;&amp;#8230; Because now I can completely automate this learning task, right?&lt;/b&gt;&lt;br/&gt;A: &lt;b&gt;Wrong.&lt;/b&gt; You may have cleaned data, and you may have a task, but you did not specify which &lt;i&gt;features&lt;/i&gt; of the images are relevant to learning this task. The reason is that your hypothesis set is making &lt;i&gt;assumptions&lt;/i&gt; about your data. In particular, most hypothesis sets designed for classification assume that your data are &lt;img src="http://qlx.is.quoracdn.net/main-8cc4d1a08841dd25.png" width="11" height="14" class="math" title="d" alt="d"/&gt;-dimensional real vectors, which can be interpreted as points in &lt;img src="http://qlx.is.quoracdn.net/main-8cc4d1a08841dd25.png" width="11" height="14" class="math" title="d" alt="d"/&gt;-dimensional space that need to be &amp;#8220;separated.&amp;#8221; Why and how is a technical question for another time. The question for now is: how do we take images and turn them into vectors?&lt;br/&gt;&lt;br/&gt;You head back yet again to Prof. O&amp;#8217;Jungle. Your consolation seems to be that we must be most of the way to automating the learning by now. Or you grimly hope so anyway. You ask what features he wants to use. 
He slaps his forehead and exclaims that he is an idiot for forgetting to show you his feature-extract-o-tron. It&amp;#8217;s not really automatic, because someone had to make it especially for this problem. And in general, it is not clear how to automatically select these features. One good attempt is convolutional deep belief nets, but they&amp;#8217;re not quite &amp;#8220;there&amp;#8221; yet. And anyway, while this solution isn&amp;#8217;t really automatic, it is automatic &lt;i&gt;for you&lt;/i&gt;. And that&amp;#8217;s good enough.&lt;br/&gt;&lt;br/&gt;You slump back at your desk. &lt;i&gt;This had better work&lt;/i&gt;. You feed all the images in and get back some nice vectors of real numbers:&lt;br/&gt;&lt;br/&gt;&lt;div&gt;&lt;img class="qtext_image zoomable_in zoomable_in_feed" src="http://qph.is.quoracdn.net/main-qimg-f441966da6c4a8dd8e35f7840256b723" master_src="http://qph.is.quoracdn.net/main-qimg-9a7183f2f66d166848157919ef394c7d" master_w="3565" master_h="861"/&gt;&lt;/div&gt;&lt;br/&gt;Finally, you think:&lt;br/&gt;&lt;br/&gt;Q: &lt;b&gt;Ok, surely now I can automate this learning task, right?&lt;/b&gt;&lt;br/&gt;A: &lt;b&gt;Sorry, but no.&lt;/b&gt; Now you have to do the parameter tuning.&lt;br/&gt;&lt;br/&gt;The good news is that this is pretty much automatic. First you randomly split your data into &amp;#8220;training&amp;#8221; and &amp;#8220;testing&amp;#8221; sets. Then you pick a set of candidate values that seem reasonable for your parameters. For each candidate, you train a model on the &amp;#8220;training&amp;#8221; data and test it on the &amp;#8220;testing&amp;#8221; data. Finally, you select whichever model had the best results.&lt;br/&gt;&lt;br/&gt;There probably isn&amp;#8217;t a way to completely automate this step away in general, but it&amp;#8217;s already pretty close.&lt;br/&gt;&lt;br/&gt;So you end up with a well-tuned model and a nice result. You look back at the process. 
None of the steps were &lt;i&gt;really&lt;/i&gt; automatic, and in fact, it seems like it would be pretty hard to automate any of them.&lt;br/&gt;&lt;br/&gt;You grumble to your colleagues about this:&lt;br/&gt;&lt;br/&gt;&lt;h2&gt;Why do machine learning at all if it can&amp;#8217;t be completely automatic?&lt;/h2&gt;And the answer is simple: because it actually saved you a lot of time already. Imagine if you had tried to program your cow detector by hand, without using ML. Sure, it&amp;#8217;s not completely automatic, but no free lunch, right? (Actually it is provable that there is no free lunch, but that&amp;#8217;s for a different post.)&lt;br/&gt;&lt;br/&gt;You might complain that this is how you know machines aren&amp;#8217;t &amp;#8220;really intelligent&amp;#8221;, but the bottom line is that if we didn&amp;#8217;t have ML, we would not be able to &lt;i&gt;quickly&lt;/i&gt; build things like search engines or cow detectors. That is, we could build them, but we&amp;#8217;d build them much slower.&lt;br/&gt;&lt;br/&gt;So: we use ML because it helps us to quickly deal with large volumes of arbitrary data. It isn&amp;#8217;t magic, but it works well enough.&lt;br/&gt;&lt;br/&gt;&lt;h2&gt;Misc: the &amp;#8220;details&amp;#8221; section of the question:&lt;/h2&gt;The  question details ask why we can&amp;#8217;t completely automate (1) parameter  tuning and (2) feature selection. In the case of (1), the short answer  is that parameter tuning is already the most automatic thing about  supervised ML &amp;#8212; you get a working classifier, then optimize it by  running it over some range of parameters, and then select the &amp;#8220;best&amp;#8221;  model. Easy. (2) is much harder, and the short answer is that we&amp;#8217;re  still working on automatic feature selection. 
Convolutional deep belief  nets have great promise, but it will be many years before this is  viable, if it ever turns out to be.&lt;br/&gt;&lt;br/&gt;&lt;br/&gt;&lt;hr class="qtext_hr"&gt;&lt;br/&gt;&lt;br/&gt;&lt;i&gt;If you thought this was useful and you enjoy ML, you might also enjoy &lt;/i&gt;&lt;span class="qlink_container"&gt;&lt;a href="http://antics.quora.com/"&gt;&lt;i&gt;my Quora blog&lt;/i&gt;&lt;/a&gt;&lt;/span&gt;&lt;i&gt;, which is primarily concerned with ML-ish problems.&lt;/i&gt;&lt;/blockquote&gt;&lt;span class="qlink_container"&gt;&lt;a href="http://www.quora.com/Machine-Learning/Why-isnt-supervised-machine-learning-more-automated/answer/Alex-Clemmer"&gt;View Answer on Quora&lt;/a&gt;&lt;/span&gt;</description><link>http://nellaikanth.tumblr.com/post/45339500417</link><guid>http://nellaikanth.tumblr.com/post/45339500417</guid><pubDate>Thu, 14 Mar 2013 08:03:37 -0400</pubDate></item><item><title>How do Go, Scala and Julia compare with each other?</title><description>&lt;p&gt;Answer by Eric Talevich:&lt;/p&gt;&lt;blockquote&gt;&lt;b&gt;Go&lt;/b&gt; is a systems programming language. Google created it to safely solve three specific problems that C++ was biting them with: concurrency, memory management and compilation time on large systems. It&amp;#8217;s concise and readable, like Scala and Julia, but a bit fussy and low-level, unlike Julia. You would use it to write safe, well-performing services that your main application might call to. It would certainly be possible to write scientific software in Go, much as you would with C or C++, but it doesn&amp;#8217;t have the same flexibility and interactivity as, say, Python or Julia, or even Scala. If you&amp;#8217;re trying to use Go for data analysis, the fussiness and lack of interactivity will disappoint you.&lt;br/&gt;&lt;br/&gt;&lt;b&gt;Julia&lt;/b&gt; is a scientific applications programming language, designed to be a good replacement for Matlab and Python+SciPy. 
It&amp;#8217;s very new, so I won&amp;#8217;t attempt to predict which specific features it will or won&amp;#8217;t pick up, but the underlying motivations should drive it to be a very good programming language for scientific software: not syntactically optimized for statistical operations on data arrays like R, but good for writing a computationally intensive program that uses multiple CPUs and generates pretty graphics.&lt;br/&gt;&lt;br/&gt;&lt;b&gt;Scala&lt;/b&gt; is a general-purpose language that sits between the other two, capable of dealing with either end of the spectrum. Twitter uses it for services. Scientists use it for aligning protein structures. It&amp;#8217;s a good language, a few years older than the other two, and is built on the JVM, which has its own pros and cons. If you want your code to be something other programmers can use and/or build on, Scala is your best choice at the moment: it creates .jar files which can be used directly from plain Java, whereas Go doesn&amp;#8217;t seem to be capable of creating reusable shared object (.so) files yet.&lt;/blockquote&gt;&lt;span class="qlink_container"&gt;&lt;a href="http://www.quora.com/Programming-Languages/How-do-Go-Scala-and-Julia-compare-with-each-other/answer/Eric-Talevich"&gt;View Answer on Quora&lt;/a&gt;&lt;/span&gt;</description><link>http://nellaikanth.tumblr.com/post/44679391178</link><guid>http://nellaikanth.tumblr.com/post/44679391178</guid><pubDate>Tue, 05 Mar 2013 22:11:51 -0500</pubDate></item><item><title>Kafka writes every message to broker disk. Still, performance wise it is better than some of the in-memory message storing message queues...</title><description>&lt;p&gt;Answer by Jay Kreps:&lt;/p&gt;&lt;blockquote&gt;There are a few reasons:&lt;br/&gt;&lt;br/&gt;The first is that Kafka does only sequential file I/O. To enable this, Kafka enforces end-to-end ordering of messages in delivery. 
This means the consumer has a single position in that message stream, and this position can be stored lazily. Typically, messaging systems keep some kind of per-message state about what has been consumed and have to update it. This introduces all kinds of random updates to mark messages consumed. By contrast, Kafka keeps a single pointer into each partition of a topic, rather than per-message state. All messages prior to the pointer are considered consumed, and all messages after it are considered unconsumed. This eliminates most of the random I/O in acknowledging messages, since by moving the pointer forward many messages at a time we can implicitly acknowledge them all. As a side benefit, retaining order is good for other reasons (often the ordering has meaning). The reason most messaging systems don&amp;#8217;t do this is that it is hard: it requires coordination among the consumers to &amp;#8220;elect&amp;#8221; consumers for each partition. We lean on ZooKeeper to manage this process of matching consumers to partitions of data on servers and keeping this matching up to date as the set of available consumers and brokers changes.&lt;br/&gt;&lt;br/&gt;The second reason is that Kafka supports end-to-end batching of messages. Computers love linear scans and transfers with big arrays; they hate little bursty random messages. One prerogative of an asynchronous messaging system is the ability to introduce just a little delay to allow what would have been small bursty messages to turn into big fat ones. This speeds up network transfers, disk operations, and even in-memory iteration. We expose this as tunable parameters, so people who can stand a little extra latency can get a lot of extra throughput.&lt;br/&gt;&lt;br/&gt;Finally, Kafka leans heavily on the OS pagecache for data storage. Although the question says that Kafka writes to disk immediately, that is not completely true. 
Actually, Kafka just writes to the filesystem immediately, which is really just writing to the kernel&amp;#8217;s memory pool, which is asynchronously flushed to disk. There are a couple of reasons this is a good idea:&lt;br/&gt;&lt;ul&gt;&lt;li&gt;Kafka runs on the JVM, and keeping data in the heap of a garbage-collected language isn&amp;#8217;t wise. There are a couple of reasons for this. One is the GC overhead of continually scanning your in-memory cache; the other is the object overhead (in Java, a hash table of small objects tends to be mostly overhead, not data).&lt;/li&gt;&lt;li&gt;Modern operating systems reserve all free memory as &amp;#8220;pagecache&amp;#8221;: basically, contiguous chunks of memory that soak up reads and writes to disk. The nice thing about this is that on a 32GB machine you get access to virtually all of that memory automatically without having to worry about the possibility of running out of memory and swapping.&lt;/li&gt;&lt;li&gt;Unix has optimizations to allow you to directly write data in pagecache to a socket without any additional copying (aka sendfile). Any data sent on a socket has to cross the process/kernel memory boundary anyway. This means that if you keep data in your process and need to deliver that data to multiple consumers, you need to recopy it into kernel space, buffering on both sides, each time. This approach gets rid of all the buffering and copying and uses a single structure.&lt;/li&gt;&lt;/ul&gt;If you are interested in this stuff there are a couple of more detailed write-ups. There is a more complete design document that discusses the trade-offs in more detail [1]. 
There is also a more recent write-up on the use of Kafka at LinkedIn which gives some performance and operational statistics [2].&lt;br/&gt;&lt;br/&gt;[1] &lt;a href="http://incubator.apache.org/kafka/design.html"&gt;http://incubator.apache.org/kafka/design.html&lt;/a&gt;&lt;br/&gt;[2] &lt;a href="http://sites.computer.org/debull/A12june/A12JUN-CD.pdf"&gt;http://sites.computer.org/debull/A12june/A12JUN-CD.pdf&lt;/a&gt;&lt;/blockquote&gt;&lt;span class="qlink_container"&gt;&lt;a href="http://www.quora.com/Apache-Kafka/Kafka-writes-every-message-to-broker-disk-Still-performance-wise-it-is-better-than-some-of-the-in-memory-message-storing-message-queues-Why-is-that/answer/Jay-Kreps"&gt;View Answer on Quora&lt;/a&gt;&lt;/span&gt;</description><link>http://nellaikanth.tumblr.com/post/44063360590</link><guid>http://nellaikanth.tumblr.com/post/44063360590</guid><pubDate>Tue, 26 Feb 2013 09:57:31 -0500</pubDate></item><item><title>Which vs. That</title><description>&lt;p&gt;Post by Gayle Laakmann McDowell:&lt;/p&gt;&lt;blockquote&gt;Which vs. That&lt;/blockquote&gt;&lt;a href="http://englishtips.quora.com/Which-vs-That?srid=huKI&amp;amp;share=1"&gt;View Post on Quora&lt;/a&gt;</description><link>http://nellaikanth.tumblr.com/post/43934763293</link><guid>http://nellaikanth.tumblr.com/post/43934763293</guid><pubDate>Sun, 24 Feb 2013 18:32:33 -0500</pubDate></item><item><title>The difference between e.g. and i.e.</title><description>&lt;p&gt;Post by Gayle Laakmann McDowell:&lt;/p&gt;&lt;blockquote&gt;The difference between e.g. 
and i.e.&lt;/blockquote&gt;&lt;a href="http://englishtips.quora.com/The-difference-between-e-g-and-i-e?srid=huKI&amp;amp;share=1"&gt;View Post on Quora&lt;/a&gt;</description><link>http://nellaikanth.tumblr.com/post/43934736616</link><guid>http://nellaikanth.tumblr.com/post/43934736616</guid><pubDate>Sun, 24 Feb 2013 18:32:14 -0500</pubDate></item><item><title>What are some common Machine Learning interview questions?</title><description>&lt;p&gt;Answer by Charles H Martin:&lt;/p&gt;&lt;blockquote&gt;When I am asked to interview people, I try to ascertain whether they know the math or not, and how to apply it in a real-world context. I also look to see if they understand high-performance computing and not just vanilla coding.&lt;br/&gt;&lt;br/&gt;I was asked to do this as a consultant, acting as a subject matter expert to help interview junior people for the firm.&lt;br/&gt;&lt;br/&gt;In our interviews, we asked a candidate to present some code they had written and to talk through it. For an ML person, it would be some kind of ML code.&lt;br/&gt;&lt;br/&gt;So, for example, I was involved with an interview with a Physics PhD from MIT discussing some NMF code he wrote in JavaScript. The JavaScript was very good and he would be fine doing GUI work, Node.js work, etc. Certainly not something I could do.&lt;br/&gt;&lt;br/&gt;Can he do machine learning? Mind you, he has a PhD in a math-heavy subject from one of the top 10 schools in the world. So he should know the math.&lt;br/&gt;&lt;br/&gt;I wanted to see if he knew how to get it to converge properly. He did not. He knew it was non-convex, but he did not know how to seed it, nor did he know about the convex variants. He tried to give me some nonsense about it being bi-convex and whatnot. Dude, just use k-means++ to seed it. That&amp;#8217;s it. That&amp;#8217;s all you had to say. This got totally past the VP of engineering and the CTO. 
(They were just impressed that machine learning involved computing a first-order derivative, something neither had seen since college calculus.)&lt;br/&gt;&lt;br/&gt;So here, he knew some basic methods, but did not really know the most important ideas in the field, the important developments, or how to really code this. It is clear that he had never done anything like this in his former work, nor did he really understand numerical methods.&lt;br/&gt;&lt;br/&gt;This means that his solution would never work in production and, more importantly, that he would have no idea how to evaluate it or how to fix it. I see this a lot. Also, he did not know the available open source codes, how they worked internally, which one to use, or how to evaluate their performance. For a PhD from MIT, this was unacceptable to me.&lt;br/&gt;&lt;br/&gt;There was also a code evaluation. For me, one needs to know what runs fast and what does not. What good is a method that only runs on 300 data points?! In this case, this interviewee had written his own JavaScript matrix library. Did he know the BLAS libraries and how they work? Or an alternative? This is critical because you can&amp;#8217;t run anything in production if the code is too slow. I see the same problem with most Ruby coders: they just don&amp;#8217;t know numerical computing.&lt;br/&gt;&lt;br/&gt;I was not looking to evaluate 10,000 lines of complex code, or whether he used Agile or unit testing. Nor did I care about solving some high school brain teaser. I just wanted to see a small piece of code, with good engineering choices, a good understanding of the math, and how to make this solution work in a modest production environment. 
&lt;br/&gt;&lt;br/&gt;I&amp;#8217;d rather see old-fashioned spectral clustering with a Fortran library, which can scale, as opposed to trying to use a &amp;#8220;fancy&amp;#8221; method like NMF or LDA if you cannot get it to work in production at scale. (I&amp;#8217;m not saying they don&amp;#8217;t scale; I am saying you had better know how to get them to scale if you choose to use them.)&lt;br/&gt;&lt;br/&gt;In another interview, the candidate was again a PhD (Ukrainian, I think) who was very bright, had solved some good problems, and had experience. He was using an off-the-shelf SVM tool, one I know very well. I asked a very basic question: how do you adjust the cost parameter for the SVM regularizer? I rephrased the question a couple of different ways to give him a chance. FAIL. In other words, did you read the documentation of the tool, and did you understand which parameters to tweak and which ones to leave at the default settings? (I kinda would like the person to have read the entire source code of the tool and know how it works.) Again, this demonstrates a failure to grasp the most basic mathematical concept in ML, regularization, and how it would apply in production. Tuning this parameter can increase accuracy by 10-15% (or more). Again, just simple stuff, but important stuff. This also shows a lack of attention to the important details of the work. We actually offered this guy the job and he asked for a salary way out of the ballpark. If he had not missed this critical question he might have been able to make the case for the salary.&lt;br/&gt;&lt;br/&gt;&lt;br/&gt;Having shared all this, I would add that I think, for you, the market is very good and you will probably not encounter anything like this. Why? All you need to do is know more machine learning than the VP and the CTO, and here the bar is very low. 
Everyone and his brother has funding to do machine learning, and they usually just need to solve one small problem and get the product out the door. Most (i.e., 7/10) CTOs and VPs know absolutely nothing about even basic machine learning, so they have no clue even what to ask. (Newton-Raphson will blow them away, and they will think you are too expensive if you try to compare stochastic gradient descent to interior point methods.) They got their startup funded based on the market potential of the idea, and they are expected to hire people to invent their IP.&lt;br/&gt;&lt;br/&gt;(Obviously, if you are interviewing at Google or Lockheed Martin, disregard all of this and hire me once you get in.)&lt;br/&gt;&lt;br/&gt;P.S. I was asked once by some VP/CTO evaluating me what the volume of a rectangular prism is. All I could think of was this old Pink Floyd album, Dark Side of the Moon, with the prism on it.&lt;br/&gt;&lt;br/&gt;&lt;a href="http://en.wikipedia.org/wiki/The_Dark_Side_of_the_Moon"&gt;http://en.wikipedia.org/wiki/The_Dark_Side_of_the_Moon&lt;/a&gt;&lt;br/&gt;&lt;br/&gt;I would never ask this kind of question, but you will probably get asked many puzzle questions like this if you are fresh out of school (or an old man like me, I guess). I seem to recall there are books and/or web sites with tons of these.&lt;br/&gt;&lt;br/&gt;&lt;br/&gt;Good Luck&lt;/blockquote&gt;&lt;span class="qlink_container"&gt;&lt;a href="http://www.quora.com/What-are-some-common-Machine-Learning-interview-questions/answer/Charles-H-Martin"&gt;View Answer on Quora&lt;/a&gt;&lt;/span&gt;</description><link>http://nellaikanth.tumblr.com/post/43809303116</link><guid>http://nellaikanth.tumblr.com/post/43809303116</guid><pubDate>Sat, 23 Feb 2013 11:16:08 -0500</pubDate></item><item><title>What is the purpose of each folder in the root of an Ubuntu installation?</title><description>&lt;p&gt;Answer by Juan Manavella:&lt;/p&gt;&lt;blockquote&gt;This is not Ubuntu 
specific. All Linux distributions share the same directory tree:&lt;br/&gt;&lt;br/&gt;&lt;div&gt;&lt;img class="qtext_image zoomable_in zoomable_in_feed" src="http://qph.is.quoracdn.net/main-qimg-7f28b800f4783f483bfe7940d1f247ec" master_src="http://qph.is.quoracdn.net/main-qimg-549ecbd6e13d48e2a990845dac9168a8" master_w="584" master_h="474"/&gt;&lt;/div&gt;&lt;/blockquote&gt;&lt;span class="qlink_container"&gt;&lt;a href="http://www.quora.com/Ubuntu/What-is-the-purpose-of-each-folder-in-the-root-of-an-Ubuntu-installation/answer/Juan-Manavella"&gt;View Answer on Quora&lt;/a&gt;&lt;/span&gt;</description><link>http://nellaikanth.tumblr.com/post/43604340072</link><guid>http://nellaikanth.tumblr.com/post/43604340072</guid><pubDate>Wed, 20 Feb 2013 19:41:12 -0500</pubDate></item><item><title>I want to learn memoization. Can you give me some links with problems from spoj/topcoder/codeforces?</title><description>&lt;p&gt;Answer by Laurentiu Cristian Ion:&lt;/p&gt;&lt;blockquote&gt;Your intuition is good. &lt;br/&gt;&lt;br/&gt;&lt;b&gt;Memoization &lt;/b&gt;is much easier than the standard &lt;b&gt;Dynamic Programming&lt;/b&gt; with &lt;i&gt;iteration.&lt;/i&gt;&lt;br/&gt;&lt;br/&gt;&lt;br/&gt;The hardest part in solving a &lt;b&gt;DP &lt;/b&gt;problem (with or without &lt;i&gt;memoization&lt;/i&gt;) is to figure out the &lt;i&gt;states &lt;/i&gt;(overlapping subproblems) and the &lt;i&gt;recurrence formula &lt;/i&gt;for a particular problem. 
Once you figured this out it is just a matter of writing a few lines of code.&lt;br/&gt;&lt;br/&gt;A great tutorial on &lt;b&gt;DP&lt;/b&gt;: &lt;span class="qlink_container"&gt;&lt;a href="http://community.topcoder.com/tc?module=Static&amp;amp;d1=tutorials&amp;amp;d2=dynProg" rel="nofollow" class="external_link" target="_blank"&gt;&lt;a href="http://community.topcoder"&gt;http://community.topcoder&lt;/a&gt;&lt;wbr&gt;&lt;/wbr&gt;.com/tc&amp;#8230;&lt;/a&gt;&lt;/span&gt;&lt;br/&gt;&lt;br/&gt;Now there are two ways to solve DP problems: &lt;br/&gt;&lt;br/&gt;&lt;ul&gt;&lt;li&gt;&lt;b&gt;bottom-up &lt;/b&gt;or &lt;b&gt;iteration, &lt;/b&gt;which is when you first compute the smaller states and solve larger and larger states based on the previously computed subproblems until you solve the entire problem&lt;/li&gt;&lt;/ul&gt;&lt;br/&gt;&lt;ul&gt;&lt;li&gt;&lt;b&gt;top-down &lt;/b&gt;or &lt;b&gt;recursion, &lt;/b&gt;which is where &lt;b&gt;&lt;i&gt;memoization &lt;/i&gt;&lt;/b&gt;comes in. You call a recursive function which solves the smaller states and uses them to solve the whole problem, and in order to be efficient and not compute already computed states, it keeps every answer in a &lt;b&gt;table, array &lt;/b&gt;or &lt;b&gt;map&lt;/b&gt;.&lt;/li&gt;&lt;/ul&gt;&lt;br/&gt;At first, depending on your desire, you might like the memoization approach better because it might be easier to understand for you, but after you get the hang of it the &lt;i&gt;bottom-up&lt;/i&gt; approach is shorter and even faster sometimes (and you don&amp;#8217;t have to worry about the size of the stack &amp;#8212; which may be limited on some online judges).&lt;br/&gt;&lt;br/&gt;How do you get good at figuring out &lt;i&gt;Dynamic Programming&lt;/i&gt; solutions? &lt;br/&gt;&lt;b&gt;Practice. A lot. 
&lt;/b&gt;Start with Div 2 problems.&lt;br/&gt;&lt;br/&gt;&lt;br/&gt;&lt;ul&gt;&lt;li&gt;&lt;b&gt;Topcoder&lt;/b&gt; has, in my opinion, the most extensive collection of DP problems, and they are easy to sort by category, tags, and difficulty: &lt;span class="qlink_container"&gt;&lt;a href="http://community.topcoder.com/tc?module=ProblemArchive&amp;amp;sr=&amp;amp;er=&amp;amp;sc=&amp;amp;sd=&amp;amp;class=&amp;amp;cat=Dynamic+Programming&amp;amp;div1l=&amp;amp;div2l=&amp;amp;mind1s=&amp;amp;mind2s=&amp;amp;maxd1s=&amp;amp;maxd2s=&amp;amp;wr=" rel="nofollow" class="external_link" target="_blank" onmouseover='return require("qtext").tooltip(this, "topcoder.com")'&gt;TopCoder Statistics - Problem Archive&lt;/a&gt;&lt;/span&gt; (topcoder DP problems archive). &lt;br/&gt;&lt;br/&gt;Moreover, the topcoder practice rooms help you become a faster and better coder, without the need of much debugging.&lt;/li&gt;&lt;/ul&gt;&lt;br/&gt;&lt;br/&gt;&lt;b&gt;&lt;u&gt;You can solve these using either memoization or the bottom-up approach.&lt;/u&gt;&lt;/b&gt;&lt;b&gt; &lt;/b&gt;See what&amp;#8217;s best for you.&lt;br/&gt;&lt;br/&gt;&lt;br/&gt;&lt;ul&gt;&lt;li&gt;&lt;b&gt;Codeforces &lt;/b&gt;also has an easy way to get just the type of problems you want. 
You just go to the &lt;span class="qlink_container"&gt;&lt;a href="http://codeforces.com/problemset/" rel="nofollow" class="external_link" target="_blank" onmouseover='return require("qtext").tooltip(this, "codeforces.com")'&gt;Problemset - Codeforces&lt;/a&gt;&lt;/span&gt; and click on the little blue arrow on the top right and write &lt;i&gt;dp&lt;/i&gt;.&lt;/li&gt;&lt;/ul&gt;&lt;br/&gt;&lt;ul&gt;&lt;li&gt;There are also a lot of interesting problems on &lt;b&gt;SPOJ, &lt;/b&gt;but since they don&amp;#8217;t do such a good job at tagging problems, there is a list here: &lt;span class="qlink_container"&gt;&lt;a href="http://apps.topcoder.com/forums/?module=Thread&amp;amp;threadID=674592&amp;amp;start=0&amp;amp;mc=7#1237445" rel="nofollow" class="external_link" target="_blank" onmouseover='return require("qtext").tooltip(this, "topcoder.com")'&gt;TopCoder Forums&lt;/a&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;br/&gt;&lt;ul&gt;&lt;li&gt;Nice collection of DP problems from many many sites like spoj, uva, timus, codeforces etc: &lt;span class="qlink_container"&gt;&lt;a href="http://ahmed-aly.com/Category.jsp?ID=33" rel="nofollow" class="external_link" target="_blank"&gt;&lt;a href="http://ahmed-aly.com/Cate"&gt;http://ahmed-aly.com/Cate&lt;/a&gt;&lt;wbr&gt;&lt;/wbr&gt;gory.js&amp;#8230;&lt;/a&gt;&lt;/span&gt; &lt;/li&gt;&lt;/ul&gt;&lt;br/&gt;&lt;br/&gt;&lt;br/&gt;&lt;b&gt;&lt;u&gt;DP tutorials:&lt;/u&gt;&lt;/b&gt;&lt;br/&gt;&lt;br/&gt;&lt;ul&gt;&lt;li&gt;&lt;span class="qlink_container"&gt;&lt;a href="http://www.codechef.com/wiki/tutorial-dynamic-programming" rel="nofollow" class="external_link" target="_blank" onmouseover='return require("qtext").tooltip(this, "codechef.com")'&gt;DP tutorial on Codechef.com&lt;/a&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span class="qlink_container"&gt;&lt;a href="http://www.geeksforgeeks.org/tag/dynamic-programming" rel="nofollow" class="external_link" target="_blank" onmouseover='return require("qtext").tooltip(this, 
"geeksforgeeks.org")'&gt;Dynamic Programming Archives - GeeksforGeeks&lt;/a&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span class="qlink_container"&gt;&lt;a href="http://apps.topcoder.com/forums/?module=Thread&amp;amp;threadID=700080&amp;amp;start=0" rel="nofollow" class="external_link" target="_blank"&gt;&lt;a href="http://apps.topcoder.com/"&gt;http://apps.topcoder.com/&lt;/a&gt;&lt;wbr&gt;&lt;/wbr&gt;forums/&amp;#8230;&lt;/a&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span class="qlink_container"&gt;&lt;a href="http://apps.topcoder.com/forums/?module=Thread&amp;amp;threadID=697369&amp;amp;start=0" rel="nofollow" class="external_link" target="_blank"&gt;&lt;a href="http://apps.topcoder.com/"&gt;http://apps.topcoder.com/&lt;/a&gt;&lt;wbr&gt;&lt;/wbr&gt;forums/&amp;#8230;&lt;/a&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span class="qlink_container"&gt;&lt;a href="http://codeforces.com/blog/entry/325" rel="nofollow" class="external_link" target="_blank" onmouseover='return require("qtext").tooltip(this, "codeforces.com")'&gt;Dynamic Programming Tutorial on Codeforces&lt;/a&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span class="qlink_container"&gt;&lt;a href="http://codeforces.com/blog/entry/4915" rel="nofollow" class="external_link" target="_blank" onmouseover='return require("qtext").tooltip(this, "codeforces.com")'&gt;TJU 2720 - Incredible! 
Impossible!&lt;/a&gt;&lt;/span&gt; - nice DP solution using memoization&lt;/li&gt;&lt;li&gt;A very smart guy teaching &lt;i&gt;Introduction to Algorithms &lt;/i&gt;at MIT, see the playlist for DP:&lt;br/&gt;&lt;a href="http://www.youtube.com/watch?v=IFrvgSvZA0I&amp;amp;list=SPUl4u3cNGP61Oq3tWYp6V_F-5jb5L2iHb"&gt;http://www.youtube.com/watch?v=IFrvgSvZA0I&amp;amp;list=SPUl4u3cNGP61Oq3tWYp6V_F-5jb5L2iHb&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;span class="qlink_container"&gt;&lt;a href="http://people.csail.mit.edu/bdean/6.046/dp/" rel="nofollow" class="external_link" target="_blank" onmouseover='return require("qtext").tooltip(this, "mit.edu")'&gt;Dynamic Programming Practice Problems&lt;/a&gt;&lt;/span&gt;&lt;br/&gt;&lt;/li&gt;&lt;li&gt;&lt;span class="qlink_container"&gt;&lt;a href="http://www.quora.com/Dynamic-Programming/Are-there-any-good-resources-or-tutorials-for-Dynamic-Programming-besides-TopCoder-tutorial"&gt;Dynamic Programming: Are there any good resources or tutorials for Dynamic Programming besides TopCoder tutorial?&lt;/a&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span class="qlink_container"&gt;&lt;a href="http://www.quora.com/TopCoder/What-are-systematic-ways-to-prepare-for-dynamic-programming"&gt;TopCoder: What are systematic ways to prepare for dynamic programming?&lt;/a&gt;&lt;/span&gt;&lt;br/&gt;&lt;br/&gt;&lt;/li&gt;&lt;/ul&gt;&lt;br/&gt;&lt;br/&gt;If you want to extend your algorithmic knowledge:&lt;br/&gt;&lt;br/&gt;&lt;ul&gt;&lt;li&gt;a list of algorithms used in programming contests - &lt;span class="qlink_container"&gt;&lt;a href="http://translate.google.com/translate?hl=en&amp;amp;sl=ro&amp;amp;tl=en&amp;amp;u=http%3A%2F%2Finfoarena.ro%2Ftraining-path" class="external_link" target="_blank" onmouseover='return require("qtext").tooltip(this, "google.com")'&gt;Google Translate&lt;/a&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;if you&amp;#8217;re preparing for a tech interview at a top tier company the best resource is &lt;span 
class="qlink_container"&gt;&lt;a href="http://www.quora.com/Gayle-Laakmann-McDowell"&gt;Gayle Laakmann McDowell&lt;/a&gt;&lt;/span&gt;&amp;#8217;s book: &lt;span class="qlink_container"&gt;&lt;a href="http://www.amazon.com/dp/098478280X" class="external_link" target="_blank" onmouseover='return require("qtext").tooltip(this, "amazon.com")'&gt;Cracking the Coding Interview: 150 Programming Questions and Solutions&lt;/a&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span class="qlink_container"&gt;&lt;a href="http://ace.delos.com/usacogate" rel="nofollow" class="external_link" target="_blank" onmouseover='return require("qtext").tooltip(this, "delos.com")'&gt;USACO Training Program Gateway&lt;/a&gt;&lt;/span&gt; - again with a great DP chapter&lt;/li&gt;&lt;/ul&gt;&lt;/blockquote&gt;&lt;span class="qlink_container"&gt;&lt;a href="http://www.quora.com/Algorithms/I-want-to-learn-memoization-Can-you-give-me-some-links-with-problems-from-spoj-topcoder-codeforces/answer/Laurentiu-Cristian-Ion"&gt;View Answer on Quora&lt;/a&gt;&lt;/span&gt;</description><link>http://nellaikanth.tumblr.com/post/43376741017</link><guid>http://nellaikanth.tumblr.com/post/43376741017</guid><pubDate>Sun, 17 Feb 2013 23:57:46 -0500</pubDate></item><item><title>What are the Top 10 Problems in Machine Learning for 2013?</title><description>&lt;p&gt;Answer by Abhishek Shivkumar:&lt;/p&gt;&lt;blockquote&gt;1. &lt;b&gt;Churn Prediction&lt;/b&gt;: Churn prediction is one of the most popular use cases for people who want to leverage machine learning. It carries large business value, especially in industries like telecom and banking. Several challenges, such as the skewed nature of the available data sets and the choice of which models to use, are going to be under a lot of debate.&lt;br/&gt;&lt;br/&gt;2. 
&lt;b&gt;Sentiment Analysis&lt;/b&gt;: Many decisions these days are based on the opinions of others. We buy a product because it has received positive reviews, and we most likely pick a hotel because it got the best rating online. Sentiment analysis has its own challenges, such as how granularly the sentiment can be determined and how subjective it is, and hence it will be a good place to apply machine learning.&lt;br/&gt;&lt;br/&gt;3. &lt;b&gt;Truth and Veracity&lt;/b&gt;: There is a lot said online these days, and it is hard to determine what is true and what is fake. We have bots smart enough to publish content like human beings, and there are social aspects attached to the ratings of various entities online. I feel that determining the veracity of information online will be a big challenge for machine learning.&lt;br/&gt;&lt;br/&gt;4. &lt;b&gt;Recommendations&lt;/b&gt;: There is such a myriad of choices available online that it is becoming ever more difficult to choose a book, a restaurant, or even a simple product. Making smart recommendations based on the user&amp;#8217;s context, and not just on the preferences of the crowd, is going to be a great challenge because understanding the user&amp;#8217;s context is hard.&lt;br/&gt;&lt;br/&gt;5. &lt;b&gt;Online Advertisement&lt;/b&gt;: There is a lot of work and many startups in the space of intelligent online advertising, but pushing the right advertisement at the right time in the right way requires a good understanding of when to target a particular customer. In my opinion, the great challenge for machine learning in this space is determining the user&amp;#8217;s behavior online so that the correct advertisement is pushed at the moment the user really needs it.&lt;br/&gt;&lt;br/&gt;6. 
&lt;b&gt;News Aggregation&lt;/b&gt;: Plenty of news is being generated around us from many different places about a variety of topics. Yet we all have a constant thirst to consume as much of the news relevant to us as possible. How are we going to aggregate news according to the user&amp;#8217;s preferences? Do the user&amp;#8217;s tastes vary with time? How do we learn this variation? All this is going to be a challenge for machine learning, and it involves a great deal of making sense of news and articles.&lt;br/&gt;&lt;br/&gt;7. &lt;b&gt;Scalability&lt;/b&gt;: Data is constantly expanding in variety, velocity, and volume. Can the traditional machine learning algorithms that were developed a decade ago be applied to big data? I feel they will all undergo some kind of refurbishment to be able to operate on data at large scale. Can SVMs train faster? Can they be made parallel? This is going to be a good problem to focus on with the rise of big data.&lt;br/&gt;&lt;br/&gt;8. &lt;b&gt;Content Discovery/Search&lt;/b&gt;: There are millions of people around the world on various social networks and within enterprises. How can you discover people who share interests similar to yours, and what parameters are you going to consider to measure this similarity? How do we measure similarity, and can we quantify it? I feel this is a nice problem for machine learning, where we will face the challenge of trying to find the needle in a haystack.&lt;br/&gt;&lt;br/&gt;9. &lt;b&gt;Intelligent Learning: &lt;/b&gt; For example, it is still difficult to identify a behavior in a video sequence, and there has been a lot of research in this space. With state-of-the-art learning algorithms, I feel one of the top problems is to enable machines to see, hear, and recognize like the human brain does. This means a good problem would be to leverage machine learning algorithms that use different modes of learning to achieve a particular task, be it recognition or anything similar. 
&lt;br/&gt;&lt;br/&gt;10. &lt;b&gt;Machine Learning for Medicine&lt;/b&gt;: I feel this is probably the most interesting machine learning problem for 2013 and the coming years. There are so many diseases that need our attention, and a lot of human time is spent researching their cures by analyzing many symptoms. Yet two patients with similar health problems receive different kinds of treatment and, eventually, are cured to different extents. Can we use machine learning to understand how a patient is feeling at a particular moment and recommend the appropriate treatment? I feel this will change how we live and will help doctors discover a lot of new medical methodologies.&lt;/blockquote&gt;&lt;span class="qlink_container"&gt;&lt;a href="http://www.quora.com/Machine-Learning/What-are-the-Top-10-Problems-in-Machine-Learning-for-2013/answer/Abhishek-Shivkumar"&gt;View Answer on Quora&lt;/a&gt;&lt;/span&gt;</description><link>http://nellaikanth.tumblr.com/post/43336467421</link><guid>http://nellaikanth.tumblr.com/post/43336467421</guid><pubDate>Sun, 17 Feb 2013 15:23:53 -0500</pubDate></item><item><title>What differentiates a good modeler from a bad one on Kaggle?</title><description>&lt;p&gt;Answer by Jeong-Yoon Lee:&lt;/p&gt;&lt;blockquote&gt;I&amp;#8217;d say that a good modeler must be good at (1) feature extraction + (2) algorithm selection/ensemble + (3) algorithm optimization. 
Newbies often focus a lot on the second part, but experts will spend a lot more time on the first part, and the third part differentiates winners from other good players.&lt;/blockquote&gt;&lt;span class="qlink_container"&gt;&lt;a href="http://www.quora.com/Kaggle/What-differentiates-a-good-modeler-from-a-bad-one-on-Kaggle/answer/Jeong-Yoon-Lee"&gt;View Answer on Quora&lt;/a&gt;&lt;/span&gt;</description><link>http://nellaikanth.tumblr.com/post/43231352954</link><guid>http://nellaikanth.tumblr.com/post/43231352954</guid><pubDate>Sat, 16 Feb 2013 11:06:16 -0500</pubDate></item><item><title>What are some other sites that are similar to Topcoder(Algorithm section) and Codeforces?</title><description>&lt;p&gt;Answer by Jayesh Chaudhary:&lt;/p&gt;&lt;blockquote&gt;In no particular order:&lt;br/&gt;Topcoder - &lt;span class="qlink_container"&gt;&lt;a href="http://topcoder.com/tc" rel="nofollow" class="external_link" target="_blank" onmouseover='return require("qtext").tooltip(this, "topcoder.com")'&gt;Programming Contests, Software Development, and Employment Services at TopCoder&lt;/a&gt;&lt;/span&gt;&lt;br/&gt;Codeforces - &lt;span class="qlink_container"&gt;&lt;a href="http://codeforces.com/" rel="nofollow" class="external_link" target="_blank" onmouseover='return require("qtext").tooltip(this, "codeforces.com")'&gt;Codeforces&lt;/a&gt;&lt;/span&gt;&lt;br/&gt;&lt;br/&gt;1. CodeChef - &lt;span class="qlink_container"&gt;&lt;a href="http://codechef.com/" rel="nofollow" class="external_link" target="_blank" onmouseover='return require("qtext").tooltip(this, "codechef.com")'&gt;Programming Competition,Programming Contest,Online Computer Programming&lt;/a&gt;&lt;/span&gt;&lt;br/&gt;2. SPOJ - &lt;span class="qlink_container"&gt;&lt;a href="http://spoj.pl/" rel="nofollow" class="external_link" target="_blank" onmouseover='return require("qtext").tooltip(this, "spoj.pl")'&gt;Sphere Online Judge (SPOJ)&lt;/a&gt;&lt;/span&gt;&lt;br/&gt;3. 
UVa - &lt;span class="qlink_container"&gt;&lt;a href="http://uva.onlinejudge.org/" rel="nofollow" class="external_link" target="_blank" onmouseover='return require("qtext").tooltip(this, "onlinejudge.org")'&gt;UVa Online Judge - Home&lt;/a&gt;&lt;/span&gt;&lt;br/&gt;4. ProjectEuler - &lt;span class="qlink_container"&gt;&lt;a href="http://projecteuler.net/" rel="nofollow" class="external_link" target="_blank" onmouseover='return require("qtext").tooltip(this, "projecteuler.net")'&gt;Project Euler&lt;/a&gt;&lt;/span&gt;&lt;br/&gt;5. Programming Challenges -  &lt;span class="qlink_container"&gt;&lt;a href="http://programmingchallenges.com/" rel="nofollow" class="external_link" target="_blank" onmouseover='return require("qtext").tooltip(this, "programmingchallenges.com")'&gt;Programming Challenges&lt;/a&gt;&lt;/span&gt;&lt;br/&gt;6. ahmed-aly -  &lt;span class="qlink_container"&gt;&lt;a href="http://ahmed-aly.com/" rel="nofollow" class="external_link" target="_blank" onmouseover='return require("qtext").tooltip(this, "ahmed-aly.com")'&gt;Virtual Online Contests&lt;/a&gt;&lt;/span&gt;&lt;br/&gt;7. TJU -  &lt;span class="qlink_container"&gt;&lt;a href="http://acm.tju.edu.cn/toj/" rel="nofollow" class="external_link" target="_blank" onmouseover='return require("qtext").tooltip(this, "tju.edu.cn")'&gt;TJU ACM-ICPC Online Judge&lt;/a&gt;&lt;/span&gt;&lt;br/&gt;8. PJU - &lt;span class="qlink_container"&gt;&lt;a href="http://pju.org/" rel="nofollow" class="external_link" target="_blank" onmouseover='return require("qtext").tooltip(this, "pju.org")'&gt;UNION PANAMERICANA DE JUDO&lt;/a&gt;&lt;/span&gt;&lt;br/&gt;9. USACO -  &lt;span class="qlink_container"&gt;&lt;a href="http://ace.delos.com/usacogate" rel="nofollow" class="external_link" target="_blank" onmouseover='return require("qtext").tooltip(this, "delos.com")'&gt;USACO Training Program Gateway&lt;/a&gt;&lt;/span&gt;&lt;br/&gt;10. 
TIMUS - &lt;span class="qlink_container"&gt;&lt;a href="http://acm.timus.ru/" rel="nofollow" class="external_link" target="_blank" onmouseover='return require("qtext").tooltip(this, "timus.ru")'&gt;Timus Online Judge&lt;/a&gt;&lt;/span&gt;&lt;br/&gt;11. AIZU - &lt;span class="qlink_container"&gt;&lt;a href="http://judge.u-aizu.ac.jp/onlinejudge/" rel="nofollow" class="external_link" target="_blank" onmouseover='return require("qtext").tooltip(this, "u-aizu.ac.jp")'&gt;Programming Challenge&lt;/a&gt;&lt;/span&gt;&lt;br/&gt;12. URI - &lt;span class="qlink_container"&gt;&lt;a href="http://www.urionlinejudge.com.br/" rel="nofollow" class="external_link" target="_blank" onmouseover='return require("qtext").tooltip(this, "urionlinejudge.com.br")'&gt;URI Online Judge - Login&lt;/a&gt;&lt;/span&gt;&lt;br/&gt;13. ZOJ - &lt;span class="qlink_container"&gt;&lt;a href="http://acm.zju.edu.cn/" rel="nofollow" class="external_link" target="_blank" onmouseover='return require("qtext").tooltip(this, "zju.edu.cn")'&gt;ZOJ :: Home&lt;/a&gt;&lt;/span&gt;&lt;br/&gt;14. NTHU - &lt;span class="qlink_container"&gt;&lt;a href="http://acm.twbbs.org/" rel="nofollow" class="external_link" target="_blank" onmouseover='return require("qtext").tooltip(this, "twbbs.org")'&gt;NTHU Online Judge&lt;/a&gt;&lt;/span&gt;&lt;br/&gt;15. Leetcode - &lt;span class="qlink_container"&gt;&lt;a href="http://leetcode.com/" rel="nofollow" class="external_link" target="_blank" onmouseover='return require("qtext").tooltip(this, "leetcode.com")'&gt;LeetCode&lt;/a&gt;&lt;/span&gt;&lt;br/&gt;16. AI Challenge - &lt;span class="qlink_container"&gt;&lt;a href="http://aichallenge.org/" rel="nofollow" class="external_link" target="_blank" onmouseover='return require("qtext").tooltip(this, "aichallenge.org")'&gt;Home | AI Challenge&lt;/a&gt;&lt;/span&gt;&lt;br/&gt;17. 
Saratov - &lt;span class="qlink_container"&gt;&lt;a href="http://acm.sgu.ru/" rel="nofollow" class="external_link" target="_blank" onmouseover='return require("qtext").tooltip(this, "sgu.ru")'&gt;Saratov State University :: Online Contester&lt;/a&gt;&lt;/span&gt;&lt;br/&gt;18. Google code jam - &lt;span class="qlink_container"&gt;&lt;a href="http://code.google.com/codejam" class="external_link" target="_blank" onmouseover='return require("qtext").tooltip(this, "google.com")'&gt;Google Code Jam&lt;/a&gt;&lt;/span&gt;&lt;br/&gt;19. InterviewStreet - &lt;span class="qlink_container"&gt;&lt;a href="http://interviewstreet.com/" rel="nofollow" class="external_link" target="_blank" onmouseover='return require("qtext").tooltip(this, "interviewstreet.com")'&gt;Programming Contests - Codesprints - Interviewstreet&lt;/a&gt;&lt;/span&gt;&lt;br/&gt;20. Kaggle - &lt;span class="qlink_container"&gt;&lt;a href="http://kaggle.com/" rel="nofollow" class="external_link" target="_blank" onmouseover='return require("qtext").tooltip(this, "kaggle.com")'&gt;making data science a sport&lt;/a&gt;&lt;/span&gt;&lt;br/&gt;21. Herbert - &lt;span class="qlink_container"&gt;&lt;a href="http://herbert.tealang.info/" rel="nofollow" class="external_link" target="_blank" onmouseover='return require("qtext").tooltip(this, "tealang.info")'&gt;Welcome to Herbert Online Judge&lt;/a&gt;&lt;/span&gt;&lt;br/&gt;22. CoderCharts - &lt;span class="qlink_container"&gt;&lt;a href="http://codercharts.com" rel="nofollow" class="external_link" target="_blank" onmouseover='return require("qtext").tooltip(this, "codercharts.com")'&gt;CoderCharts - Social Meets Programming&lt;/a&gt;&lt;/span&gt;&lt;br/&gt;23. PKU - &lt;span class="qlink_container"&gt;&lt;a href="http://poj.org/" rel="nofollow" class="external_link" target="_blank" onmouseover='return require("qtext").tooltip(this, "poj.org")'&gt;Welcome To PKU JudgeOnline&lt;/a&gt;&lt;/span&gt;&lt;br/&gt;24. 
CodingBat - &lt;span class="qlink_container"&gt;&lt;a href="http://codingbat.com/" rel="nofollow" class="external_link" target="_blank" onmouseover='return require("qtext").tooltip(this, "codingbat.com")'&gt;CodingBat&lt;/a&gt;&lt;/span&gt;&lt;br/&gt;25. Programr - &lt;span class="qlink_container"&gt;&lt;a href="http://www.programr.com/" rel="nofollow" class="external_link" target="_blank" onmouseover='return require("qtext").tooltip(this, "programr.com")'&gt;Programr | Learn.Code.Share&lt;/a&gt;&lt;/span&gt;&lt;br/&gt;26. HackerRank - &lt;span class="qlink_container"&gt;&lt;a href="https://www.hackerrank.com/" rel="nofollow" class="external_link" target="_blank" onmouseover='return require("qtext").tooltip(this, "hackerrank.com")'&gt;Artificial Intelligence Challenges :: AI Programming Problems and Competitions :: HackerRank&lt;/a&gt;&lt;/span&gt;&lt;br/&gt;27. Al Zimmermann - &lt;span class="qlink_container"&gt;&lt;a href="http://www.azspcs.net/" rel="nofollow" class="external_link" target="_blank" onmouseover='return require("qtext").tooltip(this, "azspcs.net")'&gt;Al Zimmermann&amp;#8217;s Programming Contests&lt;/a&gt;&lt;/span&gt;&lt;/blockquote&gt;&lt;span class="qlink_container"&gt;&lt;a href="http://www.quora.com/TopCoder/What-are-some-other-sites-that-are-similar-to-Topcoder-Algorithm-section-and-Codeforces/answer/Jayesh-Chaudhary"&gt;View Answer on Quora&lt;/a&gt;&lt;/span&gt;</description><link>http://nellaikanth.tumblr.com/post/43230073891</link><guid>http://nellaikanth.tumblr.com/post/43230073891</guid><pubDate>Sat, 16 Feb 2013 10:46:26 -0500</pubDate></item><item><title>How exactly does LinkedIn generate the "viewers of this profile also viewed" list of users?</title><description>&lt;p&gt;Answer by Jay Kreps:&lt;/p&gt;&lt;blockquote&gt;This is done via pretty straight-forward analysis of co-occurrence, basically good old-fashioned collaborative filtering. 
We don&amp;#8217;t give away details on this kind of stuff, but I will explain enough to give the gist.&lt;br/&gt;&lt;br/&gt;Basically when two things happen together it often indicates some degree of similarity. For example, two profiles could be viewed by the same person; this indicates that they share some similarity which led that person to look at both of these profiles. Maybe the person knows both of them, or maybe the person was searching for some trait shared by both of them and they both came up in the results. Obviously you can do this with views of any type of thing, not just profiles. Views are a great kind of co-occurrence: they are very noisy but there are lots of them. But there are many other kinds of co-occurrence: a person may apply for two jobs, which indicates a kind of similarity of those jobs; or a single job may be applied for by two people, indicating some similarity of the people. Once you start thinking about co-occurrences they are everywhere: two companies followed by the same person, two products reviewed by the same person, etc. &lt;br/&gt;&lt;br/&gt;So what we have done with this is pretty cool: we take all kinds of co-occurrences we think are relevant for all the types of things on the site (people, companies, jobs, products/services, questions, searches, groups, etc), and create a giant sparse matrix where a particular entry (i,j) is the number of co-occurrences between i and j. (Okay, actually we generalize this to a tensor since there is an entry for each &lt;i&gt;type&lt;/i&gt; of co-occurrence, but that makes it sound fancier than it is). 
Then we run our analysis on this matrix to compute sparse pairwise similarities as an &lt;span class="qlink_container"&gt;&lt;a href="http://www.quora.com/Azkaban-job-scheduler"&gt;Azkaban&lt;/a&gt;&lt;/span&gt; workflow made up of a series of big-ass Hadoop jobs.&lt;br/&gt;&lt;br/&gt;So when you start working on this, first you get this connection that all kinds of things can be viewed as &amp;#8220;co-occurrences&amp;#8221; and all those co-occurrences together can be thought of as a graph where the edges are created by the things co-occurring and this graph has a matrix form, which you can do all kinds of like decompositions on, and then you spend about a week convinced this is like AMAZING and that this could somehow lead you to a deeper truth about the relationships between ALL THINGS and telling everyone at lunch about this until they stop inviting you to lunch. Then you realize you are not smart enough to figure out exactly the deeper relationship of all things, and just go back to giving people interesting things to click on, and fortunately that part works pretty well.&lt;br/&gt;&lt;br/&gt;And of course, what the final similarities we generate &lt;i&gt;mean&lt;/i&gt; is a bit vague. And on their own these are great for some types of recommendation, but not good enough for all types (though it makes a good ingredient in most of them). For example if you want to recommend potential connections to someone, you can use these similarities but on their own they are not a mind-blowing professional connection recommendation. For these deeper semantic recommendations we have specific systems built for each use case on top of some shared infrastructure. 
The co-occurrence-based results do almost always work well for basic browsing, though; the meaning and messaging is something along the lines of &amp;#8220;here is some other stuff like that.&amp;#8221; The advantage of this type of approach is that it is totally generic and completely agnostic to the type of thing you are recommending.&lt;br/&gt;&lt;br/&gt;If you have gobs of data then the whole problem is pretty simple (more measurement than prediction): you just compute some simple statistics on top of the co-occurrence data and rank the top results for each item. However, if data is sparse (meaning either you have little co-occurrence data or your number of items is huge) then your results will be really noisy and there are some tricks for borrowing strength that can help. We have found that the best trick of all is just to use more data&amp;#8212;instead of running on a few months of data, make it scale out and work incrementally so you can run on a few years of data. For this to help you have to have a way to discount for time&amp;#8212;recent co-occurrences are usually more important than older co-occurrences.&lt;br/&gt;&lt;br/&gt;There are a number of other subtleties. Despite our best efforts and various algorithmic approaches we were always plagued by problems of over-recommending things with very high popularity. For example for the profiles, Barack Obama kept showing up way too often in people&amp;#8217;s similarity results. And this defeated all our metrics too, because, well, people just like to look at Barack Obama. We call this the Barack Obama problem.&lt;br/&gt;&lt;br/&gt;This has been a fun project to work on. I think the first version of this at LinkedIn for people was made by &lt;span class="qlink_container"&gt;&lt;a href="http://www.quora.com/Steve-Stegman"&gt;Steve Stegman&lt;/a&gt;&lt;/span&gt; as a 50-line SQL script. 
I worked on re-writing the early version to move it to Hadoop, increase the data used, do some de-sparsification, and make it work across the types of entities (not just people). &lt;span class="qlink_container"&gt;&lt;a href="http://www.quora.com/Sam-Shah"&gt;Sam Shah&lt;/a&gt;&lt;/span&gt; and &lt;span class="qlink_container"&gt;&lt;a href="http://www.quora.com/Lili-Wu"&gt;Lili Wu&lt;/a&gt;&lt;/span&gt; later did a pass on it and made it significantly more sophisticated, incremental and about 10x faster. It has been a fun project because for the 7 types of things on the site we compute this for, there are 49 types of similarity recommendations that come out. So it is kind of a product factory. Every month or so someone comes by and asks if we could do a version of &amp;#8220;that recommendation thing&amp;#8221; for showing Xs to Ys and we say yes, but of course it will be hard and take lots of work. A few weeks later we change the job config and tell them it is done. This is always fun.&lt;/blockquote&gt;&lt;span class="qlink_container"&gt;&lt;a href="http://www.quora.com/LinkedIn-6/How-exactly-does-LinkedIn-generate-the-viewers-of-this-profile-also-viewed-list-of-users/answer/Jay-Kreps"&gt;View Answer on Quora&lt;/a&gt;&lt;/span&gt;</description><link>http://nellaikanth.tumblr.com/post/43227133349</link><guid>http://nellaikanth.tumblr.com/post/43227133349</guid><pubDate>Sat, 16 Feb 2013 09:56:07 -0500</pubDate></item><item><title>What is the best way to find the most similar sentence?</title><description>&lt;p&gt;Answer by Yuval Feinstein:&lt;/p&gt;&lt;blockquote&gt;Take a look at the shingle/minhash method, which basically:&lt;br/&gt;&lt;ol&gt;&lt;li&gt;Represents each sentence as a set of shingles (character n-grams).&lt;/li&gt;&lt;li&gt;Calculates min-hashing - several hash functions creating a signature of each shingle set.&lt;/li&gt;&lt;li&gt;Groups together sets of sentences having the same signature.&lt;/li&gt;&lt;/ol&gt;The method is described in the free 
book &amp;#8220;Mining of Massive Data Sets&amp;#8221; by Anand Rajaraman and Jeff Ullman:&lt;br/&gt;&lt;a href="http://infolab.stanford.edu/~ullman/mmds.html"&gt;http://infolab.stanford.edu/~ullman/mmds.html&lt;/a&gt;&lt;/blockquote&gt;&lt;span class="qlink_container"&gt;&lt;a href="http://www.quora.com/What-is-the-best-way-to-find-the-most-similar-sentence/answer/Yuval-Feinstein"&gt;View Answer on Quora&lt;/a&gt;&lt;/span&gt;</description><link>http://nellaikanth.tumblr.com/post/43226924258</link><guid>http://nellaikanth.tumblr.com/post/43226924258</guid><pubDate>Sat, 16 Feb 2013 09:52:17 -0500</pubDate></item><item><title>What does an operating system do in the background when you try to open a file?</title><description>&lt;p&gt;Answer by Robert Love:&lt;blockquote&gt;This is the sort of question one can go deeper and deeper into. Each of the bullets I list below has a dozen bullets between it and the subsequent bullet. Nonetheless, here are the basics that just about any operating system performs when an application asks the kernel to open a file:&lt;ol&gt;&lt;li&gt;The application issues a system call with a name such as &lt;i&gt;open&lt;/i&gt;, providing a path to a file and (perhaps) a requested access mode such as &lt;i&gt;read&lt;/i&gt; or &lt;i&gt;write&lt;/i&gt;.&lt;br/&gt;&lt;/li&gt;&lt;li&gt;The process traps into the kernel.&lt;/li&gt;&lt;li&gt;The kernel begins &lt;i&gt;path name lookup&lt;/i&gt; on the requested file, which consists of walking the filesystem hierarchy and resolving each component in the path. 
For example, if you requested &lt;div class="codeblock inline_codeblock"&gt;&lt;pre&gt;&lt;span class=""&gt;/foo/bar/baz.txt&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt; the filesystem first looks up &lt;div class="codeblock inline_codeblock"&gt;&lt;pre&gt;&lt;span class=""&gt;/&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;, then &lt;div class="codeblock inline_codeblock"&gt;&lt;pre&gt;&lt;span class=""&gt;foo&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;, and then &lt;div class="codeblock inline_codeblock"&gt;&lt;pre&gt;&lt;span class=""&gt;bar&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;. Finally, inside of &lt;div class="codeblock inline_codeblock"&gt;&lt;pre&gt;&lt;span class=""&gt;/foo/bar&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;, it tries to find the file &lt;div class="codeblock inline_codeblock"&gt;&lt;pre&gt;&lt;span class=""&gt;baz.txt&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;. At any stage, if a path component does not exist, the system call aborts, returning an error to user-space.&lt;/li&gt;&lt;li&gt;The kernel compares the process&amp;#8217;s privileges with those of the file (and perhaps with any of the components resolved in the previous step). Does the process have permission to access this file in the mode requested? If not, the system call aborts, returning an error.&lt;/li&gt;&lt;li&gt;Given sufficient access to the file, the kernel now needs to obtain a reference to the actual file, in a manner it understands, beyond the human-readable name. This differs per filesystem. On Unix-style filesystems, &lt;div class="codeblock inline_codeblock"&gt;&lt;pre&gt;&lt;span class=""&gt;baz.txt&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt; is essentially just an entry in a table that maps human-readable names to inode numbers. 
Inodes are on-disk data structures that contain all of the metadata about a file, as well as links to the actual disk blocks that contain the file&amp;#8217;s data.&lt;/li&gt;&lt;li&gt;With the inode number (or your filesystem&amp;#8217;s equivalent), the kernel obtains a reference to the inode itself. The file is now &amp;#8220;open.&amp;#8221; In a modern kernel with a VFS, this reference includes an object that maps standardized operations such as &lt;i&gt;read file&lt;/i&gt; or &lt;i&gt;write file&lt;/i&gt; to the filesystem&amp;#8217;s implementation.&lt;/li&gt;&lt;li&gt;The reference to the inode is stored in a per-process table, which is called something like the &lt;i&gt;open file table&lt;/i&gt; or &lt;i&gt;file array&lt;/i&gt; depending on the operating system. The process manages the file via a handle or index into this table; on Unix-like systems, the index into the table is a simple integer known as a &lt;i&gt;file descriptor&lt;/i&gt;.&lt;/li&gt;&lt;li&gt;The kernel returns the handle to the process.&lt;/li&gt;&lt;li&gt;The process then uses this handle to subsequently access the file. &lt;/li&gt;&lt;/ol&gt;That&amp;#8217;s the gist of it, anyhow.&lt;/blockquote&gt;&lt;span class="qlink_container"&gt;&lt;a href="http://www.quora.com/Operating-Systems/What-does-an-operating-system-do-in-the-background-when-you-try-to-open-a-file/answer/Robert-Love-1"&gt;View Answer on Quora&lt;/a&gt;&lt;/span&gt;&lt;/p&gt;</description><link>http://nellaikanth.tumblr.com/post/42596652581</link><guid>http://nellaikanth.tumblr.com/post/42596652581</guid><pubDate>Fri, 08 Feb 2013 14:04:56 -0500</pubDate></item><item><title>Is it worth it to take a level or two lower job at Facebook at a decent pay cut to be there while the company dominates/connects the web?</title><description>&lt;p&gt;Answer by Chuck Goolsbee:&lt;/p&gt;&lt;blockquote&gt;I was a VP of Tech Ops at a small, regional colo provider. 
It was a ton of fun to build a company from near-nothing to a well-respected and mildly well-known player in the industry. In 2006, and again in 2008, Google tried to recruit me away to their Datacenters - for a lot of reasons irrelevant to your question I stayed where I was - BUT I did have to ask myself the very same question.&lt;br/&gt;&lt;br/&gt;In 2010 Facebook called me - and even *they* asked me that question. It came up in almost every interview: &amp;#8220;are you going to be OK going from VP to a Datacenter Lead?&amp;#8221;&lt;br/&gt;&lt;br/&gt;Honestly, it is kind of a silly question, because I answered it with: &amp;#8220;Today I&amp;#8217;m skating first line on the Chilliwack Bruins, and the Vancouver Canucks just invited me to try out to be on their third line.&amp;#8221;&lt;br/&gt;&lt;br/&gt;It was a simple choice really. Go to the tryout. If they make an offer, and you like the people you&amp;#8217;ll be playing with&amp;#8230; pack your bag!&lt;br/&gt;&lt;br/&gt;Bottom Line: TITLE is irrelevant. Impact is everything.&lt;br/&gt;&lt;br/&gt;At my old job I was the top dog at a small player with impact for a few hundred customers. Today, my work has impact on one-sixth of humanity. I work at what is the absolute leading edge of technology in my sector - a model of what the rest of the datacenter industry will be doing 15 years from now. Best of all, it is open, and we&amp;#8217;ve shared the specs from the servers to the building walls. &lt;br/&gt;&lt;br/&gt;So ask yourself: Titles? Levels? Those are just hooks to hang ego upon. Pay? That is up to you to negotiate. 
The core issue is: what do you want to get done?&lt;/blockquote&gt;&lt;span class="qlink_container"&gt;&lt;a href="http://www.quora.com/Is-it-worth-it-to-take-a-level-or-two-lower-job-at-Facebook-at-a-decent-pay-cut-to-be-there-while-the-company-dominates-connects-the-web/answer/Chuck-Goolsbee"&gt;View Answer on Quora&lt;/a&gt;&lt;/span&gt;</description><link>http://nellaikanth.tumblr.com/post/42596062136</link><guid>http://nellaikanth.tumblr.com/post/42596062136</guid><pubDate>Fri, 08 Feb 2013 13:55:43 -0500</pubDate></item><item><title>What are good starting points (books, tools) for learning text mining?</title><description>&lt;p&gt;Answer by Vineet Yadav:&lt;/p&gt;&lt;blockquote&gt;&lt;ul&gt;&lt;li&gt;Introduction to Information Retrieval (&lt;a href="http://nlp.stanford.edu/IR-book/"&gt;http://nlp.stanford.edu/IR-book/&lt;/a&gt;) &lt;/li&gt;&lt;/ul&gt;&lt;br/&gt;&lt;ul&gt;&lt;li&gt;Mining the Web (&lt;a href="http://www.amazon.com/Mining-Web-Discovering-Knowledge-Hypertext/dp/1558607544"&gt;http://www.amazon.com/Mining-Web-Discovering-Knowledge-Hypertext/dp/1558607544&lt;/a&gt;) &lt;/li&gt;&lt;/ul&gt;&lt;br/&gt;&lt;ul&gt;&lt;li&gt;Programming Collective Intelligence (http://shop.oreilly.com/product/9780596529321.do) &lt;/li&gt;&lt;li&gt;Statistical natural language processing (&lt;a href="http://nlp.stanford.edu/fsnlp/"&gt;http://nlp.stanford.edu/fsnlp/&lt;/a&gt;)&lt;/li&gt;&lt;/ul&gt; Tools and resources &lt;br/&gt;&lt;ul&gt;&lt;li&gt;NLTK: NLTK book (&lt;a href="http://nltk.googlecode.com/svn/trunk/doc/book/ch02.html"&gt;http://nltk.googlecode.com/svn/trunk/doc/book/ch02.html&lt;/a&gt;) and NLTK cookbook (&lt;a href="http://www.packtpub.com/python-text-processing-nltk-20-cookbook/book"&gt;http://www.packtpub.com/python-text-processing-nltk-20-cookbook/book&lt;/a&gt;) &lt;/li&gt;&lt;/ul&gt;&lt;br/&gt;&lt;ul&gt;&lt;li&gt;Mahout: Mahout in Action (&lt;a href="http://manning.com/owen/"&gt;http://manning.com/owen/&lt;/a&gt;) &lt;/li&gt;&lt;li&gt;GATE: Text Processing with GATE (&lt;a href="http://www.amazon.co.uk/gp/product/0956599311/"&gt;http://www.amazon.co.uk/gp/product/0956599311/&lt;/a&gt;) and GATE documentation (http://gate.ac.uk/documentation.html)&lt;br/&gt;&lt;/li&gt;&lt;li&gt;Support vector machine (&lt;a href="http://www.cs.cornell.edu/People/tj/svmtcatbook/"&gt;http://www.cs.cornell.edu/People/tj/svmtcatbook/&lt;/a&gt;)&lt;/li&gt;&lt;/ul&gt;&lt;/blockquote&gt;&lt;span class="qlink_container"&gt;&lt;a href="http://www.quora.com/Text-Analytics/What-are-good-starting-points-books-tools-for-learning-text-mining/answer/Vineet-Yadav"&gt;View Answer on Quora&lt;/a&gt;&lt;/span&gt;</description><link>http://nellaikanth.tumblr.com/post/42514495770</link><guid>http://nellaikanth.tumblr.com/post/42514495770</guid><pubDate>Thu, 07 Feb 2013 13:16:32 -0500</pubDate></item></channel></rss>
