The Beautiful Mind

Apr 20

What are the best Python scripts you’ve ever written?

Answer by Abhijit Agarwal:

A script to get rank and fees information about CS master’s courses in Europe


I was looking into the idea of pursuing my master's in Computer Science in Europe. I found a website where you could view master's courses in different subjects and countries all over Europe.

But the fees are different for students within the EU and students outside it, and the latter were not always listed. Moreover, there was no way to see how well regarded a college is, because there was no ranking on the page.

So I wrote a script to scrape the results on the website, find each course's fees for non-EU students along with the QS and ARWU rankings of the university, and write all this data into a CSV file.

This script also made use of the Wolfram Alpha API and Google Search. I used the BeautifulSoup library for scraping and the lxml library for parsing the XML output from Wolfram Alpha.

Here is a sample output for the keyword “Graphic Design”

It is definitely not my best script, but it solved a problem that interested me a lot at the time.
You can look at the code at abhijit148/masterSearch
Please note that even though the code worked the last time I checked, it is nowhere near what we would call "good quality" code, and I request that you not judge me by how messy it is. :)
You are welcome to contribute to the development of this code, and if you build on it independently, maybe send me a link when you are done?
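If you are curious about the general shape of such a script, here is a minimal, hypothetical sketch of the scrape-and-write-CSV flow. The URL, CSS selectors, and field names are invented for illustration; this is not the code from abhijit148/masterSearch.

```python
import csv
import requests
from bs4 import BeautifulSoup

# Hypothetical listing page and markup; the real site scraped by the
# author's script will differ. This only shows the overall shape.
LISTING_URL = "https://example.org/masters?keyword=Graphic+Design"

html = requests.get(LISTING_URL).text
soup = BeautifulSoup(html, "html.parser")

rows = []
for course in soup.select(".course-result"):          # hypothetical selector
    name = course.select_one(".course-name").get_text(strip=True)
    fee = course.select_one(".non-eu-fee").get_text(strip=True)
    # Rankings would come from separate lookups (e.g. QS / ARWU pages or an API).
    rows.append([name, fee])

with open("courses.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["course", "non_eu_fee"])
    writer.writerows(rows)
```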
View Answer on Quora

What are the best day-to-day time-saving hacks?

Answer by Charudutt Wasnikar:

1. Prioritize
2. Delete unwanted mails / folders on desk
3. Have a Beginning of Day List
4. Have an End of Day List: without a BOD and EOD list, your agenda won't be set
5. Set aside 20% of your time for ad hoc tasks. If there are no ad hoc tasks, you can spend that time talking with colleagues, teams, and bosses
6. Mark mails as important or not important. Keep a timeline for when you want to complete tasks and reply to mails
7. Use Diary / Notebook
View Answer on Quora

Jan 27

Sum it Up

techinterview:

Problem: you are given a sequence of numbers from 1 to n-1 with one of the numbers repeating only once. (example: 1 2 3 3 4 5). how can you find the repeating number? what if i give you the constraint that you can’t use a dynamic amount of memory (i.e. the amount of memory you use can’t be related to n)?
what if there are two repeating numbers (and the same memory constraint?)
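The title hints at the classic O(1)-memory approach: compare the actual sum against the sum of 1 to n-1, and for two repeats use the sum and the sum of squares to get two equations in two unknowns. A minimal sketch of the single-repeat case (my illustration, not the hidden answer):

```python
def find_single_repeat(seq):
    # seq holds 1..n-1 with exactly one value repeated, so len(seq) == n.
    n = len(seq)
    expected = (n - 1) * n // 2   # sum of 1..n-1
    return sum(seq) - expected    # the surplus is the repeated number

print(find_single_repeat([1, 2, 3, 3, 4, 5]))  # -> 3
```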

Read More

Dec 06

Road to Data Science

Post by Adriano Stephan:

Sort of like the Paris metro map for Machine Learning :-) see the full-blown version here: https://qph.is.quoracdn.net/main-qimg-3467d861335eb3a58067278fd83ca2ce
View Post on Quora

Scalable machine learning :: video lectures by Alex Smola

Post by Adriano Stephan:

Scalable machine learning :: video lectures by Alex Smola
View Post on Quora

How can I use Nate Silver’s methods to accurately predict future events?

Answer by Jay Wacker:

1.) gather an immense amount of data
2.) perform relatively simple statistical analyses
3.) add expert knowledge to fix naive use of statistics

The only thing that Nate Silver (and a dozen or so different groups) showed is that the polls were accurate and treating them in a straightforward honest manner gives a more accurate answer than any single poll.

To be fair, Nate Silver corrected for numerous subtleties of the polls having to do with systematic bias, whether intentional or not. They deweighted polls that were historically off or that systematically disagreed with other polls. However, all of these expert-knowledge corrections were independently developed by several other groups, showing that standard practices gave the right result.

The key thing about taking this method forward is that without the polls (he had hundreds of polls, each with thousands of respondents, properly sampled and corrected for demographics), Nate Silver had exactly bupkis, which he would say as well. The polls were his data. Getting the data is expensive and hard. Analyzing it correctly, if not easy, is something that hundreds of thousands of scientists are trained in doing.

So if you have extensive polling, I'd advise averaging the polls together, weighting them inversely proportional to their error. If you have historical information, identify ways in which the polls systematically skew and correct for that. However, these events are few and far between.
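A minimal sketch of that combining step, using inverse-variance weights as one concrete reading of "weighting them inversely proportional to their error"; the poll numbers and margins of error below are invented:

```python
# Hypothetical polls: (estimate of vote share in %, margin of error in points).
polls = [(52.0, 3.0), (49.5, 2.0), (51.0, 4.0)]

# Inverse-variance weighting: each poll gets weight 1 / sigma^2,
# and the combined estimate is the weighted average.
weights = [1.0 / (moe ** 2) for _, moe in polls]
combined = sum(w * x for (x, _), w in zip(polls, weights)) / sum(weights)

print(round(combined, 2))  # pooled estimate across the made-up polls
```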
View Answer on Quora

What is a simplified explanation and proof of the Johnson-Lindenstrauss lemma?

Answer by Alex Clemmer:

Why are random projections probably nearly as effective as carefully designed ones?

This question you have asked in the “details” section is the key question of the JL lemma. And the unfortunate truth is that they aren’t. Or rather, they are, but only when your data are all over the place.

To see why, consider this toy example I drew. (Note that this example was picked because we can visualize 3 dimensions, and not because the math of the JL lemma is useful here! It is for intuition only.)

This brings us to our first point: in order to understand a “simplified” version of the JL lemma, we must understand:

What is the JL lemma really saying about projections?

Intuitively, the JL lemma says this: if you pick a random subspace and project onto it, the scaled pairwise distances between points will probably be preserved.

This will be true regardless of the pointset you have. But you’ll notice that in the example on the right, some planes seem to be “better” than others. In the pointset on the left, you could probably project onto any plane, and it would almost certainly be equally bad. But the data on the right seem to lie close to a plane, so intuitively, the planes “close” to the data seem to be “less bad.”

So on the one hand, the JL lemma is telling us that pairwise distances are probably not distorted. But on the other hand, geometry is telling us that some projections are “better” than others. And this mismatch tells us something interesting about random projections:

So in some sense, the JL lemma seems to work because pairwise distances are not quite as important for dimensionality reduction as we might have hoped. Still, it is interesting that they would be this resilient to random projection, and it seems worth wondering why.

A more formal definition of the JL lemma


First I’ll state the JL lemma, and then demonstrate the intuition using a caricature of a proof, which should give you a good idea why it is true.

Proposition 1 (the JL lemma): For some k \ge O(\log m / \varepsilon^2) (where \varepsilon is our chosen error tolerance and m is the number of points), with high probability a random map f : \mathbb{R}^d \rightarrow \mathbb{R}^k does not change the pairwise distance between any two points by more than a factor of (1 \pm \varepsilon), after scaling by \sqrt{d/k}.

There are a couple of things that are good to notice about the JL lemma that might help you to know when it’s applicable to your problems.

A Caricature of a Proof

[Note: I believe this “caricature” is due to Avrim Blum, but I can’t find it, and so can’t be sure.]

The gist of the proof is this: the JL lemma follows from the fact that the squared length of a vector is sharply concentrated around its mean when projected onto a random k-dimensional subspace. What we aim to show in this section is why this would even be true or useful for proving the JL lemma.

We will begin with something that doesn't really resemble the JL lemma, but by the end it will hopefully become clear what the relationship is.

First, let’s say we randomly sample m points that lie on the surface of a d-dimensional sphere. These points can also be seen as random unit-length vectors.

We would like to look at how individual coordinates behave. So we will simplify the picture somewhat.

Since they lie on a d-dimensional sphere, they are points in \mathbb{R}^d. But notice that, regardless of the size of d, we can say that these points lie in an m-dimensional subspace, because there are only m points. So we'll say that they lie in \mathbb{R}^m.

Let’s start by looking at the first coordinate. By standard probability theory, we know that \mathbb{E}[x^2_1] = 1/m. Note that as m gets bigger, the concentration rises precipitously, meaning that the actual values of the coordinates will be sharply concentrated around this value.

Now we’d like to extend this intuition to look at all of the coordinates. Since the value of these coordinates will be sharply concentrated around 1/m for large m, we can see that the coordinates kind of sort of look iid. They’re not really, because if one coordinate is large, the others are necessarily small, but this “sharpness” means that they are “almost” iid. This is not a real proof, right? So let’s say they’re iid for illustrative purposes.

If they’re iid, then we can apply our favorite Chernoff/Hoeffding bound to say that, with really high probability, all the coordinates will be really really close to 1/m in size. For example, \Pr[|(x_1^2 + \ldots + x_k^2) - k/m| \ge \varepsilon k/m] \le e^{-O(k \varepsilon^2)}. Remember, this is if they’re iid, which of course they aren’t, but they kind of look like they are. This is intuition.
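A quick numerical check of this "almost iid" picture (my own illustration, not part of the original answer): sample random unit vectors in \mathbb{R}^m and see how tightly the sum of the first k squared coordinates concentrates around k/m.

```python
import numpy as np

rng = np.random.default_rng(0)
m, k, trials = 1000, 100, 2000   # m and k as in the text's notation

# Random unit vectors in R^m: normalize Gaussian samples.
v = rng.standard_normal((trials, m))
v /= np.linalg.norm(v, axis=1, keepdims=True)

# Sum of the first k squared coordinates, which should concentrate near k/m.
first_k = np.sum(v[:, :k] ** 2, axis=1)
print(k / m, first_k.mean(), first_k.std())  # mean ~ 0.1, small spread
```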

At this point we’re ready to look at random projections. The basic idea is, we’re going to take a random plane and project our unit vectors onto it. Here are a few “random” examples that I made up (note that the plane of projection is translated to the origin).
But it turns out that we can look at vector projections in a more interesting way. Projecting from \mathbb{R}^m to \mathbb{R}^k is basically the same thing as randomly rotating the vector, and then reading off the first k coordinates.

Now we get to the JL lemma: in order for the JL lemma to be basically true, the distance between the two vectors must be almost the same after we scale it accordingly! In the picture above, the projected vectors each have a dotted line going between them, depicting the distance. So, in other words, that vector needs to be almost the same scaled length as it would be in the original example.

Unsurprisingly, it turns out that this is true. Here’s the math that justifies it.

Using the above Chernoff-Hoeffding bound, we see that for k = O(\frac{1}{\varepsilon^2} \log m), with probability 1 - O(m^{-p}) (for almost whatever positive integer choice of p you like), the projection onto the first k coordinates has length \sqrt{k/m} \cdot (1 \pm \varepsilon).

Now, let’s look at the distance between some pair of vectors. Let’s say our regular non-projected vectors are called \vec{v}_1 and \vec{v}_2. Then the vector that represents the dotted line between the original vectors would be \vec{d} = \vec{v}_2 - \vec{v}_1.

And here’s the punch line. We can take that “distance vector” that goes between our original vectors and use the same projection argument as above. Thus, with high probability (by the union bound), all of these “distance vectors” \vec{d} project to length \sqrt{\frac{k}{m}} \cdot (1 \pm \varepsilon) \, ||\vec{d}||_2.
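To see the punch line numerically, here is a small sketch (mine, not the answer's). It uses a scaled Gaussian random projection, which folds the rescaling into the matrix rather than rotating and reading off the first k coordinates as in the text, and checks that pairwise distances barely move.

```python
import numpy as np

rng = np.random.default_rng(1)
n_points, d, k = 50, 2000, 400

X = rng.standard_normal((n_points, d))        # original points in R^d

# Gaussian random projection, scaled so squared lengths are preserved in expectation.
R = rng.standard_normal((d, k)) / np.sqrt(k)
Y = X @ R                                     # projected points in R^k

def pairwise_distances(A):
    # All pairwise Euclidean distances (upper triangle only).
    diff = A[:, None, :] - A[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    return D[np.triu_indices(len(A), k=1)]

ratios = pairwise_distances(Y) / pairwise_distances(X)
print(ratios.min(), ratios.max())  # both should be close to 1
```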

Toward more general arguments.

So that’s the intuition for why the JL lemma is true. As we said before, the JL lemma follows from the fact that the squared length of a vector is sharply concentrated around its mean when projected onto a random k-dimensional subspace. This last section, I hope, roughly explains both why this fact helps us prove the JL lemma and why it is true on a more general basis.

If you want to know more, I hope this arms you nicely for figuring out the proofs floating around out there. I would say that Alexandre Passos's answer is a reasonable start. An excellent treatment exists in Foundations of Machine Learning, by Mohri et al. It’s an excellent book anyway, and you should read it just because.
View Answer on Quora

What are some good Undergraduate projects in Quantum Computing and algorithms?

Answer by Igor Markov:

Figure out how Shor’s algorithm works and implement a simulator for it in Java or C++. Then try to reduce the size of the circuit by various modifications and see how this affects the result of the algorithm (there are some modifications that would only slightly degrade the performance).

Figure out how the stabilizer formalism (Heisenberg representation) works and implement a simulator for stabilizer circuits, following the work of Daniel Gottesman or his 2004 paper with Scott Aaronson.  Find recent papers by Richard Jozsa on arxiv that propose a different (simpler) simulation technique, implement it, and compare empirically to the original algorithm in terms of runtime and memory usage.

Given a quantum circuit (either a circuit design or a physical device - different cases), how do you check that it does what it should do? If it misses one gate, can you find which gate is missing? There isn’t much literature on this, but you can find a few reasonable papers.
View Answer on Quora

What is the most efficient way for a programmer to get good at algorithms without participating in programming competitions?

Answer by Igor Markov:

First, you need to be clear on the syntax of the language you are using. Even when you plan to focus on CS theory, it seems impractical to focus on algorithms without a realistic language to express them (using pseudocode only). When teaching algorithms with C++, I see a lot of students who don’t understand pointers, arrays, references, templates, etc. in C++, and this limits what they can do in practice (i.e., in projects).

Second, you need to understand standard algorithms.
The Cormen-Leiserson-Rivest-Stein book (3rd+ ed) is good for this (but does not explain how to implement anything in a particular language). Knuth’s TAOCP is the nuclear option for studying algorithms - even if you finish, it will take many years (also note that Knuth is still working on new volumes).

Third, you need to understand how to implement and use standard algorithms with a given language, e.g., using the STL in C++. The book by Josuttis on the C++ standard library (2nd ed) is excellent. You may also get a book on algorithms that shows C++ implementations, such as the one by Weiss (3rd ed). There are other sources that use Java or C#. However, experience shows that students who went through a C++ based course can migrate to Java and C#, but not as easily the other way around.

Fourth, practice algorithmic problem solving. This is harder, but the CLRS book has a number of great exercises. Solve them, implement them, and test your solutions. There are also books with algorithmic/programming problems, including interview questions; they should be easy to find on Amazon. But becoming good at algorithmic problem-solving will take time.

Fifth, algorithm analysis. The CLRS book is also good for this.

Sixth, large-scale algorithm design and development. This comes with experience, including a lot of reading (research papers), trying out different ideas (research projects) and/or graduate studies. Probably beyond the scope of your question.

Update: in teaching algorithms and data structures (e.g., in the current semester at the University of Michigan), an extremely efficient tool is an autograder. Students submit their project online and get the results of testing on hidden testcases within 10 minutes. This provides a feedback loop that is just not possible otherwise. Sometimes students cannot get the right output, sometimes their algorithms/programs are too slow. However, when they know just how good/poor their solution is, they can focus on improving their solution. If you are not taking a course at a university, you can get a similar environment in online programming competitions in training mode. Google Code Jam would be one of those.
View Answer on Quora

What data structures/matching algorithms does vimdiff use?

Answer by Anirudh Joshi:

Diffs are an instance of the longest common subsequence problem (http://en.wikipedia.org/wiki/Longest_common_subsequence_problem). In its general form (over an arbitrary number of sequences) this is an NP-hard problem that tries to find common subsequences in arbitrary data - e.g. many source files or DNA shotgun sequencing.

However, diff only operates on 2 sequences, reducing the complexity of the search to O(m*n), where m and n are the lengths of the two files (thanks to Peter Scott).
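For reference, a minimal dynamic-programming sketch of the two-sequence LCS computation that the O(m*n) bound refers to (an illustration only, not the algorithm diff or vimdiff actually run):

```python
def lcs_length(a, b):
    # Classic O(m*n) time dynamic program over the two sequences.
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

# Lines that are *not* part of the LCS are the lines a diff reports.
print(lcs_length("ABCBDAB", "BDCABA"))  # -> 4
```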

I’m pretty sure vimdiff uses the standard diff tool (http://en.wikipedia.org/wiki/Diff#Algorithm) to attack this problem:

This only works when a standard “diff” command is available.  See ‘diffexpr’.

Source: http://vimdoc.sourceforge.net/htmldoc/diff.html

On Unix-based systems, Vim should work without problem because there should be a “standard” diff program available

Source: http://vim.wikia.com/wiki/Running_diff

Diff mainly uses the algorithm described in the paper, An O(ND) Difference Algorithm and its Variations (http://www.xmailserver.org/diff2.pdf), found from http://stackoverflow.com/questions/805626/diff-algorithm.
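As a rough illustration of that paper's greedy core (a sketch, not GNU diff's or Vim's actual implementation), the following computes D, the length of the shortest edit script between two sequences:

```python
def shortest_edit_script_length(a, b):
    # Greedy forward pass of Myers' O(ND) algorithm: find the smallest D
    # such that a can be turned into b with D insertions and deletions.
    n, m = len(a), len(b)
    max_d = n + m
    offset = max_d
    v = [0] * (2 * max_d + 2)          # furthest x reached on each diagonal k
    for d in range(max_d + 1):
        for k in range(-d, d + 1, 2):
            if k == -d or (k != d and v[offset + k - 1] < v[offset + k + 1]):
                x = v[offset + k + 1]          # step down: insertion from b
            else:
                x = v[offset + k - 1] + 1      # step right: deletion from a
            y = x - k
            while x < n and y < m and a[x] == b[y]:
                x, y = x + 1, y + 1            # follow a diagonal "snake" of matches
            v[offset + k] = x
            if x >= n and y >= m:
                return d
    return max_d

print(shortest_edit_script_length("ABCABBA", "CBABAC"))  # -> 5 (the paper's example)
```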

Other references are included in the source code found here http://ftp.gnu.org/gnu/diffutils/ :

The basic algorithm is described in “An O(ND) Difference Algorithm and its Variations”, Eugene W. Myers, ‘Algorithmica’ Vol. 1 No. 2, 1986, pp. 251-266; and in “A File Comparison Program”, Webb Miller and Eugene W. Myers, ‘Software—Practice and Experience’ Vol. 15 No. 11, 1985, pp. 1025-1040. The algorithm was independently discovered as described in “Algorithms for Approximate String Matching”, E. Ukkonen, `Information and Control’ Vol. 64, 1985, pp. 100-118.

Code for various languages can be found at http://code.google.com/p/google-diff-match-patch/

More discussion here: http://c2.com/cgi/wiki?DiffAlgorithm
View Answer on Quora