Tag Archives: distributed

Brief summary of wassup in the tech world…

Every day, after I settle down on my medicine ball with my happy sausage & egg white on whole wheat toast breakfast, I start reading tech news. It’s important, very. Hell, it’s vital I’d say. By tech news I mean everything from gadget sites like Gizmodo & Engadget to Ars Technica, TechCrunch, Mashable and Hacker News. HN is still the best one so far, because the front-page stories are generally voted up by rather tech-savvy members.

A lot of articles on HN are very specialized and specific to certain topics. They range from assessments of techniques and patterns, to new frameworks and libraries, and even micro-optimizations with bitwise operations. I’ve learned a ton from those. Basically, this blog post highlights some of the things I thought were cool, broken down into (hopefully) non-technical, human-readable pieces 🙂

1. Egor Homakov hacked GitHub



So what is GitHub? For non-geeks, GitHub is a social coding site. Instead of sharing life dramas, you share code and cool projects you’ve been working on. It’s incredibly popular, and if you’re a developer without a GitHub/BitBucket account, it’s like being a designer without a portfolio. Having cool projects on GitHub (where “cool” is measured by the number of forks, i.e. how many people build things on top of your code, and the number of watchers, i.e. how many people follow your project) can easily get you a job at a tech company, because it demonstrates your ability to produce good, maintainable code, which is the unit of work developers produce.

So GitHub is like Facebook for coders, and it’s built on top of a framework called Ruby on Rails (or just Rails; Ruby is the language). When designing a framework, it’s always tricky to decide how much customizability to offer. There’s no one-size-fits-all, really. And from what I’ve read, there’s been a debate about how strictly Rails should enforce whitelisting, and about security in general. How much security does a developer want by default? It’s always been a tough question.

What Egor exploited was a known vulnerability in Rails (mass assignment). He reported it in an issue that got dismissed, which led to his demo: a commit pushed to the master branch of the Rails project itself. GitHub suspended his account while they patched the hole, then reinstated it later, which was fair enough. You can read more about this right here: Hacker commandeers GitHub to prove Rails vulnerability
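For the non-Rails folks, here’s roughly what mass assignment means: a controller copies every field from the incoming request straight onto a model, so an attacker can set fields the form never showed (in Egor’s case, that was enough to get push access to the Rails repository). This is a minimal JavaScript sketch of the idea, not Rails or GitHub code; the user record, the `isAdmin` field and the helper names are all made up for illustration.

```javascript
// Hypothetical user record; `isAdmin` is a field no form should let users set.
function createUser() {
  return { name: "alice", email: "alice@example.com", isAdmin: false };
}

// Vulnerable: copies every key from the request body onto the record.
function updateUnsafe(user, body) {
  return Object.assign(user, body);
}

// Safer: only keys on an explicit whitelist are copied.
const ALLOWED_FIELDS = ["name", "email"];

function updateWhitelisted(user, body) {
  for (const field of ALLOWED_FIELDS) {
    if (Object.prototype.hasOwnProperty.call(body, field)) {
      user[field] = body[field];
    }
  }
  return user;
}

// An attacker sends an extra field the form never showed.
const maliciousBody = { name: "mallory", isAdmin: true };

const a = updateUnsafe(createUser(), maliciousBody);
const b = updateWhitelisted(createUser(), maliciousBody);
console.log(a.isAdmin); // true: the attacker promoted themselves
console.log(b.isAdmin); // false: the extra field was ignored
```

That’s the whole whitelisting debate in a nutshell: the whitelist is one more thing developers have to maintain, but without it every field on the model is writable by whoever sends the request. Rails later moved to whitelisting by default with strong parameters in Rails 4.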

2. Sabu, leader of LulzSec, got arrested

So Sabu is the leader of LulzSec, which got merged into Anonymous earlier. The group has been conducting a series of DDoS attacks against banks and government websites, to protest or sometimes to take revenge on behalf of certain figures, such as the soldier who was arrested for leaking documents to WikiLeaks.


First of all, what is DDoS? DDoS stands for Distributed Denial of Service. The key word is distributed. A DoS attack means someone floods your website/server with tons of requests (by tons I mean it can go up to billions and more). Because those requests eat up the capacity your server needs to serve other users, it eventually gets overwhelmed and goes down. If you bounce/restart it, the same thing happens.

An attack coming from 1 machine is easy to block because it comes from a single IP address: you simply blacklist that IP. A distributed attack is much harder to stop, since it comes from many IP addresses, typically machines running malicious software without their owners knowing. Such machines are called zombies.
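To see why the distributed part matters, here’s a minimal sketch of that naive per-IP blacklist in JavaScript. The threshold, the `IpBlacklist` and `simulate` names and the zombie labels are made up for illustration, and real defenses use time windows and a lot more; this only shows that counting requests per IP stops one noisy machine but not a thousand quiet ones.

```javascript
// Hypothetical per-IP limiter: an IP that sends more than `limit`
// requests gets blacklisted. Time windows are left out to keep it short.
class IpBlacklist {
  constructor(limit) {
    this.limit = limit;
    this.counts = new Map(); // ip -> number of requests seen
    this.blocked = new Set(); // blacklisted ips
  }

  // Returns true if the request should be served.
  allow(ip) {
    if (this.blocked.has(ip)) return false;
    const n = (this.counts.get(ip) || 0) + 1;
    this.counts.set(ip, n);
    if (n > this.limit) {
      this.blocked.add(ip);
      return false;
    }
    return true;
  }
}

// Send `total` requests spread evenly over `ips`; count how many get served.
function simulate(ips, total, limit) {
  const guard = new IpBlacklist(limit);
  let served = 0;
  for (let i = 0; i < total; i++) {
    if (guard.allow(ips[i % ips.length])) served++;
  }
  return served;
}

// Plain DoS: 10,000 requests from one machine.
const dos = simulate(["203.0.113.7"], 10000, 100);
// DDoS: the same 10,000 requests from 1,000 zombies, 10 each.
const zombies = Array.from({ length: 1000 }, (_, i) => `zombie-${i}`);
const ddos = simulate(zombies, 10000, 100);

console.log(dos); // 100: the single attacker is cut off after the limit
console.log(ddos); // 10000: no zombie ever crosses the limit
```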

Now an organization can function the very same way. A typical company today has a board of directors, a CEO, a CTO and other C-level executives who make the decisions for the company. This can lead to what is called a “single point of failure”, which means that if the top tier is gone, the company collapses. LulzSec was like that, with one leader, Sabu. Sabu got arrested and LulzSec is gone.

Anonymous, however, isn’t. It is a “distributed” organization, meaning there’s no single point of failure. Each subgroup, or even each individual, acts on its own to serve the organization’s philosophy, which anyone can interpret however they like. Therefore, with LulzSec gone, Anonymous isn’t necessarily weaker, since LulzSec functioned on its own anyway: it followed the philosophy but was in charge of its own operations.

In the computer world, the same idea lets infrastructure scale horizontally by replicating and synchronizing redundant data sources. Humans, however, clearly can’t (yet) replicate themselves to the point of backing themselves up to the cloud. Anyway, you guys can read the article right here: Sabu betrayed Anonymous
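For the curious, here’s what “replicating and synchronizing” a data source can look like in its simplest form: a master records every write in an ordered log, and a slave replays that log to build its own copy. This is a toy model in JavaScript with made-up class names, loosely in the spirit of log-based replication such as PostgreSQL’s write-ahead-log streaming, not how any real database implements it.

```javascript
// Toy master-slave replication via an ordered write log.
// Class and method names are hypothetical, for illustration only.
class Master {
  constructor() {
    this.data = new Map();
    this.log = []; // every write, in the order it happened
  }

  write(key, value) {
    this.log.push({ key, value });
    this.data.set(key, value);
  }
}

class Slave {
  constructor(master) {
    this.master = master;
    this.data = new Map();
    this.applied = 0; // how many log entries have been replayed so far
  }

  // Replay any log entries not yet applied, in order.
  sync() {
    const log = this.master.log;
    while (this.applied < log.length) {
      const { key, value } = log[this.applied++];
      this.data.set(key, value);
    }
  }

  // Slaves only serve reads; writes must go to the master.
  read(key) {
    return this.data.get(key);
  }
}

const master = new Master();
const slave = new Slave(master);
master.write("post:1", "hello");
console.log(slave.read("post:1")); // undefined: not replicated yet
slave.sync();
console.log(slave.read("post:1")); // hello
```

The gap between the write and the `sync()` call is replication lag: until the slave catches up, it serves slightly stale data. Once it has synced, though, it holds a full copy and can keep serving reads even if the master goes away, which is the “no single point of failure” idea in miniature.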

Just some random thoughts… 🙂


What stuff I’ve been working on lately

So it’s been a while since my last blog post, partially because I was swamped with work and some personal business, but mainly because my friend Haruki and I are still trying to design and set up the 1st phase of the MetaDB project. It is currently being actively developed right here on GitHub. We’ve put quite some thought into the data model of the project, which led to several architecture and technology choices.

We’re still using NodeJS as the meat of the whole project. It has been our choice from the beginning for various (unverified) reasons: speed, both in performance and in development, scalability, and lightweightness. I say unverified because we’ve read a lot about it, but technology choices are often a YMMV kinda thing, and for our specific use cases NodeJS seems to be a good fit.



We dropped the idea of using NoSQL in favor of PostgreSQL as our database technology. NoSQL is great for unstructured data, but there seem to be way too many relationships among our model objects for maintaining a NoSQL model and doing map-reduce to be worth it. With NoSQL, if we store some of the components separately, then combined objects have to be built by a map-reduce instead of a join, which is definitely not as efficient. If we instead embed those components inside a particular object collection, reusing them becomes a big mess. In the end, partially due to our lack of solid NoSQL modeling skills, we took the easy way out, which is SQL.
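To make the join point a bit more concrete: with two separate NoSQL-style collections, combining them usually ends up in application code (or a map-reduce job), while PostgreSQL does the same thing in one JOIN. This sketch uses hypothetical `artists` and `albums` collections, not MetaDB’s actual schema, with the rough SQL equivalent in a comment.

```javascript
// Hypothetical data stored as two separate NoSQL-style collections.
const artists = [
  { id: 1, name: "Artist A" },
  { id: 2, name: "Artist B" },
];
const albums = [
  { id: 10, artistId: 1, title: "First" },
  { id: 11, artistId: 1, title: "Second" },
  { id: 12, artistId: 2, title: "Debut" },
];

// Application-side inner join: index one side by key, look up from the other.
// In SQL this whole function is roughly:
//   SELECT al.title, ar.name
//   FROM albums al JOIN artists ar ON ar.id = al.artist_id;
function joinAlbumsWithArtists(albums, artists) {
  const byId = new Map(artists.map((artist) => [artist.id, artist]));
  const rows = [];
  for (const album of albums) {
    const artist = byId.get(album.artistId);
    // Inner-join semantics: albums without a matching artist are dropped.
    if (artist) rows.push({ title: album.title, artist: artist.name });
  }
  return rows;
}

console.log(joinAlbumsWithArtists(albums, artists));
```

Every combination the model needs means another hand-written join like this one, which is the maintenance cost that pushed us toward SQL; the database expresses it declaratively and can use indexes to do the work.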

The framework that drives our API is a custom, in-house module called njrpc (Node-JsonRPC). It’s an implementation of the JSON-RPC 2.0 protocol with some additional bells & whistles, such as namespacing and request interceptors. It also exposes enough low-level calls (at least for our needs) that you can manually override the response in callbacks and such.
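For anyone who hasn’t looked at JSON-RPC 2.0: a request is a JSON object with `jsonrpc: "2.0"`, a `method`, optional `params` and an `id`, and the response carries either a `result` or an `error` with a numeric code. Here’s a minimal sketch of a dispatcher with namespacing and interceptors. To be clear, this is NOT njrpc’s API; `RpcServer`, `register`, `intercept` and `handle` are hypothetical names chosen just to show the idea.

```javascript
// Minimal JSON-RPC 2.0 dispatcher sketch with namespacing and interceptors.
// NOT njrpc's API: RpcServer, register, intercept and handle are hypothetical.
// Notifications (requests without an id) and batch requests are not handled.
class RpcServer {
  constructor() {
    this.methods = new Map(); // "namespace.method" -> handler function
    this.interceptors = []; // functions run before every handler
  }

  // Expose each function in `handlers` as "namespace.name".
  register(namespace, handlers) {
    for (const [name, fn] of Object.entries(handlers)) {
      this.methods.set(`${namespace}.${name}`, fn);
    }
  }

  // An interceptor can inspect a request and throw to reject it.
  intercept(fn) {
    this.interceptors.push(fn);
  }

  // Take a parsed request object and return a response object.
  handle(request) {
    const id = request.id === undefined ? null : request.id;
    if (request.jsonrpc !== "2.0" || typeof request.method !== "string") {
      return errorResponse(id, -32600, "Invalid Request");
    }
    const handler = this.methods.get(request.method);
    if (!handler) {
      return errorResponse(id, -32601, "Method not found");
    }
    try {
      for (const check of this.interceptors) check(request);
      return { jsonrpc: "2.0", result: handler(request.params), id };
    } catch (err) {
      // Codes -32000 to -32099 are reserved for implementation-defined server errors.
      return errorResponse(id, -32000, err.message);
    }
  }
}

function errorResponse(id, code, message) {
  return { jsonrpc: "2.0", error: { code, message }, id };
}

// Usage: a "math" namespace plus an interceptor that insists on positional params.
const server = new RpcServer();
server.register("math", { add: ([a, b]) => a + b });
server.intercept((req) => {
  if (!Array.isArray(req.params)) throw new Error("params must be an array");
});

console.log(server.handle({ jsonrpc: "2.0", method: "math.add", params: [2, 3], id: 1 }));
// { jsonrpc: '2.0', result: 5, id: 1 }
```

Namespacing here is just a prefix on the method name, and interceptors run in registration order before the handler, which makes them a natural place for auth checks or parameter validation.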

Part of my purpose for this blog post is also to share our development setup, mainly for experiments with distributed systems and workflows.

Prod environment:

1. For our production environment, Haruki & I each have a VPS, and each one runs an instance of PSQL, replicating master-slave. Configuring this took a while, which we’ll document later, but basically it’s up and running right now.

2. Each box also runs an instance of metadb-core serving API calls. A simple load balancer (HAProxy, for example) will be placed on 1 box and will serve as the entry point for all API calls. This does produce somewhat unpredictable response times for the calls, but the tradeoff is redundancy, which is definitely needed for a prod environment.

3. The UI will be set up on 1 of the boxes. It’s really lightweight right now, so we don’t see an immediate need for 2 UI instances.

4. We still have to set up database backups and archiving, you know, disaster recovery stuff.
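Here’s a rough sketch of what the load balancer in item 2 does: hand each incoming call to the next metadb-core instance in turn (round-robin, which is HAProxy’s `balance roundrobin` mode) and skip any box that fails its health check. The backend names and the `makeBalancer` helper are hypothetical; in practice HAProxy does all of this for you from its config.

```javascript
// Hypothetical backend list: one metadb-core instance per VPS.
// Hostnames and ports are made up for illustration.
const backends = [
  { name: "vps-1:3000", healthy: true },
  { name: "vps-2:3000", healthy: true },
];

// Round-robin that skips unhealthy backends, roughly what a load balancer
// does with round-robin balancing plus health checks.
function makeBalancer(backends) {
  let next = 0; // index of the backend to try first on the next call
  return function pick() {
    for (let tries = 0; tries < backends.length; tries++) {
      const backend = backends[next];
      next = (next + 1) % backends.length;
      if (backend.healthy) return backend.name;
    }
    throw new Error("no healthy backends");
  };
}

const pick = makeBalancer(backends);
console.log(pick(), pick(), pick()); // vps-1:3000 vps-2:3000 vps-1:3000
backends[0].healthy = false; // one VPS goes down
console.log(pick(), pick()); // vps-2:3000 vps-2:3000
```

The skip-unhealthy loop is where the redundancy comes from: when one box dies, every call simply lands on the other one. Alternating between a local and a remote backend is also part of why response times vary from call to call.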

Test environment:

We’re planning to get another, smaller VPS instance for our CI (Continuous Integration) server. This pretty much serves as the integration testing environment for both metadb-core & metadb-ui. Although njrpc is currently set up on Travis-CI, using a 3rd-party CI doesn’t allow some of the customization and setup we need. Travis-CI uses RoR and allows testing of NodeJS projects, but there’s version skew, database setup and all that. It’d be much less painful to have our own box dedicated to testing.

Development environment:

1. IDE: I actually use Cloud9 running locally as my IDE. It doesn’t have code auto-complete and stuff, but the syntax highlighting and JSHint integration are pretty decent and helpful. The interface is simple and lightweight enough.

2. The dev environment is pretty much a replica of the prod/test environments, so it definitely needs PSQL and NodeJS. We also maintain a separate test database with a much smaller set of data, so that we can easily wipe it and load the dump back in for a fresh copy.

That’s pretty much what we have in mind… quite a lot for 2 developers. It’s very time-consuming but rewarding at the same time, since I’m getting much better at handling async stuff in JS.

Aight guys, have fun and keep on brogramming! Oh BTW we got our VPS from AlienVPS. They have pretty decent pricing.
