So it’s been a while since my last blog post, partially because I was swamped with work and some personal business, but mainly because my friend Haruki and I are still trying to design and set up the first phase of the MetaDB project. It is currently under active development right here on GitHub. We’ve put quite a bit of thought into the project’s data model, which led to several architecture and technology choices.
We’re still using NodeJS as the meat of the whole project. It has been our choice from the beginning for various (unverified) reasons: speed, both in performance and in development, scalability, and light weight. I say unverified because we’ve read a lot about it, but technologies are sometimes a YMMV kind of thing; for our specific use cases, NodeJS seems to be a good fit.
We dropped the idea of using NoSQL in favor of PostgreSQL as the database technology. NoSQL is great for unstructured data, but there seem to be too many relationships among our model objects for maintaining a NoSQL model and doing map-reduce to be worth it. With NoSQL, if we separate out some of the components, hybrid objects would be the result of a map-reduce instead of a join, which is definitely not as efficient. If we instead store those components inside a particular object collection, reusability becomes a big mess. In the end, partially due to our lack of solid NoSQL modeling skills, we took the easy way out: SQL.
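To make that join-vs-map-reduce point concrete, here’s a toy sketch. The table and field names (entries, tags, and so on) are made up for illustration, not MetaDB’s actual schema. The point is that what SQL does in one join, a document store with separated components forces you to reassemble in application code:

```javascript
// With SQL, combining entries with their tags is a single join
// (hypothetical schema, just to illustrate):
var joinQuery =
  'SELECT e.id, e.title, t.name AS tag ' +
  'FROM entries e JOIN entry_tags t ON t.entry_id = e.id';

// With a document store that keeps the components in separate
// collections, the same "hybrid object" gets assembled by hand:
function appSideJoin(entries, entryTags) {
  var byEntry = {};
  entryTags.forEach(function (t) {
    (byEntry[t.entry_id] = byEntry[t.entry_id] || []).push(t.name);
  });
  return entries.map(function (e) {
    return { id: e.id, title: e.title, tags: byEntry[e.id] || [] };
  });
}

// Example data:
var entries = [{ id: 1, title: 'Cowboy Bebop' }];
var tags = [
  { entry_id: 1, name: 'anime' },
  { entry_id: 1, name: 'space' }
];
// appSideJoin(entries, tags)
//   → [{ id: 1, title: 'Cowboy Bebop', tags: ['anime', 'space'] }]
```

Every piece of app-side merging like this is code we’d have to write, test, and keep in sync with the data; with PostgreSQL the database just does it.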
The framework that drives our API is a custom in-house module called njrpc (Node-JsonRPC). It’s an implementation of the JSON-RPC 2.0 protocol with some additional bells & whistles that let you namespace methods and intercept requests. It also exposes enough low-level calls (at least for our needs) that you can manually override responses in callbacks and such.
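For those unfamiliar with JSON-RPC 2.0, the request/response shapes below are straight from the spec; the tiny namespace dispatcher is just a sketch of the idea, not njrpc’s actual API (handler names are made up):

```javascript
// Hypothetical handlers registered under a namespace.
var handlers = {
  Anime: {
    getTitle: function (params) { return 'Cowboy Bebop'; }
  }
};

function dispatch(request) {
  // Namespacing: "Anime.getTitle" -> namespace "Anime", method "getTitle".
  var parts = request.method.split('.');
  var ns = handlers[parts[0]];
  var fn = ns && ns[parts[1]];
  if (!fn) {
    // Standard JSON-RPC 2.0 "Method not found" error object.
    return { jsonrpc: '2.0', id: request.id,
             error: { code: -32601, message: 'Method not found' } };
  }
  return { jsonrpc: '2.0', id: request.id, result: fn(request.params) };
}

// A well-formed JSON-RPC 2.0 request:
var req = { jsonrpc: '2.0', method: 'Anime.getTitle', params: {}, id: 1 };
// dispatch(req) → { jsonrpc: '2.0', id: 1, result: 'Cowboy Bebop' }
```

An interceptor in this picture is just a function that gets a crack at the request object before `dispatch` runs (auth checks, logging, etc.).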
Part of my purpose for this blog post is also to share our development setup, mainly for experiments with distributed systems and workflows.
1. For our production environment, Haruki & I each have a VPS, and each box runs an instance of PostgreSQL with master-slave replication. Configuring this took a while, which we’ll document later, but it’s up and running now.
2. Each box also runs an instance of metadb-core serving API calls. A simple load balancer (HAProxy, for example) sits on one box and serves as the entry point for all API calls. This makes response times somewhat unpredictable, but the tradeoff is redundancy, which is definitely needed in a production environment.
3. The UI will be set up on one of the boxes. It’s really lightweight right now, so we don’t see an immediate need for two UI instances.
4. We still have to set up database backup and archiving, you know, disaster recovery stuff.
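For the load-balancing piece of the setup above, the HAProxy config is only a handful of lines. Something like the sketch below; the box addresses and port are placeholders, not our real hosts:

```
# /etc/haproxy/haproxy.cfg (sketch; IPs and port are made up)
frontend metadb_api
    bind *:80
    default_backend metadb_core

backend metadb_core
    balance roundrobin
    server box1 10.0.0.1:3000 check
    server box2 10.0.0.2:3000 check
```

The `check` keyword makes HAProxy health-check each metadb-core instance, so if one box goes down traffic just flows to the other, which is the redundancy we’re after.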
We’re planning to get another, smaller VPS instance for our CI (Continuous Integration) server. This pretty much serves as the integration testing environment for both metadb-core & metadb-ui. Although njrpc is currently set up on Travis-CI, using a third-party CI doesn’t let us do certain customization and setup. Travis-CI is built on Ruby on Rails and supports testing NodeJS projects, but there’s version skew, database setup, and all that. It’d be much less painful to have our own box dedicated to testing.
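For reference, the Travis-CI setup for a NodeJS project like njrpc is a small `.travis.yml` in the repo root, roughly like this (the Node version listed is just an example, not necessarily what njrpc pins):

```
language: node_js
node_js:
  - "0.8"
```

Travis then runs `npm install` and `npm test` against each listed version, which is exactly where the version-skew and database-setup limitations bite: you get whatever environment Travis provides, not the one we run in production.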
1. IDE: I actually use Cloud9 running locally as my IDE. It doesn’t have code auto-completion and such, but the syntax highlighting and JSHint integration are pretty decent and helpful. The interface is simple and lightweight.
2. The dev environment is pretty much a replica of the prod/test environment, so it definitely needs PostgreSQL and NodeJS. We also maintain a separate test database with a much smaller data set so that we can easily wipe it out and re-import a fresh copy.
That’s pretty much what we have in mind… quite a lot for two developers. It’s very time-consuming but rewarding at the same time, as I’m getting much better at handling async code in JS.
Aight guys, have fun and keep on brogramming! Oh, BTW, we got our VPSes from AlienVPS; they have pretty decent pricing.