!geek

Visual simulation of consistent hashing

2021-09-22T06:02:44+05:30

Caching is an important aspect of the high performance applications. As the data volume increases, the cached data needs to be distributed across multiple servers. We need to make sure the following objectives are met while doing so.

Maximize the cache hits: This will reduce the load on primary data source and reduce the overall latency.
Distribute the data and traffic evenly: This ensures optimal use of servers and avoid overloading a subset of the servers.

As the title of this post suggests, we will look into how consistent hashing can be used to achieve the above objectives. Before that, let’s look at using a straight forward approach for solving the problem.

Modulo Hashing

In this approach, we hash the requests based on a key and use the formula hash(key) % number_of_servers to route the request to appropriate cache server. For example: If a key “apple” hashes to 14 and we have 3 cache servers, the request for “apple” will be forwarded server number 2 as 14 % 3 = 2.

Let’s simulate this for 3 cache servers, 100 unique keys, 300 random requests and see how it performs.

Click the play button below and check the stats. Increase the speed to fast forward.

Let’s analyse the results

Cache Hit Ratio: As expected, 2/3rd of the requests(67%) are returned from the cache. This is good.
Data and load distribution: The load is not equally distributed across all servers, but it is fairly distributed. It is purely based on the distribution property of the hash function.

Modulo hashing works well for a fixed number of servers. But in many cases, we need to add or remove servers as per the variation in traffic volume. And, servers can crash sometimes. Let’s simulate the following dynamic nodes scenario with modulo hashing and see how it performs.

3 servers(S0,S1,S2), 100 unique keys, 300 random requests as earlier
1 server(S3) is added after the 100th request
1 server(S1) is removed after the 200th request

Let’s analyse the results

Cache Hit Ratio: This drops from ~67% earlier to ~45%.
Data and load distribution: The total number of keys across the servers has increased. This indicates some keys are stored in multiple servers.

This is because, many keys are mapped to a different server when the number of servers change. For example: a key “orange” with hash value 11 is initially routed to server S2 when there are 3 servers(11 % 3 = 2), whereas it is routed to server S3 when there are 4 servers(11 % 4 = 3). This leads to ineffective use of cache.

Consistent Hashing

Consistent Hashing has a different approach to address the drawbacks of the modulo hashing with dynamic nodes. Let’s start with the basic concepts of consistent hashing.

The servers(called as nodes) are hashed and mapped to a number in a fixed range. This range can be imagined as a real number(not an integer) between 0 and 360 to represent it as points on a circle. The node is placed on this circular ring based on its hash value mapped to range 0-360. For example: If hash value of server S1 maps to 90, it will be placed as point at 90 degrees on the circumference of the circle.
The keys are hashed similarly and mapped to a point on the circle. The request for this key is routed to the closest node in the clockwise direction (It can be anticlockwise as well, as long as the same direction is used for all the keys).

Let’s play this simulation in 0.5x speed and visualise this basic concept.

Now that we understand the basic concept, let’s run the simulation and observe the stats for 3 servers, 100 unique keys, 300 random requests. (Please increase the simulation speed when required to fast forward to the final stats)

We can observe that, cache hit ratio and load distribution is very similar to that of modulo hashing. This is expected as the algorithm behaves almost the same for fixed number of servers.

Let’s see how this basic concept of consistent hashing handles the addition and removal of the nodes using the below scenario

3 servers(S1,S2,S3), 100 unique keys, 300 random requests as earlier
1 server(S4) is added after the 100th request
1 server(S1) is removed after the 200th request

Let’s analyse the results

Cache Hit Ratio: The cache hit ratio is ~60% compared to ~50% in modulo hashing. This is because very few keys are remapped when a node is added or removed.
Data and load distribution: The load distribution is skewed. In the above scenario(with the chosen node names), the new node S4 doesn’t get many requests due to its proximity to node S3.

Consistent hashing solves this load distribution problem by placing each node at multiple points on the ring. These points are called as virtual nodes. For example, if we need to represent node S1 as 4 points on the ring, we place virtual nodes S1-1 to S1-4 on the ring using same logic as earlier. This allows multiple small fragments of the ring to be mapped to a single node.

Let’s simulate the previous elastic nodes scenario with 12 virtual nodes per node.

Let’s analyse the results

Cache Hit Ratio: The cache hit ratio remains good i.e. ~60% compared to ~50% in modulo hashing.
Data and load distribution: The load is distributed a lot better now. The new node S4 gets a fair amount of traffic compared to earlier. This is because node S4 is mapped multiple fragments of the ring, increasing its chance of fair share of traffic.

If you would like to simulate your own scenarios, please modify this JSBin code and run own experiments.

Conclusion

Consistent hashing has proven to be a useful technique since its inception in 1997 and it is used in many well known distributed systems because of the simplicity and the benefits it offers. The optimization of consistent hashing does not end with what we have read so far. For example: checkout this blog or the video by vimeo engineering on their practical usage and adaptation.

Meta: You can find the the code used for the above simulations here.

Acknowledgements

The idea of visual simulation was inspired by the amazing interactive posts like this by @bciechanowski

Cloju-re-alization

2019-02-13T20:22:19+05:30

This is a story of how learning clojure made me realize one of its selling points through subconscious thinking.

I had tried to learn clojure few times in the past and had dropped it because of the prefix notation for expressions and parentheses black-hole. Writing (def y (+ (* m x) c) felt very weird after expressing it as y = m * x + c during many years of education. I wasn’t alone, many of my colleagues had similar feeling about clojure and pure functional programming languages.

Some of us attended a training by clojure evangelist and expert who recommended to keep an open mind in the beginning and learn the concepts by solving few basic problems using clojure. As part of this exercise, we wrote a function to find factors of a number which made me curious about extending it to find prime-factors of a number. My mind automatically started thinking of imperative programming approach and translating it to functional style in clojure.

After few minutes of dabbling, it seemed like a very hard(close to impossible) problem to solve in clojure. It made a dent in my confidence and I needed a fix. I decided to write it in imperative style first and quickly came up with below python program to regain part of my confidence :)

def get_prime_factors(n):
    prime_factors = []
    for i in range(2, n+1):
        if(n % i == 0):
            prime_factors += [i] + get_prime_factors(n/i)
            break
    return prime_factors
# not the most optimal solution, but proved I can code function to find prime factors :)

As it was close to end of the day(& week), my mind dropped this problem there. But my subconscious mind hadn’t let go of this problem and started giving me this hint at the end of a good night sleep. I needed to breakdown the problem into smaller abstractions i.e prime factors of number is smallest prime factor of the number and prime factors of the quotient. This is the code I wrote in clojure after this sudden flash of thoughts

(defn divisible? [n, x]
  (zero? (rem n x))
  )

(def find-first (comp first filter))

(defn smallest-prime-factor [n]
  (find-first #(divisible? n %1) (range 2, (+ n 1)))
  )

(defn prime-factors [n]
  (if (<= n 1)
    ()
    (let [smallest-prime-factor (smallest-prime-factor n)]
      (concat [smallest-prime-factor] (prime-factors (/ n smallest-prime-factor)) )
      )
    )
  )

This made me realise, the imperative code I had written earlier was more complex to digest the concept of prime factors when compared to the clojure code. If I have to write this function again in any other language, I would definitely break it down to abstractions defined in the clojure function above. In a way, clojure was making it harder to write bad code!

Well, this was my #cloju-re-alization. What is yours? Leave a comment below or write your own blog post and share

Monitoring DB backups using prometheus

2017-12-22T14:19:02+05:30

Ensuring correct database backups taken at regular interval is very critical for disaster recovery. Recent gitlab database incident re-emphasizes this fact. Gitlab was very transparent about this and documented approaches for preventing these failures

The preventive measures include, monitoring -

Backup file is created in every x interval: Catches backups not being uploaded due to backup script error or scheduling error
Size of latest backup file is at-least y bytes: Catches erroneous backup file uploaded due to script error

In our case, DB backups are uploaded to Azure blob storage(similar to AWS S3) and prometheus is used for monitoring

High level design

Run an exporter which can expose metrics such as latest_file_timestamp and latest_file_size for each blob container where backup files are uploaded
Alert if current_time - latest_file_timestamp > backup_interval or latest_file_size < expected_backup_file_size

As we couldn’t find any existing exporter, we wrote prometheus-azure-blob-exporter to capture following metrics

# Last modified timestamp(milliseconds) for latest file in container
azure_blob_latest_file_timestamp{container="postgresql-backup"} 1508707802000.0
# Size in bytes for latest file in container
azure_blob_latest_file_size{container="postgresql-backup"} 5606443.0

Alerts are defined as

Check backup is created every day

# 24 hours + 1 hour slack time for backup process
# threshold_interval_in_milliseconds => 25 * 60 * 60 * 1000 => 90000000
ALERT backup_is_too_old
  IF (time() * 1000) - azure_blob_latest_file_timestamp > 90000000
  FOR 5m
  ANNOTATIONS {
      summary = "Backup is too old",
      description = "There is no backup file created for a day in ",
  }

Check latest backup file created has minimum size of 1MB

# threshold_size_in_bytes => 1MB =~ 1000000
ALERT backup_size_is_too_small
  IF azure_blob_latest_file_size < 1000000
  FOR 5m
  ANNOTATIONS {
      summary = "Backup size is too small",
      description = "Latest backup file is smaller than 1MB in ",
  }

Please checkout the github repo for more details

To cron or not to cron

2017-12-22T12:16:01+05:30

Using crontab for scheduling background jobs like database backup, purging old data is a common practice in lot of IT organizations. As these jobs are running in background, failures can get missed easily without proper monitoring. In most cases, failures occur due to an error in job script and some times due to mistakes in cron configuration

Monitoring short lived cron jobs are not straight forward compared to monitoring long running services like web services. These are few well-known mechanisms for alerting cron job failures

Cron job writes to a file on success. Monitoring system regularly reads this file and alerts based on last modified time-stamp of the file -[Refer]
Cron job pings a monitoring service on success. This monitoring service acts as a dead man’s switch and alerts if there was no call made by the job within expected time interval

Pragmatic alternative for crontab

If you have a CI server like jenkins in your infrastructure, one of the approach that has worked well for us is - create a scheduled job in jenkins with slack/email notifications on failures.

Advantages of this approach

Simpler to setup and effective for catching job script failures
UI to check job output and history of previous runs
Ability to manually trigger the job when needed

Important note

Whether you use scheduled jobs in crontab or jenkins, you shouldn’t depend only on the job’s exit status for determining success - [Refer]. It is important to have alerting based on expected state of the system after job execution. Example: Timestamp of latest backup file uploaded to backup storage, minimum size of the backup file etc

UPDATE: Checkout Monitoring DB backups using prometheus for more details on monitoring DB backups

Docker swarm in production

2017-12-22T09:58:33+05:30

This is a experience report on using docker swarm(17.06) in production for ~3 months

Context

Prior to this, we were deploying services on VMs using ansible. For a new project, we wanted to explore benefits of running service as containers and decided to use docker swarm due to its simpler setup and consistency with docker engine APIs

Benefits

Using container orchestration engine to run services provided following benefits

Cleaner boundary between Developers building/packaging the services and DevOps team providing platform for running these services. It is a great cultural shift for the organization!
Consistent way to operate services - start/stop, find logs, scale. This makes it easier to operate
Implementing cross cutting concerns like log aggregation, monitoring gets easier due to consistency
Promotes stateless & immutable infrastructure. It is easier to setup new environments & tear down as and when needed
You can achieve better resource utilization with same level of isolation compared to running services on VMs. Helps overcoming resource management issues when running services in public cloud VMs - which doesn’t allow fine gained control on customizing memory, CPU and network resources for a service

Docker swarm specific

Easy to setup & operate - single binary/service has all the bells and whistles
Smaller learning curve due to API consistency with the standalone docker engine
Built in service discovery(via DNS)
Built in load balancing across service replicas using battle tested Linux services like IPVS. Load balancing is based on health of the instance determined by health checks defined for the service
Support for distributed configuration and secrets
Support for rolling upgrades

Challenges

Sometimes you’ll run into issues moby#34163, moby#25432 during deployments. Having a play-book will help in faster recovery from these failures
Issues in overlay networking(swarm#2161) can make few containers unreachable some times. Having good monitoring is essential for identification and remediation of these issues
Rolling update is harder to achieve if you are using docker config/secret. Swarm doesn’t support updating config/secret. You would have create new config(with new version number) & update the service to use new version of config. It would have been good if this was handled internally using checksum of the config

Conclusion

Overall, using a container orchestration engine in production has proven to be very productive and useful. Docker swarm is maturing over time, with improved stability it is a promising platform for running containers in production

Joy of finding a math series

2017-10-28T12:46:47+05:30

I was trying to come up with a question for preliminary logic round for interviewing. I decided to form a tricky math series and tried to figure out 2 math series which looks same in the beginning and diverges after a while. Something on the lines of

series x => x1, x2, x3, x4, x5
series y => y1, y2, y3, y4, y5

where x1=y1, x2=y2, x3=y3 and x4!=y4, x5!=y5

So the question could be

Write the missing number in below series
x1, x2, x3, ? , y5

To figure out these series, my approach was to start with some series x and find a pattern which doesn’t hold good after few numbers. After few unsuccessful attempts, tried my luck with series of squares

1, 4, 9, 16, 25, ...

Voilà! difference of consecutive squares were not only a odd number, they were consecutive odd numbers. Tried it for a long series and found it true for all the squares!

1-0, 4-1, 9-4, 16-9, 25-16,... => 1, 3, 5, 7, 9,..

From what I knew of squares of numbers, this wasn’t very obvious. I was very excited to check if this is already known or I was the first one to discover this :). Simple web search presented ton of articles on this observation.

That was disappointing but reading this link about a mathematician appreciating a student’s effort in similar finding was comforting and encouraging to carry on with my quest

Evolution of monitoring systems

2017-10-21T19:54:38+05:30

Monitoring systems have evolved over time to support evolving architecture styles and deployment strategies

Generation X+1: Monolithic applications, Bare metal servers

This was the era where you would have single deployable unit for whole application (labeled as monolithic now). Application and the database would be deployed on a known set of bare metal servers. Vertical scaling was more favorable option for scaling

Nagios was(probably still is) the widely used open source monitoring software in this era. You would configure a list of known severs the monitoring system needs to probe to determine the health of the system. This system was simple to configure & operate

Image Credit: https://support.nagios.com/kb/article.php?id=141

Generation X+2: Service Oriented Architecture, VMs on the Cloud

Managing monolithic applications became painful as the businesses grew. Service Oriented Architecture(SOA) became mainstream in this era. Cloud services like AWS made it very easy to launch new VMs for deploying services. Configuration management tools like puppet, ansible made it easy to deploy applications across large number of servers. Immutable server pattern was evangelized by well known companies.

Horizontal scalability started becoming more favorable option for scaling. One could bring up new instances of service by launching VMs from a image(eg: AMIs) and bring down VMs easily depending on the load on the system. This deployment architecture demanded a monitoring system which can handle this dynamicity.

Monitoring systems like Sensu solved this issue with a different architecture style. Instead of a central server probing a static list of servers, sensu had a publish & subscription model using RabbitMQ. The monitoring agent running on the application server subscribes to monitoring check messages relevant for the service and pushes the results via same messaging system. Monitoring server itself was horizontally scalable to handle varying load

Image Credit: https://sensuapp.org/docs/1.0/overview/architecture.html

Generation X+3: Micro services, Container orchestration engines

Micro services are becoming mainstream now. Deploying stateless services as containers has made the concept of immutable servers very easy and efficient compared to using VMs. Container orchestration engines like kubernetes, docker swarm have made it easy to run and manage large number of services running as containers inside a cluster of servers. These orchestration engines also provide benefits like auto healing(restart on failure), easy to scale, in-built service discovery & load balancing.

Monitoring systems like sensu doesn’t fit well in this setup. Running a monitoring agent alongside the application process in a container adds additional complexity in containers. As service discovery is provided by container orchestration engines, there is no need for adding complexity of running messaging system for solving discovery problem

During the time when Google open sourced kubernetes(most popular container orchestration engine as of now), a new monitoring system - Prometheus (built by ex-googlers in soundcloud) started gaining lot of traction. Prometheus leverages service discovery mechanisms for registering services to be monitored. It has a lot simpler setup and smaller resource requirements compared to system like sensu. Prometheus factored the containers ecosystem and fits very well for the job. The scalability argument against pull model of monitoring was also addressed by the authors of the system.

Image Credit: https://prometheus.io/docs/introduction/overview/

Conclusion

If you are building a new system with architecture patterns and deployment strategies of this era, Prometheus is a leading choice among open source monitoring systems

PS: There are lot of good things(and few limitations) about prometheus which deserves separate blog post :)

Simple obstacle avoiding robot using arduino

2016-01-02T16:25:48+05:30

Building obstacle avoiding robot is a simple & fun way to start learning arduino and electronics. A lot of useful articles explain this, but you will be blocked if you can’t get the same parts in your region. In this post, I’ll explain how to build a simple and minimal robot using the parts available online in India.

Watch the below video to get an idea of what you could build by following this article.

High level approach

We’ll be using a sensor to detect an obstacle in front of robot. Depending on the sensor input we’ll control the motor wheels of robot to either move forward or turn aside.

Prerequisites

Arduino UNO or its clone
HC-SR04 Ultra sonic sensor
Robot chassis + 2 DC Motors with holder + 2 Wheels + 1 Castor Wheel + Screws & Nuts
L293D motor driver
Basic electronics kit contains breadboard, connecting wires, battery & other small useful items
Jumper Wires: Male to Male, Male to Female
Power source: 5V power bank or 5V battery pack (I used a phone charger power bank I had already)
Tools: Screw driver, Scissor / Wire stripper
Softwares: Arduino IDE

Steps

Follow the steps in the order below. If you get stuck at any point, refer to the troubleshooting section.

Step 1 : Get to know arduino and electronics basics

If you are new to arduino, try out few basic examples

blinking led to learn controlling output
push button to learn reading input & basic serial output for trouble shooting
How to use breadboard (vertical power rails vs horizontal terminal strips)

Step 2: Connect ultrasonic sensor to detect obstacle

2.1: Connect the circuit as shown below

2.2: Add NewPing library to read distance.

Download the NewPing library as zip here
In Arduino IDE go to menu Sketch -> Include Library -> Add .ZIP as library -> Choose the downloaded zip file

2.3: Upload the below code to arduino

2.4: Test the distance measured by sensor

In Arduino IDE go to menu Tools -> Serial Monitor
Move an obstacle in front of sensor and test that correct distance is printed on serial monitor

Step 3: Assemble the robot chassis and test the motors & wheels

The robot chassis needs to be assembled by looking at the images in purchased website. Bare minimum you should have the chassis base assembled with motors and wheels
Cut 4 pieces of connecting wires each of length 15-20 centimeters and strip out both ends of the wire 1-2 centimeters
Take 2 wires & connect one end of each wire to 2 connection slots of the motor. Repeat the same with other 2 wires using 2nd motor
Take out a battery (it will be in basic electronics kit) and test motor wheel rotation by connecting other ends of the wires to battery

Step 4: Program L293D motor driver to control the wheels

4.1: Connect the L293D motor to arduino

I couldn’t get a fritzing diagram for this circuit since I don’t have the SVG for the part purchased. I’ll list the connections needed to get it running

Connect 2 wires from one motor to 2 slots in M1 and other 2 wires from other motor to slots in M2
Connect 5V and 12V on L293D to power line(Red) on breadboard
Connect GND on L293D to GND line(Black) on breadboard
Connect C1-A, C1-B on L293D to pins 2 & 3 respectively on arduino. These are motion control pins for Motor 1
Connect C2-A, C2-B on L293D to pins 7 & 8 respectively on arduino. These are motion control pins for Motor 2
Connect EN1 and EN2 on L293D to power line(Red) on breadboard. These are enable pins for Motor 1 and Motor 2 respectively. HIGH voltage(power) enables the motor and LOW voltage(GND) disables the motor

The connection should look somewhat like the image below

Please refer to images in purchased website for additional technical details

4.2: Upload the below code to arduino

4.2: Test the wheel movement control

The loop function initially tests the move_forward function. Check both wheels are moving in forward direction.
If any of the motor wheels are rotating in wrong direction swap the connections in M1 or M2 depending on malfunctioning motor
Change the line in loop function to and replace move_forward with other methods like drive_backward, turn_left, turn_right to test they work as expected

Step 5: Integrate sensor input to control the bot movement

Upload the below code which has logic to move the robot depending on distance of the obstacle. Feel free to change the obstacle distance or delay or the rotation angle as per your motor speed

Step 6: Test it in the field and celebrate!

So far you would have used USB from your laptop as power source for arduino. Now you should use a power bank or 5V battery pack to test your robot in the field
If everything works, run around…jump up & down…shout to the sky…any other way to celebrate the hard earned victory :)

Step 7: Enhancements

These are optional enhancements that I wanted to try but didn’t an opportunity to try

The robot speed can be increased by connecting a 12V battery as power source for L293D
The accuracy of obstacle detection can be increased by using couple of IR sensors at corners OR by using a servo to rotate the ultrasonic sensor for reading obstacles in the corners
Attach leds, buzzers, laser, add a custom body to make it look like WALL-E or other funky stuff to add more jazziness the vehicle
Send it to Mars or Do anything else you can imagine to save or end the human race

The code and fritzing diagram is also shared on github repo

Troubleshooting

Double check the pin number you have connected to on arduino. You might have connected to to the wrong pin because of the angle at your are looking at these circuits. Count the pin numbers slowly to ensure you are connected to pin number x and not x + 1 or x - 1
Double check the breadboard slots in which you have plugged in the wires or devices. Push the wires or devices to ensure they are plugged in firmly
Test if the breadboard slots to which you have plugged-in works properly. Use a buzzer or other small equipments which can handle 5V to to test them out (Don’t use led without a resistor). I had an issue where some of the pin slots in in power rails were not working due to defective board
Test the components such as wires, sensors, motors individually and ensure they work properly when used independently
If nothing helps, google is your friend

Soft delete and unique constraint

2015-10-19T17:59:41+05:30

This post describes a robust solution and other alternatives for having unique constraint at DB level for a table with soft deleted rows.

Problem Context

The system identifies the users by their mobile number and hence mobile number must be unique across users. The users are soft deleted in the system by updating column deleted = 1. A new user can register in with same mobile number as previously deactivated user (since mobile numbers are recycled by telecoms). The unique check at application are susceptible to fail in case of concurrent requests, unique constraint is needed at DB to ensure integrity of data.

The solution should

work for existing rows imported from legacy system
work across different databases supported by product

We were able to find different flavors of solutions on net but they were incomplete for our case. They only served as starting point to a solution that meets all of our needs mentioned above.

The Final Solution

Add a new column deletion_token
Add unique constraint for combination mobile_number, deletion_token
A new row added to table would have value of ‘NA’ deletion_token. This is ensured by setting up default value of NA at DB level and having constructor of User model(used by ORM) to initialize deletion_token to NA by default
Insert a random UUID for soft deleted
On soft delete of user, generate new UUID and set it on deletion_token

Path to the above solution

Add unique constraint for columns mobile_number, deleted Drawback: This wouldn’t allow us to have more than one deleted user with same mobile number
Add a unique constraint with a where clause eg: ADD CONSTRAINT .... WHERE deleted != 1; Drawback: The where clause in constraint definition is not supported by all databases
Instead of using only 0 or 1 as values for deleted column, increment the number on each delete. Drawback: Expensive as it needs extra db call to retrieve previously soft deleted rows and also expensive to update numbers for existing soft deleted rows in legacy system. It would theoretically fail for concurrent requests without lock.
Add a new time-stamp column called deleted_at and add an unique constraint on mobile_number, deleted_at Drawback: The old rows in legacy system didn’t have data for deleted_at and populating with dummy data wasn’t acceptable.
Add a new column called deletion_token and add a constraint on mobile_number, deletion_token with NULL value for new rows and UUID for soft deleted rows. Drawback: Few databases don’t consider nulls as equal and hence unique constraint does not fail for two rows with same mobile number and NULL value in deletion_token
Slight modification to point 5, to arrive at the final solution described in the beginning of the post

Working software is worth thousand assurances

2015-09-13T19:09:14+05:30

In the beginning of a software project, every one starts with a big list of features. If you ask the product owner, what is the minimum set of features that we can go live with? most of the times you’ll hear a lot more than you expected.

It gets lot harder to talk about minimum scope when you are rewriting an existing software. We would usually want to go live with the minimum viable product(MVP) and build reset of it incrementally. If people are new to agile methodologies, these questions on reducing scope might look stupid & annoying. Fortunately there are ways to get these questions answered, I’m sharing my experience from couple of projects in past few years.

Lets start with a story, where we are writing a new version of the popular website.

Stage 1: Inception / Discovery

Team: According to the stats, features X & Y used only by 5%. 
      Can we deprioritize it for first release?
      We can redirect to old site for users who need them.
Product owners: No! Everything goes live or nothing does.

Start with implementing walking skeleton[1] [2] to validate the approach. After first few iterations showcase,

Stage 2: Walking skeleton is ready

Team: We have the thin slice of end to end user journey. This is how it works.
Product owners: Wow, thats great. Whats left then?
Team: This one does not handle some of these rare scenarios. Lets prioritize what is needed.

Prioritize the backlog to do most important features first. Few weeks before planned release for all features, ask the question again

Stage 3 : Important features are production ready (MVP)

Team: We have everything except feature X & Y. It would take another month for implementing X & Y. 
      Can we go live with users who need X & Y being redirected to old site?
Product owners: Lets go live!! We need to go there before our competitors

Moral of the stor(y/ies):

You can’t get all the answers in the beginning. Prioritize your backlog to do the most important features first. Ask your unanswered question(s) again after each milestone, you’ll be surprised to see how easy it is to get the answers this time.
It is hard for people to understand the benefits of agile methodologies, kanban etc when it is just theory based on your past experiences which they can’t relate to. Build something small & tangible, show the working software to build their confidence.

Python's elegance in dynamic methods compared to ruby

2014-12-27T16:48:17+05:30

I have been a ruby fanboy for a long time because of its expressiveness and elegance in defining DSLs. One of the scary thing in ruby is, when you implement method_missing you need to make sure to implement respond_to_missing?, otherwise bad things will happen to you. The below ruby example shows minimal parts recommended for providing dynamic methods

class Foo
  def method_missing(method_name, *args, &block)
      if method_name.to_s.start_with?('bar_')
          #do_some_thing...
  end

  # You should define this for your good
  # Read http://robots.thoughtbot.com/always-define-respond-to-missing-when-overriding for more details
  def respond_to_missing?(method_name)
      method_name.to_s.start_with?('bar_') || super
  end
end

# So that following will work consistently
foo = Foo.new
foo.bar_qux
foo.bar_qux()
foo.respond_to?(:bar_qux)
foo.method(:bar_qux)

The same can be achieved in python using __getattr__ with lesser code.

class Foo(object):
  def __getattr__(self, attr):
      if attr.startswith('bar_'):
          #return method or an attribute..

# Following will work consistently
foo = Foo()
foo.bar_qux
foo.bar_qux()
hasattr(foo, 'bar_qux')

Eventhough it isn’t a lot of code in ruby, people can forget to implement both methods or implement them differently by mistake leading to tricky bugs and higher maintenance cost.

Can ruby have a similar implementation?

May be not. It is because of the fact that ruby functions are not first class objects which can be returned in a single method_missing hook. Also ruby’s syntax of calling a method without parenthesis(i.e. foo.bar_qux is same as foo.bar_qux()) makes it hard to treat functions as callable objects.

Common mistakes while switching from ruby to python

2014-12-17T13:50:24+05:30

This is more of a note to self sort of post to talk about silly mistakes which can take you down quietly when you switch from ruby to python.

Problem 1 : The method which does nothing

person.rb (Ruby)

class Person
  def initialize(name)
      @name = name
  end
  
  def say_hello
      puts "#{@name} says hello"
  end
end

bob = Person.new('Bob')
bob.say_hello # Bob says hello

person.py (Python)

class Person(object):
  def __init__(self, name):
      self.name = name
  
  def say_hello(self):
      print("%s says hello" % self.name)


bob = Person('Bob')
bob.say_hello # Danger! Danger! Method won't be called

# After breaking your head for a while. Correct it to
bob.say_hello() # Well remember the good old parenthesis to call function?

Problem 2 : The method which returns wrong value

calculator.rb (Ruby)

def add(x, y)
  x + y
end

add(2, 3) # returns 5

calculator.py (Python)

def add(x, y):
  x + y

add(2, 3) # Danger! Danger! Returns nil 

# After breaking your head for a while. Correct it to
def add(x, y):
  return x + y

Solution

Test Driven Development

Using app objects in multi app functional tests

2014-07-27T18:40:04+05:30

It is better to create small modular apps instead of single monolithic application. In Bahmni EMR we have small apps for Patient Registration, Consultation, etc. An end to end functional test covering patient registration and consultation goes through multiple apps. We needed to abstract the concept of an app in test code to increase the readability and maintainability.

The app objects are natural extension of page objects recommended for writing functional tests. An app encapsulates all pages and coarse level actions in the app.

An simple implementation of app class would be

registration/app.rb

class Registration::App
    # Pages in this apps    
    def patient_page
        Registration::PatientPage.new
    end

    def visit_details_page
        Registration::VisitDetailsPage.new
    end

    # Coarse level action in this app
    def register_new_patient(options)
        click_link "Create New"
        patient_page.fill(options[:patient]).start_visit(options[:visit_type])
    end
end

The test using this simple implementation would look like

registration = Registration::App.new
registration.register_new_patient :patient => new_patient, :visit_type => 'OPD'
registration.visit_page.should_be_current_page
registration.visit_page.save_new_patient_visit(visit_info)

clinical = Clinical::App.new
clinical.patient_search_page.should_have_active_patient(new_patient)
clinical.patient_search_page.view_patient(new_patient)

The test code is structured as

apps
    registration
        app.rb
        patient_page.rb
        visit_details_page.rb
    clinical
        app.rb
        patient_search_page.rb
features
    new_patient_visit.rb
framework
    app.rb # Base class for other apps
    page.rb # Base class for other pages

Nicer DSL

We are using capybara and Rspec. The lambda syntax, meta programming constructs in ruby and convention based programming allowed us to implement a DSL shown below. The complete code can be found here

feature "new patient visit" do
    background { login('superuser', 'password') }

    scenario "registration and consultation" do
        new_patient = {:given_name => "Ram", :family_name => 'Singh'}

        go_to_app(:registration) do
            register_new_patient(:patient => new_patient, :visit_type => 'OPD')
            visit_page.should_be_current_page
            visit_page.save_new_patient_visit(visit_info)
        end

        go_to_app(:clinical) do
            patient_search_page.should_have_active_patient(new_patient)
            patient_search_page.view_patient(new_patient)
            # ....
        end
    end
end

I didn’t into details of implementing the DSL. If people are interested, I can write a part 2 of this post.

Make sure functional tests are user centric

2014-07-27T16:43:49+05:30

In modern day web applications, most of the data is fetched and saved using ajax calls and the mark up is generated on client side by compiling static HTML templates. The functional tests need to make sure assertions wait till completion of ajax call.

Couple of bad ways to achieve this is by using sleep(xSeconds) or wait_for_ajax like this

fiil_in 'name', :with => 'foo' # you might be using a page object pattern in real code
click_on 'Save'
wait_for_ajax
expect(find('#number_of_people').text).to be(10)

# The implementation of wait_for_ajax which depends on jQuery.active
def wait_for_ajax
  wait_until { page.evaluate_script('jQuery.active') == 0 }
end

The sleep(xSeconds) makes your tests indeterministic and wait_for_ajax makes it dependent on javascript framework used in application.

Solution 1 : Choose a good framework

Frameworks like capybara have implicit wait mechanism which eliminates the need for wait_for_ajax

click_on 'Save'
expect(find('#number_of_people').to have_content(10) # Implicit wait

But what happens when

You are not using framework like capybara or
The assertion is done on data saved in a database or file system

In these cases, don’t go back to wait_for_ajax or sleep solutions. Instead of depending on technical details of the app, you need to think in terms of

How does the user know app is done loading or saving data?

This will give you a hint on any missing usability requirements in the app.

Solution 2 : Think like a user

When you think like user, you will realize there must be a visual clue in the app to indicate the progress. The functional tests should also depend on this indicator (like a spinner, overlay etc) to figure out when to assert on data.

# click
wait_for_completion
# assert

# Example implementation of wait_for_completion
def wait_for_completion
  wait_until { page.find('#loading-indicator').visible? == false }
end

When your tests user centric, they provide valuable feedback on the user experience.

Keep your models independent of angular

2014-05-18T16:38:01+05:30

If you are using MVC framework you need to make sure controllers are very thin and domain logic lies in small, framework independent, composable models - Wise People

In AngularJS, you need to make sure lot of data is not defined directly on $scope and domain logic is not dependent on angular’s digest cycle. If follow this mantra, unit testing the models would be a lot simpler which in turn is a indicates that your code in is good shape.

Alright lets get to some code. Let’s consider a simple example where we have a form to capture person’s information such as firstName, lastName, age or dateOfBirth. The age or dateOfBirth should be auto populated based on its counter part.

Bad code

//In controller
$scope.firstName = ""
$scope.lastName = ""
$scope.fullName = function() {
  return $scope.firstName + ' ' + $scope.lastName;
}
$scope.age = ""
$scope.dateOfBirth = ""
$scope.$watch('age', setDateOfBirthBasedOnAge);
$scope.$watch('dateOfBirth', setAgeBasedOnDateOfBirth);

If you want to test the logic to compute fullName or age<->dateOfBirth logic, you will have to use angular-mock and inject $scope in your tests. This leads to lot of unnecessary boilerplate code. Lets look at how to refactor this code.

Refactor 1 : Introduce a model

//Model
var Person = function() {
  this.firstName = ""
  this.lastName = ""
  this.fullName = function() {
      return this.firstName + ' ' + this.lastName;
  }
  this.age = ""
  this.dateOfBirth = ""
}

//In controller
$scope.person = new Person();
$scope.$watch('person.age', setDateOfBirthBasedOnAge);
$scope.$watch('person.dateOfBirth', setAgeBasedOnDateOfBirth);

Now you can simply instante a person object and test the fullName method.

Refactor 2 : Remove dependency on $scope.$watch

In this step we will use Object.defineProperty, ES5 API which works on most of the browsers

//Model
var Person = function() {
  ..       //lastName, firstName, fullName code remains same as above
  var _age, _dateOfBirth; // Private fields
  
  Object.defineProperty(this, 'age', {
      get: function() { return _age; }
      set: fubction(value) {
          _age = value;
          setDateOfBirthBasedOnAge();
      }
  });

  Object.defineProperty(this, 'dateOfBirth', {
      get: function() { return _dateOfBirth; }
      set: fubction(value) {
          _dateOfBirth = value;
          setAgeBasedOnDateOfBirth();
      }
  });
}

//In controller
$scope.person = new Person();

After this step, your domain logic can be tested without having to use angular-mock or injectors etc.

Conclusion

One of the boasted feature in AngualrJS is using POJOs for data binding, compared to using special observables or models in knockout, ember etc. If this is one of the reason you are using AngularJS, it is very important to make sure your domain logic doesn’t leak into controllers.

Javascript functions in JSON configuration

2014-05-18T15:20:32+05:30

There is no native support for defining functions in JSON. Commonly used approach is to define function as string and use eval() or new Function() to contruct the function. The basic difference between these two are

eval() works within the current execution scope. It can access or modify local variables.
new Function() runs in a separate scope. It cannot access or modify local variables.

These samples show how the json would differ in these two cases

Using eval()

// JSON config
{
  section: "Additional Details",
  showIf: "function(context) { return context.person.age > 60; }"
}

//This can parsed in javascript as
var showIf = eval('(' + config.showIf + ')')
var shouldShowThisSection = showIf({person: personData}));

Using new Function()

// JSON config
{
  section: "Additional Details",
  showIf: "return context.person.age > 60"
}

//This can parsed in javascript as
var showIf = new Function('context', config.showIf)
var shouldShowThisSection = showIf({person: personData}));

You can use either based on the use case in your application. In bahmni, we went with new Function() for couple of reasons

We did not want config code to modify variables in application execution scope by mistake.
We wanted to control the function signature such as number of parameters and its name to keep it simple and less error prone.

If you prefer using eval syntax, try vkiryukhin/jsonfn.

Multi line functions

The above examples work fine for sinle line expressions. If you need multiple line functions, you need to tweak it a bit. Firstly, JSON does not support multiline string. The work around is to define an array of strings as shown below.

Multi line functions parsed using new Function()

// JSON config
{
  section: "Additional Details",
  showIf: ["if(context.person.gender === 'M' && context.person.age > 60)",
              "return true;",
           "else if(context.person.gender === 'F' && context.person.age > 55)",
              "return true;",
          "else",
              "return false;"]
}

//This can parsed in javascript as
var showIf = new Function('context', config.showIf.join("\n"))
var shouldShowThisSection = showIf({person: personData}));

Performance

There is not much difference in performance when you define a function using function() {} expression eval(‘function() {}’) or new Function(). Have a look at this benchmark using jsperf.

Spinners in single page apps

2014-05-11T16:55:23+05:30

One of the good practices in single page apps, is to show spinners (overlay + loading icon) while retrieving data or saving asynchronously. Often people tend to add a generic interceptor to show spinner for all ajax calls. In any mid size app, you will realize that the assumption “Show an overlay for all ajax call” is incorrect. Few examples

An autocomplete input box fetching reote data
An infininte scroll fetching more data as the use scrolls down the list.

To solve this, one might add an option in interceptor to not show spinner for certain calls. This leads to complicated code due to initial wrong assumption. A better soultion is to have simple reusable code to show/hide spinner and use it explicitly for calls which need spinner.

If you are using a library which returns a promise for ajax call(or object like xhr returned by jQuery.ajax), the API and implementation would look like this.

Api should be simple as

spinner.forPromise($http.get('/items')) //AngularJS

spinner.forPromise($.ajax('/items')) //jQuery

Simple Spinner Implementation in AngularJS

myModule.factory('spinner', function () {
  var forPromise = function(promise) {
      $('#overlay').show();
      promise['finally'](function(){ //use promise.always(hide) in jQuery
          $('#overlay').hide();
      });
  };

  return { forPromise: forPromise }
 )

If you have a multiple components of the page using same spinner, we need to enhance the the code to make sure spinner is hidden only after both components have completed async calls. This can be implemented by keeping spinner count as shown below.

Muliple Call Supported Spinner Implementation in AngularJS

myModule.factory('spinner', function () {
  var showCount = 0;

  var show = function () {
      showCount++;
      $('#overlay').show();
  }

  var hide = function () {
      showCount--;
      if(showCount === 0) {
          $('#overlay').hide();
      }
  }

  var forPromise = function(promise) {
      show();
      promise['finally'](hide); //use promise.always(hide) in jQuery
  };

  return { forPromise: forPromise };
 )

In Bahmni, we use a spinner with animation which can be found here.

Printing external html templates using AngularJS

2014-05-03T20:50:48+05:30

Use case

In Bahmni EMR we needed to support customizable html templates for printing patient regisatrtion card and other printable documents. We needed a print API which looks like

//Contract
printer.print(templateUrl, data)

//This would called on clicking the print button
printer.print('/config/registrationCardTempate.html', {patient: {name: 'Ram Kumar', dateOfBirth: '1978-08-23', gender: 'M'}})

Sample html template

 rel="stylesheet" href="/config/registrationCard.css"/>
 src="/config/logo.jpg">
The hospital name
   ng-bind="patient.name">   ng-bind="patient.dateOfBirth | age">   ng-bind="patient.gender">
Name
Age
Gender

Implementation

As the app is built using angularjs, we decided to use angular as the templating engine for rendering these templates as well. This also helped to reuse filters and other templating features of angular. The implementation consists of following steps

Fetch the html template
Compile the html template with given data using angular’s $complile
Wait for angular to complete rendering the template (Explained here)
Print the html(Explained here)

The code for print function looks like this

var print = function (templateUrl, data) {
    $http.get(templateUrl).success(function(template){
        var printScope = angular.extend($rootScope.$new(), data);
        var element = $compile($('' + template + '
'))(printScope);
        var waitForRenderAndPrint = function() {
            if(printScope.$$phase || $http.pendingRequests.length) {
                $timeout(waitForRenderAndPrint);
            } else {
                printHtml(element.html());
                printScope.$destroy(); // To avoid memory leaks from scope create by $rootScope.$new()
            }
        }
        waitForRenderAndPrint();
    });
};

The complete solution is available on github.

Waiting for AngularJS digest cycle

2014-05-03T17:43:34+05:30

The million dollar question most of the AngularJS users(devs) end up asking is

How to wait for angular digest cycle to be completed? or in simple terms
How to wait for browser to complete rendering the view with angular bindings?

AngularJS does not raise any event to notify this. The suggested simple solution is to use $timeout to queue your work to be run after current digest cycle (also waits for DOM renedering to be completed by the browser).

$timeout(function(){
  //the code which needs to run after dom rendering
})

The above solution works only for views which don’t have ng-include or directives with template url. In this case you have to wait for all the templates to be loaded(async) and then run your code. This can be achived by waiting for $http.pendingRequests to be zero. The enhanced solution is

var waitForRenderAndDoSomething = function() {
  if($http.pendingRequests.length > 0) {
      $timeout(waitForRenderAndDoSomething); // Wait for all templates to be loaded
  } else {
      //the code which needs to run after dom rendering
  }
}
$timeout(waitForRenderAndDoSomething); // Waits for first digest cycle

Hopefully AngularJS will come up with a easier solution in future releases.

Note

The $http.pendingRequests supposed to be used for debugging purpose only. If angular team decides to remove this, you can implement the same using http interceptors as suggested in this link.

Printing html with image and css

2014-05-03T16:54:00+05:30

There are plenty of examples for printing html of an element. Most of them look like this

//The simple soultion but has few problems listed below.
function printHtml(html)
{
    var mywindow = window.open();
    mywindow.document.write('My App');
    mywindow.document.write('');
    mywindow.document.write(html);
    mywindow.document.write('');
    mywindow.print();
    mywindow.close();
    return true;
}

The issues with above solution:

The new window pop up would be blocked by browsers. Users need to enable pop up.
If you have external css files, you will notice the styling not applied some times.
If you have images(like logo), the print will be missing these images intermittently.

The first issue can be addressed by using an iframe instead of new window. The 2nd and 3rd issues are addressed by making sure print happens after page has loaded css files and images. The working solution we use in Bahmni for printing patient registration cards looks like this

var printHtml = function (html) {
    var hiddenFrame = $('').appendTo('body')[0];
    hiddenFrame.contentWindow.printAndRemove = function() {
        hiddenFrame.contentWindow.print();
        $(hiddenFrame).remove();
    };
    var htmlDocument = ""+
                ""+
                    '' + // Print only after document is loaded
                        html +
                    ''+
                "";
    var doc = hiddenFrame.contentWindow.document.open("text/html", "replace");
    doc.write(htmlDocument);
    doc.close();
};

For printing contents of an element you can use this

//element can be found by document.querySelectorAll(slector)[0]
//or using jQuery(selector)[0]
var printElement = function (element) {
  printHtml(element.innerHTML)
};