How to build a great engineering culture

Several weeks ago I had the opportunity to participate in a CTO panel at Cross Campus in downtown Los Angeles discussing how to build a great engineering and developer culture. It was an honor to be on stage with four other accomplished CTOs and CIOs: Mike Dunn of Ver, Braxton Woodham of Fandango, Bret Ulrich of Awesomeness TV, and Scott Sibelius of Versus Gaming Network. We touched on a lot of great questions but only scratched the surface on most of them, and in this post I’d like to expand on some of those topics.

Question: What defines good engineering culture?

Generally speaking, the foundation of a good engineering culture, and of a good organizational culture, is one that at the very least addresses employees’ needs as described by Maslow’s hierarchy. Maslow’s hierarchy is broken down into five motivational needs: Physiological, Safety, Belonging, Esteem, and Self-actualization. Relating this to your employees: to support their physiological needs, employees want to be paid adequately; as part of their safety needs, they want to feel secure in their job; and to support their need for self-actualization, they want to have autonomy and ownership of their work product, to name a few examples.

Assuming you have addressed the basic needs of your employees, how do you then layer on a set of values to create appropriate cultural norms on your team? For the teams and organizations that I build, I do this by hiring for, and nurturing, the values represented in the DOTEAMS evaluation process. I coined DOTEAMS as a mnemonic to help me conceptualize the values that are important to me and my teams, and to use when evaluating people for value fit. It embodies the following:

Diversity of thought and opinion

Ownership of Product

Transparent Communication and Decision Making

Experimentation, Failure, & Learning

Abundance Mindset for your team and customer

Mastery of your domain

Speed of delivery

This mnemonic helps me focus on all of the important values when looking at candidates, as opposed to using skill as the sole evaluation criterion.  As hiring managers, when we are interviewing, we often focus too much on Mastery, whether of a specific language or of a skill, as the measure of a candidate’s qualifications. Too rarely do we evaluate how a candidate will contribute to the whole value system that we are trying to build and bring into our teams. We spend hours on whiteboard problems and technical challenges, and yet almost no time on the rest of what the organization needs to build a great culture. If we focus less on mastery as the sole determining factor of someone’s team worthiness, and instead filter people through a much more holistic value-based heuristic, then we will be far more successful in nurturing the specific culture that we set out to have in our organizations.  Ergo DOTEAMS.

When we filter for mastery during the hiring process, the first place we usually turn to is a potential employee’s Github account. The beauty of hiring engineers in this day and age, is that you can also use Github to get a pretty good idea of how a candidate embodies many of the other potential values. For Ownership, for example, you can look at whether they have a long running side project with a history of improvement; for Experimentation and learning, you may see a graveyard of projects written in a variety of languages. The Abundance Mindset is in many respects demonstrated by participating in and contributing to OSS.

I won’t talk about all of my DOTEAMS values as some of them are, or at least should already be universally accepted as givens of a good culture. There are a few of them that I think warrant additional discussion including Diversity of thought, Experimentation, learning and failure, as well as the Abundance Mindset. I’ll touch on those next.

Diversity of thought and opinion

There was little disagreement on the panel about the need for diversity on teams as a driver of innovation. In my opinion, it has to be one of the first values considered as a driver of strong culture.

When discussing our desire to create diverse teams, an audience member posed an excellent question about why we hire for diversity based on “superficial” characteristics such as race, religion, sex, etc. They asked why instead we do not hire people that we think will meet the goal of diverse thought, and ideas, and how we would filter for those qualities.

Unfortunately, we don’t have very good ways of determining diversity of thought directly. In my opinion, though, so-called superficial qualities are some of the most accurate leading indicators of it: thought is shaped by a person’s life experience, which is going to be inherently different from that of white, middle-aged, bearded technology panelists.

Experimentation, Failure, & Learning

One of my core values is that the best learning comes from constant experimentation and failure. This is a process that is self-reinforcing, where the more that you fail, the more that you learn and can improve upon your failures.

Unfortunately, failure is something that people are not generally taught to be good at, and don’t accept, especially many people in management. I would venture to say that in most organizations, failure is severely discouraged, often resulting in dismissal. In this kind of environment, innovation cannot flourish, and instead gets replaced by a culture of mediocrity that infects the business like a virus.

The alternative is that failure is celebrated as part of the process of learning and development. I did this well at YP, and Spotify documents it well, integrating failure into their process and celebrating it. It starts with failure being treated as a safe activity. In order to make it safe, your people and technology architecture should make it possible to fail safely, which means that your product architecture, as well as your organization architecture, are built up of decoupled systems and self-sufficient teams that share common goals.  With such an architecture, technology and organization-wise alike, the failure of any one system becomes non-fatal, quickly recoverable, and safe.  This promotes experimentation, learning, and ultimately innovation.

Abundance Mindset for your team and customers

Stephen Covey coined this term, and it is crucial to good engineering culture in my organizations. In short, the Abundance Mindset describes a mode of operation where we all win when we help each other. This is in contrast to the zero-sum game of a Scarcity Mindset, where if I win, then someone else loses. When it comes to teams, this pays real dividends in creating an organization that supports all of the other values. The members of a team with an Abundance Mindset help each other, pair, coach, teach, and lift the entire organization.

The Abundance Mindset also pays big dividends with your customers. While it’s de rigueur to say that you have a “customer centric culture”, the Abundance Mindset puts it into action. When we have an Abundance Mindset, we feel responsible for our customers’ success, as without them we don’t have a business. When they are successful, then so are we.

 

Question: How do you build a culture from scratch?

When building an organization from scratch, you certainly don’t have the “burden” of existing norms and habits to overcome, and you can just get down to the business of creating the first version of what you think that culture will be. I came up with another useful acronym, the PRO framework for this — that is, People, Rituals, and Organization structure.

People: Hire for values not culture fit

To build a culture, it’s imperative that you first have a good grasp of the values that you want your culture to embody. For me that’s reflected in DOTEAMS; for you, it could be something completely different. You can’t skip this step, however, as organizations and culture start with people, and so to create it, you will need to hire people who you think will be the ambassadors of these values.

In this video from his Techsylvania talk (around the 10:00 mark) https://www.youtube.com/watch?v=yniDni6QfWk Rohan Chandran, one of my previous co-workers from YP, talks about how hiring for culture fit is the opposite of hiring for values, especially with respect to diversity.

To paraphrase him on why you shouldn’t hire people just like you, when you hire for culture fit, you will likely get clones of yourself that will rarely disagree with you, and you will probably enjoy every moment of creating a failed enterprise.

Conversely, when you hire for values, especially when they are those such as in DOTEAMS, then you are more likely to surround yourself with people with diverse opinions, who will make transparent decisions, who will move fast, fail fast, learn from their mistakes, keep you in check, and help you move your business forward. This is likely to create a good amount of healthy conflict, as you may often disagree, and there may be times when your relationship will be strained, but this is also the desired end result to help prevent the kind of group-think that is fatal to a business. It is the melting pot of ideas, opinions, and experiences that enables you to innovate, respond to business risks, and build great products.

Rituals: Process to support your values

Rituals are the tactical processes that you put in place to support your values. For me these are mechanisms like automation tools (test, deployment, marketing, etc.) that help with Speed of delivery of your product. Code Reviews, Retrospectives, and Pair Programming support the Transparent communication, Experimentation and Learning, and Abundance Mindset values. I am also a big proponent of the Build Measure Learn feedback loop as a ritual, used not only in product development but also in the improvement of culture.

At ProductionBeast, for example, we use all of the above, with extensive automated test suites, combined with Continuous Integration and Continuous Deployment to be able to ship features to production with very little drama at the push of a button. We are comfortable with failure, and use it to feed it back into learning and improvement.

Organization structure

Much has been said of the ideal organization structure, with Spotify and Netflix being the canonical examples these days. Martin Abbott and Michael Fisher also go into a great deal of detailed analysis of creating scalable agile teams in their book, “The Art of Scalability.” The book generalizes that the ideal organization structure can be accomplished through the organization of self-sufficient product teams. Braxton on the panel put it best, describing their engineering organization structure as a systems diagram of product teams instead of a functional organization chart. This certainly works well for startups that are already self-sufficient product teams, and scales well provided your architecture is made up of isolated independent products, as your organization structure will be shaped by this.

 

Question: How do you address culture problems when inheriting an organization?

Another question posed related to the turn-around work of inheriting an organization with a poor culture, that is, when you can’t hire your way into a DOTEAMS team. The way to address it is to realize that culture is not a static concept, but an extraordinarily fluid one. It is never good enough, or complete, and will evolve over time. So whether in a brand new organization or in one you inherited, you should use the same improvement-minded rituals to regularly experiment, and to identify and learn from what is and is not working, so as to improve.

There were a good many other questions posed during the panel, and by the audience, that I haven’t touched on.  Get in touch with me on LinkedIn and let’s keep the conversation going. There is also a lot of good reading on the subject, including the aforementioned “The Art of Scalability,” “Scaling Agile @ Spotify with Tribes, Squads, Chapters and Guilds,” and “Netflix Culture: Freedom and Responsibility.”

Finally, a big thank you to Joe Devon and Caroline Rose for putting the panel together, and being stalwarts of the SoCal technology community.

A Review of Monthly Operating Costs for a Startup’s Engineering Infrastructure.

By: Kuba Fietkiewicz

Assuming that you have completed a successful training session at one of the awesome local startup accelerators (Muckerlab, StartEngine, Amplify.la, or Launchpad.la, or you get picked for YC); you have received some immediate traction with 100K users signing up to find out more about your idea; and your idea got funded with $XMM series A; you’ll need to start thinking about the services that you will need to get your product to market, and how much you will be paying for those services in the course of running your business.  Your startup is not only going to need human talent, which will undoubtedly be your largest expense, and some kick-butt office space for you to create your masterpiece (#2 expense), but also the infrastructure and tools to get your product built.  But what kind of infrastructure do you need at this time in your venture, and how much will it cost to get you to Alpha and beyond?

The major components that make up the infrastructure to release your product to customers revolve around content delivery, engineering productivity, and customer relationship management.  Your content delivery platform includes the actual infrastructure that your application will run on, including a chosen Platform-as-a-Service (PaaS), a content delivery network (CDN) for static content, and monitoring to make sure it’s all up and satisfying your users. Engineering productivity tools that are essential in building a digital product include a source repository, continuous integration, defect tracking, testing, project management, and team communication.  Finally, you need infrastructure to help you communicate with your customers.  This may include a customer relationship management (CRM) system with a service desk, and some sort of facility for communicating with your entire customer base.

The following is a brief analysis of some popular vendors providing each of these services, including why they were chosen, and the expected monthly costs based on some base assumptions, broad generalizations, extrapolations where there was not enough data, and downright lies. This is by far not a comprehensive list of tools or vendors, and I encourage constructive discussion on the utility of these, and whether I missed some important ones.

Content Delivery Infrastructure

Assuming that you have a popular web application expecting 1MM page views per month, I picked a fairly generic setup of 4*XL application servers, and 2*XL db servers in multiple availability zones, and a CDN for static content delivery of 5MB per page view.  One can design this kind of system a hundred different ways, using smaller instances and scaling horizontally, or larger ones and scaling both vertically and horizontally, adding caching layers, etc, etc.  This post assumes that we did our due diligence and that these are the right choices under the circumstances.

Application Platform

The platform is where you will deploy your code; it is what will serve your application to your users.  It should be easily scalable and easy to manage. The major PaaS providers that support the major development stacks include Amazon Elastic Beanstalk, which supports Ruby, PHP, Node, .NET, Python, and Java 1; Heroku, which supports Ruby, Node, Java, Python, Clojure, and Scala 2; and Engine Yard, which supports applications written in Ruby, Node, and PHP 3.  Amazon is the most flexible and cheapest solution, as you can either use their PaaS offering (Beanstalk) or build your own a la carte platform from the complete set of infrastructure services that they offer.  With Beanstalk, Amazon combines the infrastructure services that you need to deploy a scalable app, removing the need to assemble them yourself, and the beauty is that you pay only for the underlying resources that you use 4. In contrast, Heroku and Engine Yard are platforms built on top of the Amazon stack 5,6, and must therefore charge a premium to be profitable.  Heroku, though roughly twice the cost of Amazon 7, is from my personal experience the easiest platform to work with, and IMHO is the clear engineer favorite for getting from zero to hero with the least amount of pain.

Winner: Amazon Elastic Beanstalk
Reason: Price & Flexibility
Cost: (based on assumptions)

App Servers: $1,800/mth app servers
Db Servers: $2,000/mth RDS db servers
Total: $3,800/mth 8

Close Second: Heroku
Reason: Ease of use
Cost: (based on assumptions)

App Servers: $3,600/mth
Db Servers: $3,200/mth
Total: $6,800/mth 9

CDN

A CDN offloads the delivery of static content from your app servers, and delivers it to your users from servers that are closest to your users.  This helps reduce costs and improve the performance of your app.  The number of CDN providers and resellers is pretty high, with the most popular being Amazon’s CloudFront, EdgeCast, and LimeLight 10.  Amazon is the clear winner, with big reach, and competitive pricing.

Assumptions: 1MM page views/mth * 5MB/page, US Only servers

Winner: Amazon CloudFront
Reason: Price & Flexibility
Cost: (based on assumptions)

Total: $607/mth 8
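
To make the CDN assumptions concrete, here is a minimal Ruby sketch of the arithmetic behind them. The page views, page weight, and $607 total come from the figures above; the use of decimal units (1 GB = 1,000 MB) and the method names are assumptions of this sketch, and the resulting per-GB figure is only a blended rate, since the quoted total may include charges other than data transfer.

```ruby
# Monthly CDN transfer implied by the stated assumptions.
PAGE_VIEWS_PER_MONTH = 1_000_000 # from the assumptions above
MB_PER_PAGE_VIEW     = 5         # from the assumptions above
QUOTED_MONTHLY_COST  = 607.0     # USD, the CloudFront total quoted above

# Decimal units (1 GB = 1,000 MB) are an assumption of this sketch.
def monthly_transfer_gb(page_views, mb_per_view)
  page_views * mb_per_view / 1_000.0
end

# Blended cost per GB: the quoted bill divided by the transfer volume.
def effective_rate_per_gb(monthly_cost, transfer_gb)
  monthly_cost / transfer_gb
end

transfer_gb = monthly_transfer_gb(PAGE_VIEWS_PER_MONTH, MB_PER_PAGE_VIEW)
rate = effective_rate_per_gb(QUOTED_MONTHLY_COST, transfer_gb)

puts format("Transfer: %d GB/mth", transfer_gb)
puts format("Effective rate: $%.3f/GB", rate)
```

Changing the page weight is the quickest way to see how sensitive this line item is: halving the 5MB per page view halves the transfer volume.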

Monitoring

The monitoring of the health of your infrastructure is essential to ensure that your users can continue to use your site or buy your widgets.  If you use Amazon services, and especially Elastic Beanstalk, you can take advantage of CloudWatch.  Like the rest of Amazon’s services, it is not entirely intuitive or easy to use, but at least it’s there.  My personal favorite for monitoring is NewRelic, which gives you pretty decent insight into your application’s ability to serve requests and sends alerts when things go wrong.

Assumptions: we are monitoring 4 application servers.

Winner: NewRelic
Reason: Ease of use
Cost:

Total: $596/mth 11

Outside of human capital and office space, your platform is the next highest monthly expense.  In total, the platform can easily run you a cool $5,000/month.  It is debatable whether the extra $3,000 expense/month for Heroku is worth it.  As an engineer I appreciate the simplicity of their tools and the time that they save me.
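
The platform totals can be reproduced with a short Ruby sketch. The line items are copied from the figures above; the assumption that a Heroku-based setup would use the same CDN and monitoring services, and all of the constant and method names, are mine.

```ruby
# Monthly platform line items (USD), copied from the figures above.
AMAZON_PLATFORM = { app_servers: 1_800, db_servers: 2_000 }.freeze
HEROKU_PLATFORM = { app_servers: 3_600, db_servers: 3_200 }.freeze
# Assumption: both setups pay the same for CDN and monitoring.
SHARED_SERVICES = { cdn: 607, monitoring: 596 }.freeze

# Sums every value across one or more groups of line items.
def monthly_total(*line_items)
  line_items.sum { |items| items.values.sum }
end

amazon_total = monthly_total(AMAZON_PLATFORM, SHARED_SERVICES)
heroku_total = monthly_total(HEROKU_PLATFORM, SHARED_SERVICES)

puts "Amazon-based platform: $#{amazon_total}/mth"
puts "Heroku-based platform: $#{heroku_total}/mth"
puts "Heroku premium:        $#{heroku_total - amazon_total}/mth"
```

Under these assumptions the Amazon-based total lands just above $5,000 a month, and the Heroku premium is the $3,000 a month discussed above.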

Engineering Productivity Tools

Source repository

Git has won the source revision control battle, with cloud hosting offered by Github and Atlassian’s Bitbucket.  While hosting is free for open source projects on both sites 12, your super secret sauce will need a place where all of your engineers can collaborate in private.  It’s amazing how easy it is to create new repositories, and you will pass the 20-repository limit of Github’s $50/month plan before you know it. Of course you can always upgrade later.  Alternatively, you can use Bitbucket, where the pricing model is much better for private repositories 12.

Assumptions: 20 private repositories, 5 engineers

Cost comparison:

Github: $50/mth
Bitbucket: $25/mth

Winner: Github
Reason: Personal preference
Cost:

Total: $50/mth

Continuous Integration

The choices for continuous integration have typically been to build your own instance of Hudson or Jenkins in house.  While many still do, companies like CircleCI have sprung up, claiming that they remove the 8% of a team’s engineering time that it takes to maintain your own CI server 13.  Given that engineer time is the most costly part of your startup, this is not an insignificant saving, and CI is a worthwhile thing to outsource.  The companies in this space include CircleCI, Codeship, and Solano Labs.

Cost comparison:

CircleCI: $49/mth – 10 projects, unlimited builds, unlimited build time, 1 container 13
Codeship: $50/mth – 10 projects 1,000 builds, unlimited build time, 2 concurrent builds 14
Solano: $50/mth  – 4 workers, 40 worker hours 47

Winner: CircleCI
Reason: Value for money

Total: $49/mth

Close Second: Codeship
Reason: Tight integration

Total: $50/mth
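
As a rough way to weigh hosted CI against the maintenance-time claim above, the Ruby sketch below computes the engineer hourly cost at which a $49/month plan pays for itself. The 8% figure and the price come from the text; the team size of 5 mirrors the source repository assumptions, and the 160 working hours per engineer per month is a hypothetical input, so treat the result as an illustration rather than a measurement.

```ruby
HOSTED_CI_COST      = 49  # USD per month, the CircleCI price quoted above
MAINTENANCE_PERCENT = 8   # share of engineering time cited above
ENGINEERS           = 5   # hypothetical, mirrors the repository assumptions
HOURS_PER_MONTH     = 160 # hypothetical working hours per engineer

# Team hours per month spent maintaining a self-hosted CI server.
def maintenance_hours(engineers, hours_per_month, percent)
  engineers * hours_per_month * percent / 100.0
end

# Hourly engineer cost above which the hosted plan is the cheaper option.
def break_even_hourly_rate(hosted_cost, hours)
  hosted_cost / hours
end

hours = maintenance_hours(ENGINEERS, HOURS_PER_MONTH, MAINTENANCE_PERCENT)
rate = break_even_hourly_rate(HOSTED_CI_COST, hours)

puts format("Maintenance time: %.1f hours/mth", hours)
puts format("Break-even engineer cost: $%.2f/hour", rate)
```

If the 8% claim is even roughly right, the break-even hourly cost is far below what any engineer costs, which is the point the paragraph above makes in prose.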

Defect Tracking

Defect tracking and resolution is an essential part of your development process.  There are a number of options in this space, from basic open source projects such as Bugzilla or Mantis, which you can either run yourself or have hosted, to the “Cadillac” solutions, Atlassian’s Jira and FogCreek’s FogBugz, which also provide project management utilities 15,16.  Even Github is in this space with Github issues; however, its reporting isn’t quite there.

Assumptions: 10 person team

Cost comparison:

Github issues: free with paid github account 17
Hosted Mantis: $5/mth 18
Hosted Bugzilla: $15/mth 19
Hosted Jira: $50/mth (all features) 20
FogBugz: $300/mth 21

Winner: Jira
Reason: Full featured, nice integration with other tools, excellent reporting
Cost:

Total: $50/mth

Testing

In a small group with a limited budget, everyone should take on the role of functional tester at all times.  You should also eat your own dog food, so that through regular use of your app you know where it fails.  If you lack the time to do this, or feel the need for a more diverse testing group, services such as uTest and 99tests offer functional as well as usability testing 22.  Though they claim to remove the need for at least one testing resource, the cost is pretty steep at $499 per testing cycle; if you use them for every sprint and have weekly sprints, you are looking at roughly four cycles, or about $2,000, a month.

Winner: Do it yourself
Reason: Eat your own dog food
Cost:

Your hourly rate

Distant second: uTest
Cost:

$1,996/mth

Project Management

Project management tools allow you to define your project deliverables, track their status, and ultimately give you insight into when features will be delivered and how much they will cost.  A number of different tools exist in this space.  While you can get away with not using cloud solutions for this, and use MS Project, or even just Excel, one gets a lot of value out of cloud-based solutions, especially when you work on a distributed team.  When you have remote engineers, or use staff supplementation, the ability to share, manage, and communicate project requirements and status from one place that everyone has access to is essential.

Assuming that you use some sort of Agile/Scrum process variant, you have many options.  FogCreek has the free Trello 23, there’s Pivotal Tracker 24, Atlassian has Jira Agile 25, and there is also Sprint.ly 26, among others.  They are all great, with the lesson being that if you’re not using one of these, your startup is probably going to fail.

Winner: Trello
Reason: Free, great mobile apps
Cost:

Total: Free 23

Close Second: Pivotal Tracker
Reason: Excellent interface
Cost:

$50/mth27

Close Third: Sprint.ly
Reason: Excellent interface
Cost:

Total: $140/mth (team of 10)28

Communication

Speaking of decentralized teams and real-time asynchronous communication: you need it.  Email comes close, but is not real-time enough.  IRC is better at real-time, but not the best interface.  A great tool will act like IRC, will email you if you want, will allow you to post images or link to your source repository or your CI, and will allow people to send you messages after you log off so that they will be there when you come back.  The offerings in this space are Hipchat 29 from Atlassian, and Campfire 30 from 37signals.

Cost comparison:

Campfire – $12/mth – 12 people, 1GB storage 31
Hipchat – $10/mth – 10 people 32

Winner: Hipchat
Reason: More features33
Cost:

Total: $10/mth

The total for all of these chosen engineering tools is only $160/month, not including testing.  This is incredibly cheap, and money well spent to keep your code organized, your communication flowing and your project on track.  You could spend as much as $500/month, though I’m not sure you would get that much more for the money.

Customer Relationship

CRM

Your customer relationship system should at the very least give you one view of all your interactions with a customer.  This gets complicated pretty fast when you have 100K+ customers, so a good tool for this is essential.  The basic customer CRMs include Zendesk 34 and Insightly 35, with SugarCRM 36 as one of the more widely used solutions 37.

Cost comparison:

Insightly – $50/15users/mth/ unlimited contacts 38
Zendesk Pro – $59/user/mth 39
Sugar Enterprise – $60/user/mth – many many features 40

Winner: SugarCRM
Reason: Features
Cost:

Total: $60/mth

Email Communication

One of the most overlooked, and highest value re-engagement tools is your e-mail list.  Expect to send 4-10 emails to your users each month, depending on the relationship that you have with your users.  The major providers helping you manage that communication include Constant Contact41, MailChimp42, and Vertical Response43.

Cost comparison:

MailChimp: $379/mth/100K users 44
Vertical Response: $600/mth/100K users 45 (estimated, as website pricing only goes up to 40,000 users)
Constant Contact: $1,200/mth/100K users 46 (estimated, as website pricing only goes up to 2,500 users)

Winner: MailChimp
Reason: Cost
Cost:

Total: $379/mth

The total for the best tools to work with your customers will cost you at least $440/month, and can run upwards of $1,200 for the most expensive list management tools.

Outside of engineering staff and office space, when you add up the platform, the engineering productivity tools, and the customer relationship tools, your startup costs with the choices made above can be in the ballpark of $6,000/month, and can very easily reach $10,000 or beyond depending on your preference of one toolset over another.  The bulk of this expense will be on the platform, then customer management, and finally tools to aid in your product’s development.  Because of this, optimization and validation of your platform spend seems like a worthwhile exercise for your fledgling business.
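
The overall figure can be checked by adding up the winning choice in each category. The Ruby sketch below does that using the monthly prices quoted in the sections above; testing is left out, as it is in the engineering tools subtotal, and the constant and method names are illustrative.

```ruby
# Winning choices and their monthly prices (USD), as quoted in each section.
MONTHLY_COSTS = {
  platform: { elastic_beanstalk: 3_800, cloudfront: 607, new_relic: 596 },
  engineering_tools: { github: 50, circleci: 49, jira: 50, trello: 0, hipchat: 10 },
  customer_tools: { sugarcrm: 60, mailchimp: 379 }
}.freeze

# Subtotal per category.
def category_totals(costs)
  costs.transform_values { |items| items.values.sum }
end

# Sum of all category subtotals.
def grand_total(costs)
  category_totals(costs).values.sum
end

total = grand_total(MONTHLY_COSTS)
category_totals(MONTHLY_COSTS).each do |category, amount|
  share = (100.0 * amount / total).round
  puts format("%-18s $%5d/mth (%d%%)", category, amount, share)
end
puts format("%-18s $%5d/mth", "total", total)
```

The sum comes to $5,601 a month, which is the basis for the ballpark above, and the per-category shares show why the platform is the place to look first for savings.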

Resources

1. http://aws.amazon.com/elasticbeanstalk/
2. https://www.heroku.com/features
3. https://support.cloud.engineyard.com/entries/21009842-engine-yard-technology-stack
4. http://aws.amazon.com/elasticbeanstalk/
5. http://www.aws-partner-directory.com/PartnerDirectory/PartnerDetail?id=1722
6. https://www.engineyard.com/products/technology
7. http://stackoverflow.com/questions/9802259/why-do-people-use-heroku-when-aws-is-present-whats-distinguishing-about-heroku
8. http://calculator.s3.amazonaws.com/calc5.html
9. https://www.heroku.com/pricing
10. http://blog.streamingmedia.com/2012/08/updated-list-of-vendors-in-the-content-delivery-and-transparent-caching-markets.html
11. http://newrelic.com/pricing
12. http://www.infoworld.com/d/application-development/bitbucket-vs-github-which-project-host-has-the-most-227061?page=0,1
13. https://circleci.com/pricing
14. https://www.codeship.io/#pricing
15. https://www.atlassian.com/software/jira
16. http://www.fogcreek.com/fogbugz/features/
17. https://github.com/plans
18. http://www.a2hosting.com/mantis-hosting
19. http://devzing.com/index.php
20. https://www.atlassian.com/software/jira/pricing?tab=ondemand
21. http://www.fogcreek.com/fogbugz/Pricing.html
22. http://www.benchmarkqa.com/wp-content/uploads/2012/04/Crowdsourced-Testing-Companies.pdf
23. https://trello.com/
24. http://www.pivotaltracker.com/
25. https://www.atlassian.com/software/jira-agile/overview
26. https://sprint.ly/
27. http://www.pivotaltracker.com/pricing
28. https://sprint.ly/#pricing
29. https://www.hipchat.com/
30. https://www.campfirenow.com/
31. https://www.campfirenow.com/signup
32. https://www.hipchat.com/pricing
33. https://www.hipchat.com/compare
34. http://www.zendesk.com/product/key-features
35. http://www.insightly.com/features
36. http://www.sugarcrm.com/
37. http://www.staff.com/blog/comparison-of-crm-software/
38. https://insightly.com/pricing
39. http://www.zendesk.com/product/pricing
40. http://d2owqhhe2x3j50.cloudfront.net/media.sugarcrm.com/datasheets/7/EditionsComparison_7_04-13-01-LR.pdf
41. http://www.constantcontact.com
42. http://mailchimp.com
43. http://www.verticalresponse.com
44. http://mailchimp.com/pricing/
45. http://www.verticalresponse.com/pricing
46. http://www.constantcontact.com/email-marketing/pricing
47. https://www.solanolabs.com/product

Scaling Rails on Heroku

By: Kuba Fietkiewicz

People choose Ruby on Rails as the framework of choice at their startup for a variety of reasons, not the least of which is the joy that it brings back to programming. Putting aside the arguments that Rails does not scale and that you should avoid it if you are building a high-traffic app, consider that at YP, which is a top 30 Internet property, we have been using Rails for our flagship site for at least 5 years.

Engineers choose Heroku for many of the same reasons as they do Ruby. Heroku is every engineer’s favorite deployment platform, because it makes deployment of your app dead simple, removing much of the thinking required in setting up a scalable infrastructure.  Yet, thinking is still required, as no platform, not even one as well engineered as Heroku can overcome poor decision-making.

Given a hypothetical that we have made the decision to deploy Rails on Heroku, how would one go about scaling a typical app there?  Since applications are complex and varied in their requirements, everyone’s favorite and unhelpful answer is “it depends.”  The fact is, that you need to first understand where the problem is, before you can begin to address it.  Ultimately, you will need to do a little bit of sleuthing before you can determine where you will need to spend your time and money optimizing your infrastructure for scale.

Despite this, there are certainly a number of high-level architectural decisions that you can make immediately, which will make it easier for you to home in on problems and optimize your infrastructure.

A typical app has static assets such as images, style sheets, and videos; dynamic application code that accesses and acts on data; and the data itself in some data store. Each of these exerts different demands on your application and thus should be considered independently of the others. How big are the static assets?  Do the videos need to stream? How do we prevent buffering?  Does the application code have long running algorithms? Is the data constantly changing?  What is the ratio of reads to writes?  Is it transactional data or reporting data?  These are just some of the questions to consider.

Scaling problems are best addressed with an application architecture that, through the separation of concerns, supports the scaling of each architectural layer independently of the others.

How is this achieved?  Static assets should be deployed to infrastructure optimized to deliver them as fast as possible, namely Content Delivery Networks (CDNs). CDNs fulfill requests for assets from the servers closest (or best suited) to the requesting client, so as to reduce the latency from network congestion.  This offloads work from your dynamic application, which can instead focus on responding to user requests to act on data and on building the response for presentation.  Finally, data and business logic access should be encapsulated behind an HTTP service layer that exposes a slowly changing contract to the front-end application and is responsible for providing the fastest possible access to business objects, regardless of back-end implementation.
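
To illustrate the service-layer idea, here is a minimal Ruby sketch of a front-end client that talks to a user service over HTTP. Everything specific in it is hypothetical: the class name, the URL layout, the host name, and the JSON fields are assumptions for illustration, not part of the example app that follows.

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical client for a user service; the URL layout and JSON fields
# are assumptions for illustration only.
class UserServiceClient
  User = Struct.new(:name, :email, :username)

  # `fetcher` is any callable that takes a URI and returns a JSON string.
  # It defaults to a plain HTTP GET and can be swapped for a cache or a stub.
  def initialize(base_url, fetcher: ->(uri) { Net::HTTP.get(uri) })
    @base_url = base_url
    @fetcher = fetcher
  end

  # The front end depends only on this contract, not on the backing store.
  def find_by_username(username)
    path = "/users/#{URI.encode_www_form_component(username)}"
    payload = JSON.parse(@fetcher.call(URI.join(@base_url, path)))
    User.new(payload["name"], payload["email"], payload["username"])
  end
end

# Usage with a stubbed fetcher, so no network access is needed.
stub = ->(_uri) { '{"name":"Ada","email":"ada@example.com","username":"user42"}' }
client = UserServiceClient.new("http://users.internal.example", fetcher: stub)
puts client.find_by_username("user42").email
```

Because the front end only sees `find_by_username`, the service behind it could change its data store or add caching without the front-end code changing, which is what a slowly changing contract buys you.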

The Setup

For the example app, I will make some initial assumptions so as to create a baseline benchmark. I will then move one lever at a time to show how it can affect application performance.  The app is very simple.  It has a /users/bench endpoint that performs a select for a random user on a table with 1MM rows, then displays that user’s information.

The App

$ rails new rails-bench --skip-test-unit --database=postgresql
$ cd rails-bench
$ rails g scaffold User name:string email:string username:string

config/database.yml

defaults: &defaults
 adapter: postgresql
 encoding: unicode
 pool: 5
 host: localhost
 port: 5432

config/routes.rb

get 'users/bench' => 'users#bench'

app/controllers/users_controller.rb

def bench
 @user = User.find_by(username: "user#{rand(0..1000000)}")
 render :show
end

Gemfile

gem "rails_12factor", group: :production
ruby "2.0.0"

Initial Heroku Setup

$ git init && git add . && git commit -m "init"
$ heroku create
Creating calm-age-1887... done, stack is cedar
http://calm-age-1887.herokuapp.com/ | git@heroku.com:calm-age-1887.git
Git remote heroku added

Database Setup

Given that Postgres is the db of choice on the Heroku platform, that’s what we’ll use.  The standard Dev database is inadequate for our test which will contain more than the allotted 10K rows in this tier:

$ heroku pg:info
 === HEROKU_POSTGRESQL_PINK_URL (DATABASE_URL)
 Plan:        Dev
 Status:      available
 Connections: 2
 PG Version:  9.3.1
 Created:     2013-11-30 23:20 UTC
 Data Size:   6.4 MB
 Tables:      0
 Rows:        0/10000 (In compliance)
 Fork/Follow: Unsupported
 Rollback:    Unsupported

So let’s upgrade it:

 $ heroku addons:add heroku-postgresql:standard-yanari
 Adding heroku-postgresql:standard-yanari on calm-age-1887... done, v6 ($50/mo)
 Attached as HEROKU_POSTGRESQL_NAVY_URL
 The database should be available in 3-5 minutes.
 ! The database will be empty. If upgrading, you can transfer
 ! data from another database with pgbackups:restore.
 Use `heroku pg:wait` to track status..
 Use `heroku addons:docs heroku-postgresql` to view documentation.

 $ heroku pg:promote HEROKU_POSTGRESQL_NAVY_URL
 Promoting HEROKU_POSTGRESQL_NAVY_URL to DATABASE_URL... done

Dataset

For the following set of tests we’ll use a dataset with 1MM users.  In order to seed the database with these users we’ll use the activerecord-import gem, and the following seeds script.  This reduces the time to load the dataset from 33min/MM down to 3min/MM.

Gemfile

gem "activerecord-import"

db/seeds.rb

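columns = [:name, :email, :username]
options = { validate: false } # assumed; the original options are not shown
users = []
save_slice = 100000
(0..1000000).each do |index|
 users << [
   "first name, lastname #{index}",
   "name#{index}@mailinator.com", "user#{index}"
 ]
 # flush the accumulated rows in one bulk insert every save_slice users
 if index % save_slice == 0
   User.import columns, users, options
   users = []
 end
end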
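
The speedup comes from handing rows to the database in large batches rather than one INSERT per record. The following is a minimal, Rails-free sketch of just that batching logic. The helper names each_user_batch and user_row are hypothetical, and the actual bulk insert (the User.import call in the seeds script) is only indicated by a comment.

```ruby
# Hypothetical helper: builds one row in the same shape the seeds script uses.
def user_row(index)
  ["first name, lastname #{index}", "name#{index}@mailinator.com", "user#{index}"]
end

# Hypothetical helper: yields rows in batches of at most batch_size,
# so that each batch could be handed to a single bulk insert.
def each_user_batch(total, batch_size)
  return enum_for(:each_user_batch, total, batch_size) unless block_given?

  (0...total).each_slice(batch_size) do |indexes|
    yield indexes.map { |index| user_row(index) }
  end
end

# Small numbers for illustration; the post uses 1MM rows in slices of 100,000.
each_user_batch(10, 4) do |batch|
  # In the real seeds script this is where User.import would be called.
  puts "would import #{batch.size} rows, starting with #{batch.first.last}"
end
```

Unlike the modulo check in the seeds script, each_slice also flushes a final partial batch, so the total does not have to be a multiple of the batch size.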

Then we redeploy and migrate the database, and seed our data:

 $ git commit -am "added data import" && git push heroku master
 $ heroku run rake db:migrate
 Running `rake db:migrate` attached to terminal... up, run.9316
 ==  CreateUsers: migrating ====================================================
 -- create_table(:users)
 -> 0.0193s
 ==  CreateUsers: migrated (0.0227s) ===========================================

 $ heroku run rake db:seed
 Running `rake db:seed` attached to terminal... up, run.1153
 …
 

Benchmarks

For most of the benchmarks we will be using two tools: ApacheBench, Version 2.3 Revision: 1528965, and Jmeter v.2.10. For each test we will set the number of requests to 1000 with a concurrency of 100.  I will be using Jmeter to gather performance data on the application as a whole, as ApacheBench does not request any related/embedded static assets. [1][2]

Pulling The Levers: Tuning The Database

Impact: High

Tuning your database likely has the highest impact on the performance of your application, and its ability to scale.

Database Sizing

Sizing your database is an important consideration for the performance of your app.  Heroku recommends choosing a plan where your entire data set can fit into the Postgres in-memory cache as data served from cache can be 100-1000X faster than from disk.  [3]

Looking at our oversimplified 1MM user data in the database, we see that it only takes up 140MB of space and that it should easily fit into the Standard Yanari 400MB cache. [4]

 $ heroku pg:info
 === HEROKU_POSTGRESQL_NAVY_URL (DATABASE_URL)
 Plan:        Standard Yanari
 Status:      Available
 Data Size:   139.5 MB

Looking at the query that determines the cache hit ratio we see that 99% of data resides in the cache and so we have optimized our instance size to fit all of our data in memory:

 $ heroku pg:psql
 => SELECT
 ->   sum(heap_blks_read) as heap_read,
 ->   sum(heap_blks_hit)  as heap_hit,
 ->   sum(heap_blks_hit) / (sum(heap_blks_hit) + sum(heap_blks_read)) as ratio
 -> FROM
 ->   pg_statio_user_tables;
 heap_read | heap_hit |         ratio
 -----------+----------+------------------------
 133361 | 15432845 | 0.99143265867096966338
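
As a quick sanity check, the ratio column can be reproduced from the two counters in the output above. This is a small sketch of the same arithmetic in Ruby; the function name cache_hit_ratio is hypothetical.

```ruby
# Computes the heap cache hit ratio the same way the SQL query above does:
# blocks found in cache divided by all blocks requested.
def cache_hit_ratio(heap_read, heap_hit)
  total = heap_read + heap_hit
  return nil if total.zero? # no block accesses recorded yet

  heap_hit.to_f / total
end

ratio = cache_hit_ratio(133_361, 15_432_845)
puts format("cache hit ratio: %.4f", ratio)
```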

Our bench tests however do not show adequate performance even with the entire data set in cache:

 $ ab -n 1000 -c 100 http://host/users/bench
 Requests per second:    7.15 [#/sec] (mean)
 Time per request:       13987.977 [ms] (mean)

 $ java -jar ApacheJMeter.jar -n -t Rails\ Bench\ Plan.jmx
 1000 in 141.2s = 7.1/s Avg: 13602 Min:  2708 Max: 22793
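
When reading the ApacheBench output, it helps to know that the mean "Time per request" follows directly from the concurrency level and the throughput: with 100 requests in flight at once and only about 7 completing per second, each request waits roughly 14 seconds. A minimal sketch of that relationship, with a hypothetical function name:

```ruby
# ApacheBench's mean "Time per request" is the time one request takes
# as seen by one of the concurrent clients:
#   time_per_request_ms = concurrency / requests_per_second * 1000
def mean_time_per_request_ms(concurrency, requests_per_second)
  concurrency.to_f / requests_per_second * 1000
end

# Values taken from the ab run above (-c 100, 7.15 requests/sec).
puts mean_time_per_request_ms(100, 7.15).round
```

The result differs from the reported figure by a couple of milliseconds only because the requests-per-second value printed by ab is rounded.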

Data Indexes

The reason that our benchmark does not perform well even with the entire data set in cache, is that our test retrieves random rows from our entire data set.  This query plan shows that the entire users table must be scanned to find the correct row:

 $ heroku pg:psql
 => EXPLAIN ANALYZE SELECT * from users where username='user1234';
 QUERY PLAN
 --------------------------------------------------------------------------------------------------------
 Seq Scan on users  (cost=0.00..17777.00 rows=1 width=82) (actual time=261.886..370.853 rows=1 loops=1)
 Filter: ((username)::text = 'user1234'::text)
 Rows Removed by Filter: 1000000
 Total runtime: 370.989 ms
 (4 rows)

Simply adding an index to the right column can drastically improve the performance of your queries:

$ rails g migration AddIndexToUser

db/migrate/<date>_add_index_to_user.rb

class AddIndexToUser < ActiveRecord::Migration
  def change
    add_index :users, :username
  end
end

$ git add . && git commit -m "add index to username"
$ git push heroku master
$ heroku run rake db:migrate
Running `rake db:migrate` attached to terminal... up, run.7636
Migrating to AddIndexToUser (20131201014140)
==  AddIndexToUser: migrating =================================================
-- add_index(:users, :username)
-> 26.3659s
==  AddIndexToUser: migrated (26.3661s) =======================================

Now if we run our query plan:

 => EXPLAIN ANALYZE SELECT * from users where username='user1';
 QUERY PLAN
 --------------------------------------------------------------------------------------------------------------------------------
 Index Scan using index_users_on_username on users  (cost=0.08..4.09 rows=1 width=82) (actual time=0.142..0.143 rows=1 loops=1)
 Index Cond: ((username)::text = 'user1'::text)
 Total runtime: 0.210 ms
 (3 rows)

We see a better than 1000X improvement in data access times (370.989 ms down to 0.210 ms).
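
To build some intuition for why the plan changes so dramatically, here is a rough in-memory analogy in Ruby. It is not how Postgres is implemented: a sorted array searched with bsearch merely stands in for the B-tree index, the table is 10,000 rows rather than 1MM, and the names seq_scan and index_scan are hypothetical.

```ruby
# An in-memory stand-in for the users table (10,000 rows instead of 1MM).
USERS = (0...10_000).map { |i| { id: i + 1, username: "user#{i}" } }

# A sorted list of [username, row] pairs standing in for the index.
USERNAME_INDEX = USERS.map { |row| [row[:username], row] }.sort_by(&:first)

# Sequential scan: compare every row until one matches.
def seq_scan(rows, username)
  comparisons = 0
  row = rows.find do |candidate|
    comparisons += 1
    candidate[:username] == username
  end
  [row, comparisons]
end

# Index lookup: binary search over the sorted keys.
def index_scan(index, username)
  comparisons = 0
  entry = index.bsearch do |(key, _row)|
    comparisons += 1
    username <=> key
  end
  [entry && entry.last, comparisons]
end

_, scanned = seq_scan(USERS, "user9999")
_, probed  = index_scan(USERNAME_INDEX, "user9999")
puts "sequential scan: #{scanned} comparisons, index lookup: #{probed} comparisons"
```

The sequential scan grows linearly with the table, while the sorted lookup grows with the logarithm of its size, which is why the gap widens as the dataset gets larger.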

And if we run our benchmarks we see a less dramatic, but still significant, roughly 12X improvement in request times for the ApacheBench benchmark (7.15 to 89.04 requests/sec), and a roughly 7X improvement in the Jmeter benchmark (7.1 to 47.9 requests/sec):

$ ab -n 1000 -c 100 http://host/users/bench
 Requests per second:    89.04 [#/sec] (mean)
 Time per request:       1123.054 [ms] (mean)

 $ java -jar ApacheJMeter.jar -n -t Rails\ Bench\ Plan.jmx
 Results = 1000 in  20.9s = 47.9/s Avg: 1556 Min: 372 Max:  2924

The reason that you begin to see a difference in throughput between the ApacheBench and Jmeter benchmarks is that Jmeter is requesting the entire set of web content including the 2 other embedded files:

/assets/application.js:104kb
/assets/application.css:1kb

Previously this effect was not visible, as the request throughput was constrained by the database response times.

More Advanced Considerations (Not Implemented/Benchmarked)

Replication

One way to achieve increased read throughput on your data is via a master-slave replication setup.  This can be achieved on Heroku through the use of follower databases.  Follower databases are read-only copies of your master database, which are updated in near-real-time from transactions happening on the master. [5]

Creating a follower database is as easy as provisioning a new database with the --follow flag which points to your current master:

$ heroku addons:add heroku-postgresql:ronin --follow MASTER_DB_URL

Followers are one way to have a hot standby ready for manual failover of your database.  Manual failover can be achieved simply by unfollowing your follower, and promoting it as your master:

$ heroku pg:unfollow <FOLLOWER_DB_URL>
$ heroku pg:promote <FOLLOWER_DB_URL>

Substitute <MASTER_DB_URL> and <FOLLOWER_DB_URL> with the names of your master and follower, respectively.
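
To actually gain read throughput from followers, the application has to send its reads to them while keeping writes on the master. The sketch below shows one simple way to make that decision, using round-robin selection. The class DatabaseRouter and the URL placeholders are hypothetical and not part of Heroku or Rails; because followers are only updated in near-real-time, reads that must see a just-completed write should still go to the master.

```ruby
# Hypothetical router: sends writes to the master and spreads reads
# across followers in round-robin order.
class DatabaseRouter
  def initialize(master:, followers:)
    @master = master
    @followers = followers
    @next_follower = 0
  end

  # Writes must always go to the master, since followers are read-only.
  def for_write
    @master
  end

  # Reads rotate through the followers; fall back to the master if none exist.
  def for_read
    return @master if @followers.empty?

    follower = @followers[@next_follower % @followers.size]
    @next_follower += 1
    follower
  end
end

router = DatabaseRouter.new(
  master: "MASTER_DB_URL",
  followers: ["FOLLOWER_1_DB_URL", "FOLLOWER_2_DB_URL"]
)
puts router.for_write
3.times { puts router.for_read }
```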

HA Replication

High availability replication is available on all Premium and Enterprise plans, and is generally transparent to the application owner.  As it requires no owner input, other than purchasing the right plan, if this is something that is important to you, then make sure to purchase a premium plan and read about it here: [6]

https://devcenter.heroku.com/articles/heroku-postgres-ha

N+1 Queries

The N+1 problem comes from using naive ORM constructs to get at data within an association.  Using the example from Rails Guide on Active Record Querying [7]:

clients = Client.limit(10)
clients.each do |client| 
  puts client.address.postcode 
end

The above code executes 1 (to find 10 clients) + 10 (one per each client to load the address) = 11 queries in total. If we eagerly load the association using the includes() method, we can change the code to run only 2 queries instead of the original 11:

clients = Client.includes(:address).limit(10)
clients.each do |client|
  puts client.address.postcode
end

This results in only 2 queries being run:

SELECT * FROM clients LIMIT 10
 SELECT addresses.* FROM addresses
 WHERE (addresses.client_id IN (1,2,3,4,5,6,7,8,9,10))
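
The query counts can be demonstrated without a database. The following sketch uses a tiny fake data store that counts the queries it receives; FakeDatabase and its methods are hypothetical stand-ins for what the ORM does, and the postcodes are made-up data.

```ruby
# A tiny fake database that counts how many queries it receives.
class FakeDatabase
  attr_reader :query_count

  def initialize
    @clients = (1..10).map { |id| { id: id } }
    @addresses = (1..10).map { |id| { client_id: id, postcode: "9000#{id % 10}" } }
    @query_count = 0
  end

  # SELECT * FROM clients LIMIT n
  def clients(limit)
    @query_count += 1
    @clients.first(limit)
  end

  # SELECT * FROM addresses WHERE client_id = ?
  def address_for(client_id)
    @query_count += 1
    @addresses.find { |address| address[:client_id] == client_id }
  end

  # SELECT * FROM addresses WHERE client_id IN (...)
  def addresses_for(client_ids)
    @query_count += 1
    @addresses.select { |address| client_ids.include?(address[:client_id]) }
  end
end

# Naive access: one query for the clients, then one per client.
def postcodes_naive(db)
  db.clients(10).map { |client| db.address_for(client[:id])[:postcode] }
end

# Eager loading: one query for the clients, one for all of their addresses.
def postcodes_eager(db)
  clients = db.clients(10)
  addresses = db.addresses_for(clients.map { |client| client[:id] })
  by_client = addresses.map { |address| [address[:client_id], address] }.to_h
  clients.map { |client| by_client[client[:id]][:postcode] }
end

naive_db = FakeDatabase.new
postcodes_naive(naive_db)
eager_db = FakeDatabase.new
postcodes_eager(eager_db)
puts "naive: #{naive_db.query_count} queries, eager: #{eager_db.query_count} queries"
```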

Database Conclusion

The majority of apps are constrained by their slowest component, their database.  Clearly tuning your database size, indexing the right columns, and optimizing your queries will have significant impact on the scalability and performance of your entire system.  This is the first place where a significant amount of effort should be invested.

Pulling The Levers: Static Assets

Impact: High

Heroku Static Asset Serving

One can enable the serving of static assets from a Rails app in Heroku by using the rails_12factor gem.

gem 'rails_12factor', group: :production

This isn’t enough. While in other environments, a reverse proxy such as nginx can and should be used to intercept and serve static content rather than sending those requests to the app layer, the Heroku routing mechanism precludes the need for nginx, and thus removes the ability for static content to be served in this way. [8] When you use Heroku, all asset requests get sent to the app layer to be served by your dynos. The idea is to use the right tool for the job, and this is not it.

The previous Jmeter test of 100 concurrent threads looped 10 times with a 5 second ramp up and a limit of 4 concurrent embedded asset requests shows that our throughput is quite low at 47.9 pages/sec:

$ java -jar ApacheJMeter.jar -n -t Rails\ Bench\ Plan.jmx
 Results = 1000 in  20.9s = 47.9/s Avg: 1556 Min: 372 Max: 2924

Static Assets Served From CDN

The best place to serve your static assets from is a CDN.  Using a Content Delivery Network optimizes the delivery of static assets on your site. It lets you offload all requests for these static assets from your web dynos, which in turn frees those dynos to handle more requests for dynamic content. [9]

We change our production.rb environment file to use CloudFront as the asset host:

config/environments/production.rb

config.action_controller.asset_host="http://d65asdf.cloudfront.net"
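
The effect of this setting is that asset URLs rendered into the page point at the CDN host instead of at the app. The sketch below is a simplified illustration of that rewriting, not the Rails implementation (which also handles things like fingerprinted file names); the helper name asset_url is hypothetical here.

```ruby
require "uri"

# Hypothetical helper: resolves an asset path against an optional asset host.
def asset_url(path, asset_host = nil)
  return path if asset_host.nil?

  URI.join(asset_host, path).to_s
end

# Without an asset host the browser requests the asset from the app itself.
puts asset_url("/assets/application.js")
# With an asset host the same asset is requested from the CDN.
puts asset_url("/assets/application.js", "http://d65asdf.cloudfront.net")
```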

After redeploying, our same Jmeter test shows a dramatic improvement, now matching our ApacheBench test:

Results = 1000 in  10.0s = 99.9/s Avg: 525 Min: 106 Max:  1064

Static Assets Conclusion

The test clearly shows that offloading static assets to a CDN has significant positive impact on the performance of your application.  This would be far more pronounced if the static assets were of the number and size representative of today’s sites.  When your application is not busy serving static assets, it is freed up to be able to serve the maximum possible number of requests.

Pulling The Levers: Application Server

Heroku recommends running Rails on a multi-process application server such as Unicorn. [10] The theory is that the application server should take advantage of all available CPU cores. Given that a 1X dyno only has 1 core [11], the number of workers should not exceed 2 or 3.

Unicorn recommends setting the number of worker processes to at least the number of cores, but not much more than that.  More can be set to compensate for occasionally slow requests that are not CPU-intensive. Additionally, the combined memory footprint of the workers running Rails should not exceed the amount of memory available on the machine. [12] Michael VanRooijen has previously shown the serious negative effect of too many Unicorn workers on a dyno. [13]

Following Heroku’s Unicorn configuration advice, we add the Unicorn gem to the Gemfile, a Unicorn configuration, a Procfile, and redeploy:

Gemfile

gem 'unicorn'

config/unicorn.rb:

 worker_processes 2 # amount of unicorn workers to spin up
 timeout 30         # restarts workers that hang for 30 seconds

 preload_app true

 before_fork do |server, worker|
   Signal.trap 'TERM' do
     puts 'Unicorn master intercepting TERM and sending myself QUIT instead'
     Process.kill 'QUIT', Process.pid
   end
   defined?(ActiveRecord::Base) and
     ActiveRecord::Base.connection.disconnect!
 end

 after_fork do |server, worker|
   Signal.trap 'TERM' do
     puts 'Unicorn worker intercepting TERM and doing nothing. Wait for master to send QUIT'
   end
   if defined?(ActiveRecord::Base)
     config = Rails.application.config.database_configuration[
       Rails.env
     ]
     config['reaping_frequency'] = ENV['DB_REAP_FREQ'] || 10 
     config['pool']            = ENV['DB_POOL'] || 5
     ActiveRecord::Base.establish_connection(config)
   end
 end

Procfile

 web: bundle exec unicorn -p $PORT -c ./config/unicorn.rb

With 2 workers our ApacheBench test shows an expected ~2X improvement:

 Requests per second:    208.93 [#/sec] (mean)
 Time per request:       478.622 [ms] (mean)

And our Jmeter test also shows significant gains in throughput, though not as pronounced:

Results =  1000 in 7.3s = 137.6/s Avg: 241 Min: 108 Max: 1053

Dyno Size

The standard Dyno size has 512MB of memory and 1 CPU core. [11]  With this sizing, it doesn’t make much sense to increase the number of workers above 2 or 3, especially if that number of workers stretches the Dyno’s memory limits.  To understand the memory utilization of your app, you can enable the log-runtime-metrics Labs feature, which provides insight into your app’s memory and CPU utilization by injecting those metrics into the log stream. [14]

 $ heroku labs:enable log-runtime-metrics
 $ heroku restart

The resulting log entries show that our app uses up a total of 136MB of memory with 3 workers.  Depending on the demands of the app, you may be able to get away with more worker processes, however given that we’re constrained by 1 CPU, this may not provide many dividends.
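
The two constraints, CPU cores and memory, can be combined into a rough upper bound on the worker count. The sketch below is only a back-of-the-envelope heuristic: the function name max_workers, the default of 3 workers per core, and the 80% memory headroom are my assumptions rather than Heroku or Unicorn settings, and the per-worker figure is simply 136MB divided by 3.

```ruby
# Hypothetical heuristic: cap the worker count by both CPU and memory.
# workers_per_core and memory_headroom are assumptions, not platform settings.
def max_workers(cores:, dyno_memory_mb:, worker_memory_mb:,
                workers_per_core: 3, memory_headroom: 0.8)
  by_cpu = cores * workers_per_core
  by_memory = (dyno_memory_mb * memory_headroom / worker_memory_mb).floor
  [by_cpu, by_memory].min
end

# 136MB across 3 workers is roughly 45MB per worker for this simple app,
# so the single core is the limiting factor.
puts max_workers(cores: 1, dyno_memory_mb: 512, worker_memory_mb: 45)
# A heavier, hypothetical app at 250MB per worker is limited by memory instead.
puts max_workers(cores: 1, dyno_memory_mb: 512, worker_memory_mb: 250)
```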

If we double the size of the Dyno, do we see a similar increase in throughput?

 $ heroku ps:resize web=2x

The ApacheBench benchmark improves slightly due to this change:

 Requests per second:    271.86 [#/sec] (mean)
 Time per request:       367.834 [ms] (mean)

However the Jmeter benchmark does not:

 Results = 1000 in 8.1s = 123.7/s Avg: 282 Min: 103 Max: 1278

It appears that this simple app is not bound by CPU. Increasing to 5 workers on a double-sized Dyno, the benchmarks show neither an increase in speed, nor an increase in throughput:

 Requests per second:    274.72 [#/sec] (mean)
 Time per request:       364.008 [ms] (mean) 
 Results =  1000 in  7.2s = 139.1/s Avg: 216 Min: 100 Max: 1179

If we instead double the number of Dynos with 2 workers on each, will we see a corresponding increase in throughput?

 $ heroku ps:resize web=1x
 $ heroku ps:scale web=2

This is not a significant change from our 1 Dyno with 2 workers (if anything, a slight degradation), and it is a significant degradation from the double-sized Dyno:

 Requests per second:    198.71 [#/sec] (mean)
 Time per request:       503.255 [ms] (mean)
 Results =  1000 in 8.2s = 122.6/s Avg: 251 Min: 106 Max: 1002

Application Server Conclusion

Tuning your application server requires a deep understanding of the characteristics of your application.  If your application’s behavior is that of short lived requests, then using and tuning an application server such as Unicorn can be very beneficial.  Under those circumstances, we were able to get a peak output of 274 requests/sec with 5 workers on a double sized Dyno.

Pulling The Levers: SOA Back End

Impact: High, Potentially Negative

When isolating your concerns, it is often advantageous to remove data access from the application and move it behind high-performance HTTP endpoints.  This pattern has several advantages.  It protects your application from frequently changing back-end implementation details, it offloads some of the work to other applications, and it allows different scaling decisions to be made for each isolated component.

Setup

To set up this scenario, we added a JSON endpoint /users/username.json?id=… to our app connected to our data store, and we created a new application that uses ActiveResource to call that endpoint, passing it our random username.

New App

 $ rails new rails-bench-soa --skip-test-unit

Gemfile

 gem 'activeresource'

config/routes.rb

 get 'users/bench' => 'users#bench'

app/controllers/users_controller.rb

 def bench
   @user = User.find(
     :first,
     :from => :username,
     :params => {:id => "user#{rand(0..1000000)}"}
   )
   render :show
 end

app/models/user.rb

 class User < ActiveResource::Base
   self.site = "http://apihost"
 end

The other files are the same as the previous app:

 config/environments/production.rb
 config/unicorn.rb
 Procfile
 $ git init && git add . && git commit -m "initial app"
 $ heroku create
 $ git push heroku master

Make sure to change your CloudFront origin server configuration to pull from your new Heroku domain.

Old App Changes

config/routes.rb

get 'users/username' => 'users#username'

app/controllers/users_controller.rb

 def username
   username = [User.find_by(username: params[:id])]
   render json: username
 end
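
To make the contract between the two apps concrete, the sketch below builds the request URL for the endpoint and unwraps the JSON array it returns, using only the Ruby standard library. The function names are hypothetical, "apihost" is the placeholder host from app/models/user.rb, and the sample response body is made up for illustration.

```ruby
require "json"
require "uri"

# Builds the URL of the JSON endpoint described above.
# "apihost" is the placeholder host from app/models/user.rb.
def username_lookup_uri(username, host: "apihost")
  URI::HTTP.build(
    host: host,
    path: "/users/username.json",
    query: URI.encode_www_form(id: username)
  )
end

# The endpoint wraps the user in a JSON array, so take the first element.
def first_user(json_body)
  JSON.parse(json_body).first
end

puts username_lookup_uri("user1234")

# Made-up sample response, shaped like the array the username action renders.
sample_body = '[{"id":1,"name":"first name, lastname 1234","username":"user1234"}]'
puts first_user(sample_body)["username"]
```

Every front-end request now pays for this extra HTTP round trip and JSON parse in addition to the database query.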

Setting the Baseline

Running our ApacheBench benchmark against our new SOA endpoint, we see that Rails contributes sub-millisecond overhead and that we continue to be constrained by the database:

 Requests per second:    122.88 [#/sec] (mean)
 Time per request:       813.782 [ms] (mean)
 Completed 200 OK in 6ms (Views: 0.4ms | ActiveRecord: 4.4ms)
 Completed 200 OK in 4ms (Views: 0.6ms | ActiveRecord: 2.5ms)
 Completed 200 OK in 7ms (Views: 0.4ms | ActiveRecord: 6.0ms)

Testing the SOA Implementation

Next we add network overhead by running our benchmarks against our new app, which uses the service endpoint instead of a direct database connection. We see a significant degradation in performance:

 Requests per second:    42.33 [#/sec] (mean)
 Time per request:       2362.617 [ms] (mean)

With similar results for Jmeter:

Results = 1000 in  25.0s = 40.0/s Avg: 1896 Min: 134 Max: 4314

With much of our app waiting for the network, we should be able to take advantage of more requesting Rails processes, so let’s double up on our web scale:

 $ heroku ps:scale web=2

While our benchmark’s throughput does not double, it does improve significantly:

 Requests per second:    74.60 [#/sec] (mean)
 Time per request:       1340.528 [ms] (mean)
 Results = 1000 in  15.6s = 64.2/s Avg: 962 Min: 129 Max: 4198

Increasing the number of dynos to 4 results in additional gains:

 Requests per second:    121.59 [#/sec] (mean)
 Time per request:       822.410 [ms] (mean)
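
One way to read these numbers is as scaling efficiency: the measured throughput divided by what perfect linear scaling would have delivered. The sketch below computes it from the ApacheBench figures above, assuming the 42.33 requests/sec run used a single dyno; the function name is hypothetical.

```ruby
# Scaling efficiency: measured throughput relative to perfect linear scaling.
def scaling_efficiency(baseline_rps, scaled_rps, dyno_count)
  scaled_rps / (baseline_rps * dyno_count)
end

baseline = 42.33 # requests/sec, assumed to be the single-dyno run
{ 2 => 74.60, 4 => 121.59 }.each do |dynos, rps|
  efficiency = scaling_efficiency(baseline, rps, dynos)
  puts format("%d dynos: %.0f%% of linear scaling", dynos, efficiency * 100)
end
```

Efficiency falls as dynos are added, which is consistent with the shared back-end service and database becoming the bottleneck.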

SOA Conclusion

These benchmarks show that there is a very high overhead cost in implementing a service back end, one that will never be as fast as direct database access.  The cost may be the lesser of two evils, however, when you reach a scale that exceeds the capabilities of the database.  At that time the refactoring of your service back end can be made transparent to your front end, as you have provided a stable http interface into that layer.  In the meantime, additional throughput can be had by horizontally scaling your application stack.

Though not illustrated in the above benchmarks, using Rails for your SOA back end can be like hitching an Airstream to your drag racer.  When the solution calls for speed, bringing the comforts of home may not be the wisest choice in that circumstance.  Although in this limited test we didn’t see a significant impact from Rails, rather one from the network, under heavy sustained load I would expect the performance of the service stack to suffer. Using lighter weight frameworks such as Sinatra can be an improvement, though by some benchmarks, not enough of one, and not always. [15]  In my experience this has been a successful next step. At YP we moved many of our back end services to Sinatra and in general received a 10X speed improvement in those cases.

Summary

This is by no means a comprehensive guide to scaling your application on Heroku.  A number of important topics were not covered, including caching, monitoring, logging, offline processing, long queries, asynchronous processing, and the list goes on. The goal of this post is to provide you with the minimal amount of information that you need to begin investigating how to scale your app on Heroku.

What was shown is that tuning your database is the number one thing that you need to approach when building your application.  This includes understanding the size of your dataset, understanding how your application queries your data, and understanding the most efficient ways to do so.

Moving static assets off your application server is essential both for freeing your application server to serve meaningful requests and for the user experience.

Tuning your application server depends on the characteristics of your application and should be done with that understanding, and with the understanding of the limitations of the application server.

Implementing SOA may be counterproductive in the beginning, and will negatively affect the performance of your application. It will however be necessary at some point in the growth of your app as you look to find other ways to scale beyond the capabilities of your database.

This is just the tip of the iceberg of strategies that you will need to apply in your ongoing search to extract the maximum performance from your application. As you do this, you will need to move one lever at a time, and continuously benchmark your changes.

References

[1] http://httpd.apache.org/docs/2.2/programs/ab.html
[2] http://jmeter.apache.org
[3] https://devcenter.heroku.com/articles/heroku-postgres-plans#cache-size
[4] https://devcenter.heroku.com/articles/heroku-postgres-plans#standard-tier
[5] https://devcenter.heroku.com/articles/heroku-postgres-follower-databases
[6] https://devcenter.heroku.com/articles/heroku-postgres-ha
[7] http://guides.rubyonrails.org/active_record_querying.html#eager-loading-associations
[8] https://devcenter.heroku.com/articles/http-routing
[9] https://devcenter.heroku.com/articles/using-amazon-cloudfront-cdn-with-rails
[10] https://devcenter.heroku.com/articles/rails-unicorn
[11] https://devcenter.heroku.com/articles/dyno-size
[12] http://unicorn.bogomips.org/TUNING.html
[13] http://michaelvanrooijen.com/articles/2011/06/01-more-concurrency-on-a-single-heroku-dyno-with-the-new-celadon-cedar-stack/
[14] https://devcenter.heroku.com/articles/log-runtime-metrics
[15] http://www.techempower.com/benchmarks/#section=data-r7

Vagrant speed up on Mac OS X

A while ago, after struggling for weeks to make all of the different gem configurations work on our Macs, our team switched to using Vagrant for development, matching our CentOS production deployment environment.

Our team except for me.  It was dead slow, and I refused to be convinced that this is better than the wrestling exercise of getting the oci gems working (see previous posts).  Once OCI is installed, things mostly work, and at least it’s fast enough that your test suite gives you feedback quickly enough that it’s not a chore to run.

This was until I had to install the LWES gem. It simply doesn’t work on Mac OS X.  Game over, give up, switch to Vagrant, and cry.

6 hours it took to red-green-refactor a simple API library, because of the many minutes it took for each spec to run.

Fortunately, a simple config change makes all the difference:

config.vm.share_folder "lui", "/home/vagrant/project_name", ".", :nfs => true
config.vm.network :hostonly, "192.168.255.2"

Voila! A seemingly 10X improvement in performance. Apparently switching to a :hostonly network with :nfs is the magic trick to making this a totally workable environment.

Unfortunately it does not work on Windows for those of you living in that world.

Gotchas when using AWS RDS command line tools

Say you installed the rds command line tools and wanted to create a db instance:


rds-create-db-instance my_dbinstance \
--engine mysql5.1 \
--master-username root \
--master-user-password - \
--allocated-storage 5 \
--db-instance-class db.m1.small \
--db-name blog

Issue 1
rds-create-db-instance: Malformed input-No Credentials were provided – cannot access the service

Resolution
You probably didn’t have your credentials set in the credential file. If you did, then you probably forgot to add your AWS_CREDENTIAL_FILE location to your environment.

export AWS_CREDENTIAL_FILE=$AWS_RDS_HOME/<location of credential file>

Issue 2
rds-create-db-instance: Malformed input-The parameter DBInstanceIdentifier is not a valid identifier. Identifiers must begin with a letter; must contain only ASCII letters, digits,
and hyphens; and must not end with a hyphen or contain two consecutive hyphens.

Resolution
I had an underscore in my DBInstanceIdentifier. Underscores are not valid characters.

Changed:

rds-create-db-instance my_dbinstance \
...

To:

rds-create-db-instance mydbinstance \
...
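
To catch this before calling the command line tools, the rules quoted in the error message can be turned into a quick check. The following is a sketch based only on those rules (it does not cover any other limits the service may enforce), and the constant and method names are hypothetical.

```ruby
# Pattern derived from the rules in the error message above:
# starts with a letter; only ASCII letters, digits, and hyphens;
# no trailing hyphen and no two consecutive hyphens.
DB_INSTANCE_IDENTIFIER = /\A[A-Za-z](?:[A-Za-z0-9]|-(?=[A-Za-z0-9]))*\z/

def valid_db_instance_identifier?(identifier)
  DB_INSTANCE_IDENTIFIER.match?(identifier)
end

%w[my_dbinstance mydbinstance my-dbinstance my--dbinstance].each do |name|
  puts "#{name}: #{valid_db_instance_identifier?(name) ? 'valid' : 'invalid'}"
end
```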