from March 2012
Article:

Code-Name: Insight

Acquia's Way of Finding Needles in a Needle-Stack

When I left my position managing a five-person Drupal team at Kaspersky Lab, I brought along a problem: I had watched my team spend countless hours working through code and configuration trying to find the sources of errors, locate optimization points, and improve the architecture and management of our dozen sites for the Americas region. The problem was in the process. It was neither scalable nor manageable. When I moved to Acquia and joined the Acquia Network team, I made that problem a priority.

One part of the Acquia Network is the connector module. I used the project of enhancing this module to build the dream: Eliminate time wasted investigating code and configuration by proactively notifying site owners of issues and of whether best practices are being followed.

How Did We Develop It?

Code-naming the endeavor Insight, we split the project into two parts. The first was gathering and analyzing data; the second was creating the infrastructure to handle the millions of inbound connections, data storage, and processing.

Past experience had taught us that using the same site for back-end data processing and the front-end UI was a bad idea. The two parts inevitably became two different Drupal sites and, in the end, split into four separate sites. You read that right. Every part of Insight, from sending and processing data to analysis and display, is built in and on Drupal: it's Drupal, PHP, and MySQL to the core.

During development we found that many modules we needed were alpha, or simply non-existent. The team wrote, or contributed to, over thirty modules while building Insight—many have been released to the community.

Once the underpinnings were in place, the final step was to create the front-end site. That process included an internal test release, a demo at Drupalcon London, a redesign, user testing, and finally, a closed and then open beta. The process refined the end product into something that delivered a very clean and functional experience to our users.

What's "Under the Hood"?

We broke our analysis of a site into a few key areas. The main division was Drupal configuration and settings vs. the stack's configuration and performance. From there we broke it down into performance, security, and best practices (which mostly involve scalability).

Almost everything can be covered by these questions:

  • What sites are reporting in for a subscription?
  • What are the Drupal configuration issues?
  • What modules are installed? (What does their code look like? What is their update status?)
  • How optimized is the hosting stack for this site?
  • How secure is the site?

The basic Drupal information collected runs the gamut from the distribution in use to node and user counts, which node access modules are enabled, and whether there are pending database updates. We also examined the installed modules, looking for code changes in core and contrib modules and checking for available updates.
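One common way to detect code changes in a module is to compare file checksums against those of the pristine release. The following is a minimal sketch of that idea, not Insight's actual implementation; the function names and the manifest format (relative path mapped to SHA-256 digest) are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def find_modified_files(module_dir: Path, manifest: dict[str, str]) -> list[str]:
    """List files that are missing or whose digest differs from the manifest.

    `manifest` (hypothetical format) maps each relative file path to the
    digest of that file in the unmodified release.
    """
    changed = []
    for rel_path, expected in sorted(manifest.items()):
        target = module_dir / rel_path
        if not target.is_file() or file_digest(target) != expected:
            changed.append(rel_path)
    return changed
```

An empty result suggests the module matches its release; any listed path is a candidate for a closer look.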

We gave site owners a leg up by highlighting PHP settings that aren't quite right and by providing the data they need to know whether their server is correctly tuned. This area will be enhanced considerably in future releases to give a clear picture of how well the server hosting your site can respond to requests.

In the same vein, we looked at how well Drupal is optimized to serve pages quickly. Insight keeps an eye on your page and block cache settings, as well as CSS and JS aggregation settings, so that you can be notified if they get turned off. These are just a few of the tests run against a site to identify what specifically is causing it to run poorly.
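A check of this kind can be pictured as comparing a site's reported settings against a recommended baseline. The sketch below is illustrative only and is not how Insight is written; it assumes the settings arrive as a simple dictionary keyed by Drupal 7 style variable names (`cache`, `block_cache`, `preprocess_css`, `preprocess_js`), and the function name is hypothetical.

```python
# Settings expected to be on for a production site. The keys follow
# Drupal 7 variable names; treat them as assumptions for other versions.
RECOMMENDED = {
    "cache": 1,           # page cache for anonymous users
    "block_cache": 1,     # block cache
    "preprocess_css": 1,  # CSS aggregation
    "preprocess_js": 1,   # JS aggregation
}


def find_disabled_settings(site_vars: dict) -> list[str]:
    """Return the names of recommended settings that are off or missing."""
    return [
        name
        for name, wanted in RECOMMENDED.items()
        if int(site_vars.get(name, 0)) != wanted
    ]
```

Each name returned would map to one alert, so a site with everything enabled produces no performance alerts from this test.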

Finally, we are keeping an eye on settings that affect a site's security. If a site is not using SSL for logins, or certain roles have admin privileges, we're going to let the site owner know and show them how to fix it.

All alerts are organized and displayed by severity, with critical performance alerts accompanied by a "help me fix it" recommendation that includes the problem description, recommended course of action, and suggested settings. As a developer you can fix the alert, or choose to ignore it.
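To make the alert model concrete, here is a small sketch of alerts ordered by severity with an ignore flag. The severity levels, field names, and function name are all assumptions for illustration; the article does not describe Insight's internal data structures.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    # Hypothetical levels; lower values sort first.
    CRITICAL = 0
    WARNING = 1
    NOTICE = 2


@dataclass
class Alert:
    title: str
    severity: Severity
    description: str = ""
    recommendation: str = ""
    ignored: bool = False


def visible_alerts(alerts: list[Alert]) -> list[Alert]:
    """Drop ignored alerts and order the rest, most severe first."""
    active = [a for a in alerts if not a.ignored]
    return sorted(active, key=lambda a: (a.severity, a.title))
```

Sorting on a `(severity, title)` pair keeps the order stable and predictable when several alerts share a severity.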

The deep visibility into a Drupal site provided by Insight can otherwise be gained only by putting one or more full-time developers on the task of finding needles in a haystack (or sometimes needles in a needle-stack), when their time would be better spent developing, innovating, and contributing those innovations back to the community.

Benefits for Beginners

Insight presents newcomers with an opportunity to learn best practices and be guided to the enhancements their sites need. Beginners will also learn how often code and configuration quality should be evaluated, and will be proactively notified when there are updates to their Drupal modules, security patches, and other events that require action. By presenting to-dos along with recommended actions, the alerts serve as both a "what do I do today?" list and an educational tool.

Benefits for Experts

For expert Drupalists, finding a code or configuration issue could take anywhere from five minutes to several hours. Insight now showcases probable root causes, so developers can start resolving issues immediately rather than hunting for causes.

In addition, expert Drupal developers often inherit sites that they did not build. Insight can serve as a great initial site audit to help the developer understand what they've been given.

Findings

Because you don't have to host with Acquia to get access to Insight, over half of all Acquia customers are now using Insight, and thousands of individual websites feed data into the system.

We've alerted website admins to over a hundred thousand serious issues on their production websites. As a result, site admins have:

  • disabled the PHP filter for anonymous users (yes, this has happened on production sites);
  • properly configured cache settings;
  • fixed incorrect configurations of the Global Redirect module;
  • significantly improved the security and performance of their sites.

What to Do If You Don't Have Insight

For organizations that don't have access to Insight and would rather triage issues manually, we've provided a guide that lays out the information we track and the tests we run to determine a site's performance, compliance with best practices, and security.