Drupal 5 Released
January 21st, 2007 by mike
The release of Drupal 5 creates a wondrous plethora of feelings inside me. They range from a curious awe to a less curious nausea. It also leads me to a long list of questions. Questions such as “Did they fix the API?”, “Did they fix any of the scalability and performance issues?”, and “Does it still rely so heavily on the database?”. I’m currently working on a blog project for a very large company that I won’t name. Despite the many problems that this company has had with Drupal implementations in the past, they chose to use it again, on this project. I don’t intend for this post to be fair or objective. I’m not writing a review and I’m sorry if the title confused you. I filed this post under rant, and that’s exactly what it’s going to be.
Here are a few of the biggest problems that I’ve experienced while working with Drupal:
1. The Core
Let me start by saying that the Drupal core was designed by amateurs with no experience whatsoever in architecting usable software. It’s messy, kludgy, and filled with implementation problems and beginner mistakes. For example, when a part of the core reaches a point in the execution where it is time to call one of the “hook”s, it calls a function named “module_invoke_all”. What this function (and all of the functions it calls) does is loop over an array of the names of every module installed and active in the running system, check whether a function named module_name . “_” . hook_name is defined, and if so, execute it. This is horribly slow and ugly. I would have implemented a data structure that holds pointers to, er, name strings of the implemented hooks. Each module would create a copy of that data structure, populate it, and then register it with the system, perhaps with a function called “register_callback” ;).
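To make the contrast concrete, here is a rough sketch of the two dispatch styles. It is written in Python purely for illustration (Drupal itself is PHP), and every name in it apart from “register_callback” is made up for the example; it is not Drupal’s actual code.

```python
from collections import defaultdict

# Hypothetical module functions following the "<module>_<hook>" naming convention.
def blog_menu():
    return ["blog/add"]

def forum_menu():
    return ["forum"]

def forum_cron():
    return ["forum: pruned old topics"]

# Stand-in for the table of defined functions; "search" implements no hooks.
FUNCTIONS = {f.__name__: f for f in (blog_menu, forum_menu, forum_cron)}
ACTIVE_MODULES = ["blog", "forum", "search"]


def invoke_all_by_name(hook):
    """Naming-convention dispatch: probe every active module for <module>_<hook>."""
    results = []
    for module in ACTIVE_MODULES:
        func = FUNCTIONS.get(f"{module}_{hook}")
        if func is not None:
            results.extend(func())
    return results


class HookRegistry:
    """Explicit registration: each module declares its hooks once, up front."""

    def __init__(self):
        self._callbacks = defaultdict(list)

    def register_callback(self, hook, func):
        self._callbacks[hook].append(func)

    def invoke_all(self, hook):
        # Only callbacks registered for this hook are visited.
        results = []
        for func in self._callbacks.get(hook, []):
            results.extend(func())
        return results


registry = HookRegistry()
registry.register_callback("menu", blog_menu)
registry.register_callback("menu", forum_menu)
registry.register_callback("cron", forum_cron)
```

Both styles return the same results for a given hook; the difference is that the registry only ever touches the modules that said they implement it, instead of probing every active module by name.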
2. Database interaction
To put it in the (slightly modified) words of one of my esteemed colleagues: Drupal eats database resources like a fat man eats M&M’s. Drupal’s database usage is unbelievably high. The “just one more [query] won’t hurt” mentality that the engineers brought to the system’s design can have crippling effects on even the most powerful database engines. With a few modules installed, Drupal can take over 150 database queries to render a single page. Where does Drupal store user settings? The database. Where does it store cached files? The database. Where does it store user sessions? The database. Where does it store menu options, module information, form information, state tracking, error and warning logs, missed hits? The database, the database, the database, etc.
3. Scalability
In addition to the aforementioned items, Drupal is set up, by default, to handle all 404 errors. What this means is that every missed request that comes to your site requires Drupal to load, bootstrap itself, run some (~10+) database queries, print a 404 page, and die. Every time. There is a cron job that must be run on a regular basis to keep the cache table clean, and every time the system writes to any one of the cache tables, Drupal “write locks” the entire table. What this means is that no ‘read’ queries can execute until the write query finishes, and if the application crashes… oh well.
4. Security
I haven’t had much time to spend looking for holes in the design, but I can say one thing about the system’s security that I noticed after about 30 seconds of using it. If Drupal loses its connection to the database, by default it prints - to the world - a page that contains the database username and the host that the database is running on. Thanks for answering 2 of the 3 questions I need to break in.
All in all, I think that Drupal can be nice for small, low-traffic sites or for people who just love to hack away at the internals of their CMS. As for me, I need something that’s secure, stable, scalable, intuitive, and properly designed. Maybe next time I’ll push for Plone.
But what do I know?
Posted in Rant
January 23rd, 2007 at 5:10 pm
While you have a few valid points in your rant, which are already being addressed, I am wondering how all the people using Drupal in large-scale applications do so.
January 23rd, 2007 at 5:11 pm
Totally inaccurate. Drupal handles large projects just fine. We’ve built our own large-scale and very large-scale sites with Drupal with great success and great scalability, and there are quite a few sites out there, like The Onion or the popular Popsugar.com network, that prove just the contrary. Apparently the author has only had a first look at Drupal. Certainly things can always be done better, but he is not proposing any better solution at all, so I find postings like this a little questionable and polarizing.
January 23rd, 2007 at 5:25 pm
Security by obscurity (keeping the DB host and username secret) is not good. Firewalls and passwords protect your DB. Drupal is regarded as one of the most secure CMSes for good reasons: we have a very good security team, and I had the honor of leading it for some time.
We agree on hook registration. I am working on it, and there is code already in my sandbox.
We agree on database queries. There are several patches being coded for Drupal 6 that will mean fewer queries.
We do not agree on scalability. For example, http://www.zattevrienden.be/forum (not work-safe) is _huge_, I mean hundreds of hits per second huge. And then MTV UK and …
January 23rd, 2007 at 5:26 pm
1. Checking for the existence of function names is very fast. Faster than, say, retrieving a list of strings from storage. And the names are cached in PHP, so the checking is only done once.
2. Cached pages take only a single database query. Turn on your database’s query cache so those queries are fast. Storing information in the database allows you to scale up your web frontends by simply adding boxes.
3a. If you don’t want Drupal to handle the 404s, add three lines to your .htaccess file: http://drupal.org/node/76824
3b. If you don’t want locking contention, see the “Remove database locking” thread: http://drupal.org/node/55516
4. The error page that you refer to is themable. Simply write a theme function that overrides it if you want something different displayed.
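As a rough illustration of point 1, the sketch below caches the list of modules implementing a hook so that the name probing happens once per hook rather than on every invocation. It is Python rather than PHP, and all function and variable names are hypothetical; it is not taken from Drupal’s source.

```python
# Hypothetical module functions following the "<module>_<hook>" convention.
def blog_cron():
    return "blog cron ran"

def forum_cron():
    return "forum cron ran"

FUNCTIONS = {"blog_cron": blog_cron, "forum_cron": forum_cron}
ACTIVE_MODULES = ["blog", "forum", "search"]

_implementations = {}     # hook name -> list of modules implementing it
stats = {"lookups": 0}    # counts existence checks, for illustration only


def implementations_of(hook):
    """Return the modules implementing `hook`, probing names only on first use."""
    if hook not in _implementations:
        found = []
        for module in ACTIVE_MODULES:
            stats["lookups"] += 1
            if f"{module}_{hook}" in FUNCTIONS:
                found.append(module)
        _implementations[hook] = found
    return _implementations[hook]


def invoke_all(hook):
    """Call every cached implementation of `hook` and collect the results."""
    return [FUNCTIONS[f"{m}_{hook}"]() for m in implementations_of(hook)]
```

Calling invoke_all("cron") twice performs the three existence checks only on the first call; the second call reuses the cached list.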
January 23rd, 2007 at 6:47 pm
Doesn’t point #2 contradict #1? Wouldn’t the registered callbacks be stored in the database?
January 23rd, 2007 at 7:36 pm
My point is NOT to store that in the database. The files are read and processed by the system during Drupal’s bootstrap, and the registration of the callbacks would occur at that time.
January 23rd, 2007 at 10:51 pm
Heh. Looks like the best way to drive traffic these days is to say bad things about Drupal.
Some very interesting comments, Mike. The point about function caching etc is a good one, though it’s not as bad as you seem to think it is — it’s done once during the bootstrap process, not each time module_invoke_all() gets called. Still, that slows down the bootstrap process and we’re looking for better ways to handle it. There’s already been code written to that end and there are some good proposals on the devel list.
Other issues — like database locks — are also hotly contested. One of the biggest problems is that the core team has remained pretty committed to supporting older installations, so taking advantage of certain things like InnoDB’s much improved locking has been held up. In addition, some of the popular functions (like brain-dead-simple path aliasing) can be very DB intensive unless they’re used in cooperation with good caching solutions like memcache.
One of the first steps most shops take when optimizing Drupal is flipping a few of those switches that the core leaves hidden; a lot of that knowledge exists in the community and inside the Drupal handbooks, though, so it’s understandable that an internal team could miss out on some of the critical optimization tools. Do you think there’s a way we can improve the accessibility of that information?