Back to Programming Articles

[Part 2] Learning the Basics of Coding: Database Interaction
by Pandemikk 31 Mar 2012

[Part 1] Learning the Basics of Coding

The second part of my four part article will focus on database interactions: Using PHP to interact with your database in various manners. This is a fundamental part of any modification, software or program as the database stores information that would otherwise be lost when a script has finished executing.

The Basics
I will use some terms here you may be unfamiliar with, so hopefully this depiction will help you visualize the relationship.

Code:
+---------+-------------+------+-----+---------+-------+
| Field   | Type        | Null | Key | Default | Extra |
+---------+-------------+------+-----+---------+-------+
| name    | varchar(20) | YES  |     | NULL    |       |
| owner   | varchar(20) | YES  |     | NULL    |       |
| species | varchar(20) | YES  |     | NULL    |       |
| sex     | char(1)     | YES  |     | NULL    |       |
| birth   | date        | YES  |     | NULL    |       |
| death   | date        | YES  |     | NULL    |       |
+---------+-------------+------+-----+---------+-------+
Table: Pet (http://dev.mysql.com/doc/refman/5.0/en/creating-tables.html)

For full information on column types please see the above link.

Database Interacting
For the sake of where this is being posted, we are going to be using PHP as our server-side language and MySQL as our database. We will also be assuming we are within a vB environment. The database abstraction class between vB3 and vB4 is very similar, so no assumptions of vB versions will be made.

Important Variables & Constants:
Please refer to part one if you do not know what a constant is.

$vbulletin is a variable. It refers to the vbulletin object.
$db is a variable. It refers to the database object.

Objects are instances of a class. Classes contain their own constants, variables (properties) and functions (methods).

$vbulletin->db for all intents and purposes is the same as $db. However, depending on where you are using the code, $db (and even $vbulletin) may not be available to you. Why? Variable scope. If you ever get a using $db when not in context error, that means $db is not an object. You'll get whenever operating in a class that does not have $db in its scope. You can fix this by adding global $db; or passing $db as an argument when calling a method. Be sure that $db is an object before passing it as an argument.

For this guide, we will be assuming $db is in context. I will be using $db instead of $vbulletin-> as I prefer it because it is much shorter and, I believe, there is less overhead from doing so. It should be noted that when $vbulletin is in context but $db is not you should use $vbulletin->db instead of bringing $db into context.

TABLE_PREFIX this is a very important constant. Never, ever, ever forget to put this in front of a table when querying the database. Why? Because for databases that use a prefix you will get a "table not found" database error. Never allow database errors.

query_first
Selects the first result within a query.

A common misconception, one I had until recently as well, was that query_first adds a LIMIT ONE. It does not. This means, unless only one row is returned in the result set, you will be wasting resources. More on why this is inefficient in [Part 3] Learning the Basics of Coding: Best Practices

Spoiler (click to open)


The third part of my four part article will focus on better coding: Improving the efficiency and logic of your existing code. This skill literally makes the difference between your novice vb.org hack and a full-fledged modification that people will pay for.

The Basics
I'm going to fully assume you have read or already have an understanding of [Part 1] Learning the Basics of Coding and [Part 2] Learning the Basics of Coding: Database Interaction. I'm going to assume you have knowledge of types, variables, constants, objects, resources, classes, methods, properties, table structures, etc.,. If you don't, you're probably going to get lost.

PHP
PHP is a fast language, capable of performing countless operators in microseconds. However, when poorly coded, something that should have executed in .006 seconds may end up taking .06 seconds. Don't think that's a big difference? Then leave the article - because that's slower by a factor of 10.

loops
Knowing which type of loop to use may seem trivial, but it's like using the wrong type of screwdriver or wrench. Sure, it'll get the job done, but it's going to take longer and waste more energy.
Here are some general rules when trying to figure out what type of loop to use:

for is for when you know how many times you want to execute a loop.
PHP Code:
for ($i 0$i 10; ++$i)
{
    echo 
$i;



while
is when you aren't sure how many times, but you want to loop until a condition is met.
PHP Code:
while ($row $db->fetch_array($mysql_result))
{
    
var_dump($row);

Don't know what var_dump is? Google it. It's very useful.

foreach is generally just for looping through an array.
PHP Code:
foreach ($array AS $key => $val)
{
    echo 
$key' - '$val;

Multiple parameters > string concatenation.

Though all loops have their differences, they all have their similarities too:

Do not use count() or its alias sizeof() in any loop expression.
PHP Code:
$array array_fill(0100'value'); # Creates an array
for ($i 0$i count($array); ++$i)
{
    echo 
$i;

Why are you calling count() at the beginning of each iteration?

PHP Code:
$array array_fill(0100'value');
$count count($array);
for (
$i 0$i $count; ++$i)
{
    echo 
$i;

count() is called once instead of 100 times. I just saved you some processing time.

Now you may have noticed I am consistently pre-incrementing instead of post-incrementing my $i. Why? It's about 10% faster. Pre-increment where possible!

Let's move on to another common loop problem. Can you spot it?
PHP Code:
for ($i 1$i 10; ++$i)
{
    for (
$j 1$j 10; ++$j)
    {
        
$troll_var = ($i 10) + ($j 10);
    }

The problem is the loop-invariant code. A loop-invariant is code that's true outside of the loop and, subsequently, true in each loop iteration. What does this mean? It means you're executing code multiple times when it only needs to be executed once!

PHP Code:
for ($i 1$i 10; ++$i)
{
    
$a $i 10;
    for (
$j 1$j <= 10; ++$j)
    {
        
$troll_var $a + ($j 10);
    }

By moving $i * 10, a value which will not change within the second loop, we are saving processing time. Why calculate ($i * 10) ten times when you can calculate it once? It's like not having memory and asked to do the same equation over and over. You just answered it, so why are you trying to find out the answer again?

functions
Calling user-defined functions is expensive. Whenever possible, don't be afraid to inline a function.

PHP Code:
function add_one($i)
{
    ++
$i;
    return 
$i;
}
   
$j 0;
for (
$i 0$i 1000; ++$i)
{
    
$j add_one($j);

You're calling a function a thousand times. Remember when I said count() inside a for was bad? This is even worse! Not just because it's being called 1000 times but because a user-defined function is much slower than a built-in function!

PHP Code:
$j 0;
for (
$i 0$i 1000; ++$i)
{
    ++
$j;

This will be exponentially faster.

Now I feel obligated to mention that you shouldn't inline all user-defined functions you possibly can. This is just silly. You should only do this in obvious cases where the function is either hardly used, or is so short it just doesn't make sense. Functions centralize your code and make life easier. Don't throw away a cup because you can drink from the tap! But DO get rid of that old cup your great-aunt gave you that no one ever uses yet it takes up the room of 3 normal cups. Secondly, don't get cocky and try to inline built-in functions, either, because that will be counter-productive. Built-in functions use optimized C code and you're not going to beat that with PHP.

Another problem is sometimes you don't need to use a function at all! You can use a language construct which are faster than functions!
PHP Code:
$foo true;
unset(
$foo);
   
if (
is_null($foo))
{
    echo 
'Foo is null';

While this code is correct and gets the job done, one must wonder why a function is being used?

PHP Code:
$foo true;
unset(
$foo);
   
if (!isset(
$foo))
{
    echo 
'Foo is null';

There we go. Drop the function and use a language construct instead. Not only is it faster but, in my opinion, it looks nicer, too.

use the right function
Don't use preg_split instead of explode when you don't need regular expressions. This goes for any function that supports regex. If you aren't using regex then use an alternative. e.g: str_replace instead of preg_replace.

freeing memory
This should be obvious, but it's something that isn't done as often as it should (sometimes I find code with arrays that should have been unset a long time ago, but that's not even the bad part - the bad part is it's my code.).

PHP Code:
$a = array();

$array array_fill(01000'value');
foreach (
$array AS $key => $val)
{
    
$a[$key] = $val;
}

// pretend there's hella more code here 
I've already got all the data from $array into $a so it's no longer needed. Why keep $array around?

PHP Code:
$unset($array); 
There we go. Memory freed!

initialize!
This is such an underestimated coding practice. Whenever you are using a variable or an array element you should always be sure it exists before doing anything besides assigning a value to it! Why? Keep reading!

PHP Code:
/*****
* Bad code
* @initalize - (bool) false
*/
// incrementing an undefined variable is much slower than incrementing a defined one
++$i;
   
// unnecessary else
if ($var == true)
{
    
$bool true;
}
else
{
    
$bool false;
}
   
/*****
* Good code
* @initalize - (bool) true
*/
if (!isset($i))
{
    
$i 0;
}
++
$i;
   
$var $bool false;
if (
$var == true)
{
    
$bool true;

It should also be mentioned that initializing your variables can play an importance in security as well. More in [Part 4]. Furthermore, attempting to do something to an undefined variable can also result in: Notice: Undefined variable: $variable_name with the right php.ini configuration (you shouldn't be outputting errors on a live site though).

logic
One thing novice coders often do is making a mess of their logic (I do that a lot, too, actually). Examples include evaluating the same expression multiple times in a script, within an if condition, redundant code, etc. There's a popular principle called "DRY"; it stands for don't repeat yourself.

PHP Code:
$i $j 5;
if ((
$i == AND $j == 5) OR ($i == AND $j == 5))
{
    echo 
true;

$j was checked twice. Why? Poor logic. Not only is poor logic a waste of processing time, but it's also harder to read and takes longer to type. There's not one benefit to checking $j twice so break a bad habit and rewrite that code.

PHP Code:
$i $j 5;
if (
$j == AND ($i == OR $i == 5))
{
    echo 
true;

Much better! Cleaner. Faster. Logicaler™.

Let's take it up a notch:
PHP Code:
if ($is_admin == true)
{
    
$access true;
}
else
{              
    if (
$blacklisted == true)
    {
        
$access false;
    }
    else if (
$has_access == true)
    {
        
$access true;
    }

How can I write this code better? I want to give admins access, prevent blacklisted access unless they're an admin, and give access to people that have access unless they're blacklisted.

Let's look at a few things first. We have $access = true twice! Certainly we can refactor the code to remove this redundancy? Secondly, it seems we still have not initialized $access! If a user meets none of those conditions then there's a possibility he may get unauthorized access!

PHP Code:
if ($access !== false)
{
    
// code here

Because NULL is not identical to false, this person has gained access to a place he shouldn't have!

Thirdly, look at the nesting. For such simple checks do we really need double nested code? Certainly not! So how would you rewrite this code better?

PHP Code:
$access false# Initialize
if ($is_admin === true OR ($has_access === true AND $blacklisted === false))
{
    
$access true;

That's how I would do it. Satisfies all the requirements and solves all the problems. I've taken it a step further and used identical operators because it's faster and more fool-proof.

misconceptions
I've taken the time to find out many optimization tips for PHP and have found that a few of the more popular ones are absolutely false. I'm going to include them in here because part of writing good code actually involves knowing what good code is and is not.

single quotes vs. double quotes
There's no noticeable difference in performance between using single and double quotes. Single quotes is indeed a bit faster but the percentage is < .01%. If you want to shave a nano-second in parsing time then by all means convert all double quotes to single quotes. Personally, I'm not going to. As a side note: I use single quotes whenever possible but if double quotes make my life easier I will not hesitate to use them.

PHP Code:
$do_this 'If it makes sense.';
$do_this "I have a $var I want parsed.";
$me 'I do this ' $out ' of habit not ' $out ' of micro-optimizing OCD'
get rid of comments and whitespaces
Don't. You'll waste far more time pressing the backspace button then the amount of processing time saved will ever come close to. In fact, I encourage consistent indentation, spacing and commenting. Keep your code clean, neat, and understandable.

PHP Code:
if($true==true){echo 'i just saved 15 femtoseconds by switching to annoyingly hard to read coding practices';} 
obfuscation
The practice of rewriting variables, constants, functions, classes, etc,. to use as little characters as possible. Similar to the above: Don't. "Keep your code clean, neat, and understandable."

general tips
Contrary to the above, I've also taken the time to confirm many common optimization tips.
  • Declare methods static when possible. Improves speed by a factor of 4.
  • Use echo and take advantage of its multiple parameters. In other words, use a comma instead of a period.
  • $row[’id’] is 7 times faster than $row[id]. Copypasta because everyone's read this one.
  • Never use short tags. Always use <?php and ?
  • For PHP > 5, unset() is faster and uses less memory than type casting null.
MySQL
Going to assume we're in a vB enviornment and that $db is an object, yes? Also going to assume you’ve read [Part 2] Learning the Basics of Coding: Database Interaction, yes? Good!

query_first
Remember that this doesn't add LIMIT 1. This means that unless querying only one result your query will be inefficient. Why? Imagine you want to buy a bottle of honey. You drive to the database (store), and then you go to the table (aisle), and then you SELECT honey. But wait, you didn't limit yourself to one bottle of honey; you grabbed ALL the bottles of honey. So now you're walking to the check-out with your arm full of bottles of honey and proceed to buy them. Then you drive back home with your bounty of honey and while putting away your groceries you throw out all those bottles of honey and put one inside your cabinet. Understand? If you're going to get more than one result add LIMIT 1!

PHP Code:
$derp $db->query_first("SELECT dateline FROM " TABLE_PREFIX "post WHERE threadid = 67 ORDER BY dateline DESC "); 
WRONG! What if this thread had tens of thousands of posts? Your server is going to hang! Hope you aren't using bluehost (zing!).

PHP Code:
$great_success $db->query_first("SELECT dateline FROM " TABLE_PREFIX "post WHERE threadid = 67 ORDER BY dateline DESC LIMIT 1"); 
Fast!

But to be terribly honest, if you want to select the last post in a thread you should be doing this:
PHP Code:
$derp $db->query_first("SELECT lastpost FROM " TABLE_PREFIX "thread WHERE threadid = 67 "); 
This eliminates the need to sort through the posts and since `threadid` is the primary key in table thread it's going to be extremely fast and efficient.

avoid filesort
Sometimes it's unavoidable, but most of the times it is (avoidable).

Not sure what I mean? Enable debug mode and look in the extra column. It may take you awhile, but you're bound to see "using filesort" or "using temporary".

using temporary
The temporary table where rows are being put in has gotten so big that it needs to be sorted on a disk.

using filesort
A sort that isn't being performed on indexes. Yes, it is horribly named.

To avoid these, in 99% of the cases, you need to either fix your query or fix your table structure. You should never be sorting on non-indexes unless the amount of rows in question is a relatively small amount: say < 1000. This is because with so relatively few rows it's going to be faster sorting 1 by 1 then hitting the indexes anyway.

PHP Code:
$filesort $db->query_read("SELECT threadid FROM " TABLE_PREFIX "thread ORDER BY replycount DESC LIMIT 50"); 
MySQL is going to have to go through ALL of those threads you know...

PHP Code:
$index $db->query_read("SELECT threadid FROM " TABLE_PREFIX "thread ORDER BY dateline DESC LIMIT 50"); 
You're hitting the index so you're good.

normalization
This is, truthfully, deserving of an article of its own. I may someday decide to make one, but, for now, I'll touch the basics of this VERY IMPORTANT practice.

What is normalization?

redundancy
Take, for example, the following table structure:
STUDENT (student_id, name, birthday, gender, class)

This is horrible. Multiple classes = multiple nearly identical rows. We want each table to be responsible for one set of data. Let’s rework this a bit.

STUDENT (student_id, name, birthday, gender)
CLASS (name, student_id)

This is better, but still terrible. What if the name changes? What if we want to add more data about the class?

We need three tables:
STUDENT (student_id, name, birthday, gender)
CLASS (id, name)
CLASS_STUDENT (class_id, student_id)

num_rows
Like in part 2, I mentioned that you should always check to make sure you have a mysql resource before doing anything to it. This is the ONLY way I know of to do this without causing a warning, notice, fatal error, blackhole, etc.

UPDATE: It seems the fetch_array function actually suppresses errors so num_rows is not TECHNICALLY needed to avoid that nasty error being shown. In fact, I now think it is more efficient to avoid num_rows in cases except where you wish to error out early. Reasoning? It's unnecessary overhead without any difference in actual results. If we weren't within a vB environment I'd fully recommend it but since we are there's no reason to call a function to avoid an error that would be suppressed anyway!

PHP Code:
while($result $db->fetch_array($results))
{
    
//  right

This is perfectly acceptable.

PHP Code:
if ($db->num_rows($results))
{
    while(
$result $db->fetch_array($results))
    {
        
//  less right
    
}

Contrary to what I thought, this isn't necessary to avoid a nasty error.

PHP Code:
if (!$db->num_rows($results))
{
    
// error out
}

while(
$result $db->fetch_array($results))
{
    
//  run code

Why error out? To avoid running code that has no purpose. Running code that has no purpose is a waste of resources and may even lead to unexpected results.

[Legacy info]
Now why would you want to use num_rows before fetch_array? Besides the error message, let's dig a bit deeper into these functions.

http://php.net/manual/en/function.mysql-num-rows.php
As you can see, num_rows either returns the number of rows in the result set or false on failure.

Now what about fetch_array?
http://php.net/manual/en/function.mysql-fetch-array.php
It returns an array. There is no false on failure. If you use this function on a non-result set you will get: Warning: mysql_fetch_array(): supplied argument is not a valid MySQL result resource in path/to/script.php on line x

It should be noted that part of being a good coder is to avoid notices, warnings, fatal errors, database errors, etc.
[/Legacy Info]

Welp! That’s about it. I probably should dive into MySQL more but nothing else comes to mind at this ungodly hour. Let me know what you think! Hopefully a few of the more infamous coders around here read this and soak it in.

Close
.

So what's the point of using query_first? Because it saves you time. It fetches the first row and returns it to you as an array instead of a result. It also frees the result from memory.

PHP Code:
$me $db->query_first("SELECT userid FROM " TABLE_PREFIX "user WHERE username = 'Pandemikk'");

// outputs 344400
echo $me['userid']; 
PHP Code:
$me $db->query_first("SELECT userid FROM " TABLE_PREFIX "user WHERE username = 'Pan%' "); 
% is a wildcard selector. Selects every username that begins with: Pan

query_read
Now say you need multiple rows? Well then you need a result set and query_read is your friend.

PHP Code:
$result_q $db->query_read("SELECT username FROM " TABLE_PREFIX "user WHERE userid IN (0, 344400)"); 
Nobody has a userid of 0, because userid is an auto-incrementing field. This means, each time a new record is added to the table the userid field goes up by one. This is why userid, threadid, forumid, etc,. never have values less than 1.

num_rows
Returns the amount of rows in the result resource. This method ONLY works on mysql results. Returns false on failure and, as you should know, a control structure (such as if) does not execute if it evaluates to false.

You should use this method before fetch_array because using fetch_array on an invalid mysql result set (no rows returned for example) will emit: Warning: mysql_fetch_array(): supplied argument is not a valid MySQL result. Avoid warnings.

PHP Code:
$result_q $db->query_read("SELECT username FROM " TABLE_PREFIX "user WHERE userid = 0");
 
if (!
$db->num_rows($result_q))
{
      echo 
'Fail';
      exit;

Remember, no 0 values in auto-increment fields.

fetch_array
Returns an array from a result resource. In other words, it takes a result set and returns a row from it. So to grab all the rows you will need to use a loop.

PHP Code:
$result_q $db->query_read("SELECT username FROM " TABLE_PREFIX "user WHERE userid IN (0, 344400)");
 
if (
$db->num_rows($result_q))
{
     while (
$result_r $db->fetch_array($result_q))
    {
        echo 
$result_r['username'];
    }
}
$db->free_result($result_q); 
This is the best way to fetch a mysql result.

free_result
This method will destroy the result resource thus freeing up memory. You should always free_result a mysql result when you are finished with it.

PHP Code:
$result_q $db->query_read("SELECT username FROM " TABLE_PREFIX "user WHERE userid IN (0, 344400) ");
 
$db->free_result($result_q);

if (
$db->num_rows($result_q))
{
    echo 
true;

Your browser will never receive output because $result_q is now NULL. NULL evaluates to false.

query_write
This method will allow you to save to the database.

PHP Code:
$result_q $db->query_write("UPDATE " TABLE_PREFIX "user SET username = 'King', email = 'email' WHERE username = 'Pandemikk' "); 
Whenever specifying multiple columns to be updated you must use a comma before each column = value. Not doing so result in an error message.

PHP Code:
$result_q $db->query_write("INSERT INTO " TABLE_PREFIX "user (username) VALUES ('Pandemikk') "); 
Will insert another of me into the user table. I'm not including any of the other user columns because this is an example.

The difference between update and insert is fairly obvious. Update will update an existing record while insert will create a new record.

There are also less common variants of the above suit for special purposes.

PHP Code:
$result_q $db->query_write("INSERT IGNORE INTO " TABLE_PREFIX "user (username) VALUES ('Pandemikk') "); 
Assuming username is a primary key or unique key, this query will insert a new record if the username Pandemikk does not exist or do nothing (no database error) if it does.

PHP Code:
$result_q $db->query_write("REPLACE INTO " TABLE_PREFIX "user (username) VALUES ('Pandemikk') "); 
Assuming username is a primary key or unique key, this query will DELETE the old row and INSERT the new one if a record for the username Pandemikk exists.

deleting records - From time-to-time, you may need to delete some old records. This can be used such as below:
PHP Code:
$result_q $db->query_write("DELETE FROM " TABLE_PREFIX "user WHERE username = 'Pandemikk' "); 
Deleted myself from the user table.

escape_string
Always use this method when using strings in your query. An exception is if you know for sure that the string is not user-input and does not have any single quotes in it. What's the big deal you're asking? The big deal is doing so leaves you open to SQL injections. SQL injections basically say: "Here's access to the database, do whatever you want with it." See [Part 4] for more information on security.

PHP Code:
$username "Pan'demikk";

$result_q $db->query_read("SELECT userid FROM " TABLE_PREFIX "user WHERE username = '" $username "' "); 
What did I do wrong here? I caused a database error because the query will execute as:

Code:
SELECT userid FROM user WHERE username = 'Pan'demikk'
The query's like wtf is "demikk"? Note: I have no table_prefix.

PHP Code:
$username "Pan'demikk";

$result_q $db->query_read("SELECT userid FROM " TABLE_PREFIX "user WHERE username = '" $db->escape_string($username) . "' "); 
Much better. Although there's no user with the name "Pan'demikk" there's no exploit in the query, either.

What escape_string does it add a backslash in front of single quotes, backslashes, and other special characters, which effectively "escapes" the input.

When the query is executed, backslashes are stripped once from the parser and again when the pattern match is made, so don't worry about escape_string returning wrong data.

Final Notes
Couldn't really find a place to put these so I'll stuff them here.
  • All results from a query are returned as strings. Even if the column is type int it will return as a string.
  • You should never wrap integers around single quotes.
  • You should always wrap strings around single quotes.
  • Use = instead of LIKE when searching for strings without a wildcard.
  • By default, MySQL is not case-sensitive.
  • If you want to search for '\' you will need to specify: '\\\\' unless you are searching at the end of the string then simply '\\' will suffice.
  • vB coding standards dictates you wrap queries in double-quotes. This means you do not need to exit a string when using a variable within a query.
PHP Code:
$me['userid'] = 344400;

$result_q $db->query_read("SELECT userid FROM " TABLE_PREFIX "user WHERE userid = " $me['userid']); 
That's a perfectly acceptable query and, in my personal opinion, is better than:

PHP Code:
$me['userid'] = 344400;
 
 
$result_q $db->query_read("SELECT userid FROM " TABLE_PREFIX "user WHERE userid = " $me['userid'] . " "); 

vblts.ru supports vBulletin®, 2022-2024