Programming

[Refresher] Object Model in PHP 5

Official References (From the online PHP Manual):

Basic

[Refresher] POSIX.1 Advisory File Locking

The sample code here demonstrates the following techniques:

  • The use of POSIX fcntl() to perform file locking
  • The realization of mutual exclusion via file locking
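Both techniques can be sketched in a few lines. This is an outline assuming a POSIX system, not the original sample code. The helper names set_whole_file_lock and with_file_mutex, and the lock-file paths, are hypothetical. fcntl() locks are advisory and are held per process, so they exclude other processes that also take the lock. They do not exclude other threads of the same process.

```cpp
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>

// Set (type = F_RDLCK or F_WRLCK) or release (type = F_UNLCK) an advisory lock
// covering the whole file open on fd. F_SETLKW waits until the lock is granted;
// the loop retries when a caught signal interrupts the wait (errno == EINTR).
int set_whole_file_lock(int fd, short type)
{
    struct flock fl {};
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;  // 0 means "up to the end of the file, however large it grows"

    int rc;
    do {
        rc = fcntl(fd, F_SETLKW, &fl);
    } while (rc == -1 && errno == EINTR);
    return rc;
}

// Mutual exclusion between processes: the critical section runs only while
// this process holds the exclusive (write) lock on lock_path.
template <typename Fn>
int with_file_mutex(const char *lock_path, Fn critical_section)
{
    int fd = open(lock_path, O_RDWR | O_CREAT, 0600);
    if (fd == -1) return -1;
    if (set_whole_file_lock(fd, F_WRLCK) == -1) {
        close(fd);
        return -1;
    }
    critical_section();
    set_whole_file_lock(fd, F_UNLCK);
    close(fd);  // note: closing ANY descriptor for the file drops this process's locks on it
    return 0;
}
```

Two cooperating processes that both call with_file_mutex() with the same lock path never run their critical sections at the same time. Each process blocks in F_SETLKW until the other one releases the lock.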

[Refresher] TCP/IP with Multiple Clients and Concurrent Server

The sample code here demonstrates the following techniques:

  • Concurrent FTP Server
  • The use of standard I/O with POSIX sockets
  • The handling of Unix signal SIGCHLD to prevent terminated children from becoming zombies
  • The handling of system calls interrupted by caught signals
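The last two points can be sketched as follows. This is an illustrative outline assuming Linux/glibc, not the original sample code. The names reap_children, install_sigchld_handler, accept_retry and serve_forever are hypothetical, and socket setup (socket(), bind(), listen()) is omitted.

```cpp
#include <cerrno>
#include <csignal>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// SIGCHLD handler: reap every child that has terminated so none stays a zombie.
// Several children may exit while a single SIGCHLD is pending, hence the loop.
extern "C" void reap_children(int)
{
    int saved_errno = errno;           // waitpid() may clobber errno of the interrupted code
    while (waitpid(-1, nullptr, WNOHANG) > 0) {
    }
    errno = saved_errno;
}

int install_sigchld_handler()
{
    struct sigaction sa {};
    sa.sa_handler = reap_children;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                   // no SA_RESTART: slow calls such as accept() fail with EINTR
    return sigaction(SIGCHLD, &sa, nullptr);
}

// accept() that restarts itself when a caught signal (e.g. SIGCHLD) interrupts it.
int accept_retry(int listen_fd)
{
    for (;;) {
        int fd = accept(listen_fd, nullptr, nullptr);
        if (fd >= 0 || errno != EINTR) return fd;
    }
}

// Concurrent server loop: one child process per accepted connection.
template <typename Handler>
void serve_forever(int listen_fd, Handler handle_client)
{
    for (;;) {
        int conn = accept_retry(listen_fd);
        if (conn < 0) return;          // a real error, not an interrupted call
        pid_t pid = fork();
        if (pid == 0) {                // child: serve this client, then exit
            close(listen_fd);
            handle_client(conn);
            close(conn);
            _exit(0);
        }
        close(conn);                   // parent (or failed fork): drop its copy of the socket
    }
}
```

Leaving out SA_RESTART is a deliberate choice here, because it makes the EINTR case visible. Setting SA_RESTART instead would let the kernel restart accept() for you on most systems.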

[Refresher] Pipes and FIFO Special Files (in POSIX)

Use pipes (nameless/unnamed) and FIFO files (named pipes) in POSIX.
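Here is a minimal sketch of both kinds of pipe, assuming a POSIX system. The function names and the FIFO path used with them are hypothetical. In both cases a child process writes a message and the parent reads it until end-of-file.

```cpp
#include <cstddef>
#include <string>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Read everything from fd until end-of-file (all writers have closed their ends).
static std::string read_all(int fd)
{
    std::string result;
    char buf[256];
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        result.append(buf, static_cast<std::size_t>(n));
    return result;
}

// Unnamed pipe: only related processes (here, parent and child) can share it.
std::string message_through_pipe(const std::string &msg)
{
    int fds[2];                          // fds[0]: read end, fds[1]: write end
    if (pipe(fds) == -1) return "";
    pid_t pid = fork();
    if (pid == -1) { close(fds[0]); close(fds[1]); return ""; }
    if (pid == 0) {                      // child: the writer
        close(fds[0]);
        ssize_t written = write(fds[1], msg.data(), msg.size());
        (void)written;
        _exit(0);                        // exiting closes fds[1]
    }
    close(fds[1]);                       // otherwise read() would never see EOF
    std::string result = read_all(fds[0]);
    close(fds[0]);
    waitpid(pid, nullptr, 0);
    return result;
}

// FIFO: the pipe has a name in the file system, so unrelated processes can
// open() it like a regular file. Opening blocks until the other end is opened too.
std::string message_through_fifo(const char *path, const std::string &msg)
{
    unlink(path);                        // remove a stale FIFO from an earlier run
    if (mkfifo(path, 0600) == -1) return "";
    pid_t pid = fork();
    if (pid == -1) { unlink(path); return ""; }
    if (pid == 0) {                      // child: the writer
        int wfd = open(path, O_WRONLY);
        if (wfd != -1) {
            ssize_t written = write(wfd, msg.data(), msg.size());
            (void)written;
            close(wfd);
        }
        _exit(0);
    }
    std::string result;
    int rfd = open(path, O_RDONLY);
    if (rfd != -1) {
        result = read_all(rfd);
        close(rfd);
    }
    waitpid(pid, nullptr, 0);
    unlink(path);
    return result;
}
```

For example, message_through_pipe("hello") returns "hello". message_through_fifo("/tmp/demo_fifo", "hello") does the same through a named pipe, which it creates and removes itself.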

[Refresher] I/O Multiplexing with select()

Last Updated: 12-Nov-08

We want to call the Unix system call select() and proceed as soon as

  • some file descriptors are ready for reading, or
  • some file descriptors are ready for writing, or
  • 2.3 seconds have elapsed (whichever happens first).
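A sketch of such a call, assuming a POSIX system. The names wait_io and pipe_ready_demo are hypothetical. The 2.3-second timeout is expressed as a struct timeval of 2 seconds plus 300000 microseconds.

```cpp
#include <vector>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

// Wait until a descriptor in `readers` is readable or one in `writers` is
// writable, or until at most 2.3 seconds have elapsed.
// Returns the number of ready descriptors (0 on timeout, -1 on error); on return,
// *read_set and *write_set contain only the descriptors that are ready.
// All descriptors must be smaller than FD_SETSIZE.
int wait_io(const std::vector<int> &readers, const std::vector<int> &writers,
            fd_set *read_set, fd_set *write_set)
{
    FD_ZERO(read_set);
    FD_ZERO(write_set);
    int max_fd = -1;
    for (int fd : readers) { FD_SET(fd, read_set); if (fd > max_fd) max_fd = fd; }
    for (int fd : writers) { FD_SET(fd, write_set); if (fd > max_fd) max_fd = fd; }

    // Linux's select() may modify the timeout, so rebuild it before every call.
    struct timeval timeout;
    timeout.tv_sec = 2;
    timeout.tv_usec = 300000;            // 2.3 s = 2 s + 300000 microseconds
    return select(max_fd + 1, read_set, write_set, nullptr, &timeout);
}

// Usage sketch: after one byte is written into a pipe, its read end is readable
// and its (nearly empty) write end is writable, so select() returns at once with 2.
bool pipe_ready_demo()
{
    int p[2];
    if (pipe(p) == -1) return false;
    bool wrote = write(p[1], "x", 1) == 1;
    fd_set rset, wset;
    int n = wait_io({p[0]}, {p[1]}, &rset, &wset);
    bool ok = wrote && n == 2 && FD_ISSET(p[0], &rset) && FD_ISSET(p[1], &wset);
    close(p[0]);
    close(p[1]);
    return ok;
}
```

Note the first argument of select(): it is the highest descriptor number plus one, not the number of descriptors watched.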

Writing Good Header Files in C/C++

[This article is still INCOMPLETE. I will continue to wrap it up as soon as I can... it may take weeks. There is a lot of information I would like to include...]

1. Introduction

Proper organization of C/C++ source code is essential for code maintenance. Unfortunately, poorly organized code is still found everywhere, making maintenance very difficult.

In this article, I attempt to summarize some good practices in writing good C/C++ header files. I found that some similar articles do not emphasize certain important points, do not justify the ideas properly, or do not state exceptions to the common practices. I attempt to include these here.

2. Essential Rules

The following rules summarize well-established good practices for writing C/C++ header files. They are essential, and have been (or should be) included in good coding standards.

  1. Weightless. No statement in a header file should add any bytes to the final executable.
  2. Single. A header file should guard its content against being included more than once.
  3. Minimal. Include as little information as possible in a header file. Do not include anything (except preprocessing directives used for global configuration) that is used by only one implementation file.
  4. Self-Sufficing. A header file should be includable on its own, without requiring prior inclusion of other header files.

The following sections elaborate on the rules in detail. The description, justification, checking, and exceptions of each rule are presented. (Note that weightless, single, minimal and self-sufficing are not standard technical terms for these practices. I borrowed or invented these terms to summarize the rules since I could not find any established terms for them.)
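To make the Single and Self-Sufficing rules concrete, here is a sketch with a hypothetical header person.h. It pastes the header's text twice. This is what the compiler effectively sees when a file includes the header both directly and through another header, and the include guard turns the second copy into nothing.

```cpp
// ---- first inclusion of person.h (hypothetical) ----
#ifndef PERSON_H                 // Single: include guard
#define PERSON_H
#include <string>                // Self-Sufficing: person.h includes what it uses itself
struct Person { std::string name; int age; };
std::string greeting(const Person &p);
#endif /* PERSON_H */

// ---- second inclusion of person.h: skipped entirely because PERSON_H is defined ----
#ifndef PERSON_H
#define PERSON_H
#include <string>
struct Person { std::string name; int age; };   // would be a redefinition error without the guard
std::string greeting(const Person &p);
#endif /* PERSON_H */

// ---- person.cpp (hypothetical implementation file) ----
std::string greeting(const Person &p) { return "Hello, " + p.name; }
```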


3. Being Weightless

3.1. Description

When you invoke a compiler, it goes through preprocessing, compilation, assembly, and linking phases. There are some language constructs that do not convert to concrete executable instructions but only aid in compilation by providing useful information. These constructs can be included in a header file.

The following constructs will be handled during the preprocessing phase:

  • Comments, which are treated as oases by a programmer but are treated as garbage by a preprocessor.
  • Preprocessing Directives, such as macro definitions (with #define, #undef), conditional compilation directives (with #if, #ifdef, #else, #elif, #endif), etc.

The following constructs only provide references for generating correct intermediate code (and do not add any bytes):

  • Type Definitions (with struct, union, enum, typedef, etc.), which define new types or introduce synonyms for existing data types.
  • Function Declarations, which tell the compiler how to communicate with these functions correctly.
  • External Variable Declarations (with extern), which tell the compiler how to treat these variables (defined elsewhere) when it sees them later.
  • Inline Function Definitions, which tell the compiler how to expand calls to these functions and do not contribute any bytes by themselves.
  • Template Classes and Functions, which tell the compiler how to generate the concrete versions if necessary.

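The following components, however, should not appear in a header file:

  • Variable Definitions (including static file-scope variables).
  • Ordinary Function Definitions.
  • Class Method Definitions (outside the class body, unless they are inline or templates).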
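The two lists above can be summarized in one hypothetical header, weightless.h (all names are illustrative). Every active line is weightless. The commented-out lines at the end show the constructs that belong in an implementation file instead.

```cpp
// weightless.h (hypothetical)
#ifndef WEIGHTLESS_H
#define WEIGHTLESS_H

#define BUFFER_SIZE 64                       // preprocessing directive

struct Buffer { char data[BUFFER_SIZE]; };   // type definition
typedef unsigned long counter_t;             // type synonym

int checksum(const Buffer &b);               // function declaration
extern counter_t buffers_in_use;             // external variable declaration

inline int square(int x) { return x * x; }   // inline function definition

template <typename T>                        // template: instantiated only where used
T max_of(T a, T b) { return a < b ? b : a; }

// NOT weightless; these belong in an implementation file:
//   counter_t buffers_in_use = 0;           // variable definition
//   static int scratch;                     // file-scope variable (one copy per includer!)
//   int checksum(const Buffer &b) { ... }   // ordinary function definition

#endif /* WEIGHTLESS_H */
```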

3.2. Justification

A header file should essentially contain only interface information, i.e., the specification of some implementation file(s) of a library. In other words, everything in a header file serves only as a reference, and should not add any bytes to the executable.

Suppose that you have defined some ordinary functions, class methods, or global variables in a header file, and that more than one implementation file includes the header. The linker should then report errors such as 'multiple definition of ...' when linking the object files together, and no executable should be generated. (If your toolchain does produce one, don't rely on it!)

If you have defined some file-scope variables (e.g., static int stupid_var;) in a header file, you are in big trouble! Suppose that two implementation files p1.c and p2.c include it. The compiler will silently compile all files without errors, because it generates a separate copy of these variables for each implementation file! (It may generate warnings if an implementation file does not use all the variables defined.) When some function in p1.c modifies these variables, p2.c won't see the change! This is rarely the desired behavior. Furthermore, the more implementation files include the header, the bigger the final executable becomes, since each of them gets its own private copy of these variables.
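The usual remedy is to declare a shared variable with extern in the header and define it in exactly one implementation file. Below is a sketch with hypothetical files shared.h, shared.cpp, p1.cpp and p2.cpp. They are pasted into one block, separated by comments, so that it compiles on its own. In a real project each .cpp file would start with #include "shared.h".

```cpp
// ---- shared.h ----
#ifndef SHARED_H
#define SHARED_H
extern int shared_var;        // declaration only: no storage is allocated here
void p1_set(int value);
int p2_get();
#endif /* SHARED_H */

// ---- shared.cpp (exactly one implementation file) ----
int shared_var = 0;           // the single definition, shared by every includer

// ---- p1.cpp ----
void p1_set(int value) { shared_var = value; }

// ---- p2.cpp ----
int p2_get() { return shared_var; }   // sees the change made through p1.cpp
```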

3.3. Checking

To find out whether your header file is weightless, simply compare the executables compiled without and with the header file included. (Note: don't include debugging information during compilation.) The two executables should be exactly identical if the header file is weightless (and does not otherwise affect the original implementation).

3.3.1. Normal Program

Here's how I tested a header myheader.h in a GNU/Linux environment. You can apply a similar approach on other OSs. First, create a dummy source file weightless.c:

int main(){return 0;}

Compile it with gcc:

$ gcc -O3 weightless.c -o a.out

Modify the dummy source file to include the header of interest (in this case, myheader.h):

#include "myheader.h"
int main(){return 0;}

Compile it with gcc:

$ gcc -O3 weightless.c -o b.out

Compare the executables:

$ diff a.out b.out

The header is weightless if the output is empty. (No news is good news!) If you see "Binary files a.out and b.out differ" then the header is not weightless. (Or, your header may be weightless but you have a lousy compiler.)

3.3.2. ANSI C/C++ Headers

It would be interesting to find out if the ANSI C/C++ headers follow this rule. Let's construct the following file weightless_ansicpp_headers.cpp (some lines have been omitted to save space):

#ifdef INCLUDE_MOST
// All ANSI C/C++ headers except <iostream>
#include <algorithm>
#include <bitset>
// ... Omitted
#include <cwctype>
#endif /* INCLUDE_MOST */

#ifdef INCLUDE_IOSTREAM
#include <iostream>
#endif /* INCLUDE_IOSTREAM */

/* Dummy main */
int main(){}

So, defining INCLUDE_MOST allows us to examine all ANSI C++ standard headers except iostream, while defining INCLUDE_IOSTREAM allows us to examine iostream. (The reason for examining iostream separately will become clear soon.) I performed the checking under GNU/Linux with g++ 4.0.0 as follows:

$ g++ -O3 weightless_ansicpp_headers.cpp -o empty.out
$ g++ -DINCLUDE_MOST -O3 weightless_ansicpp_headers.cpp -o most.out
$ g++ -DINCLUDE_IOSTREAM -O3 weightless_ansicpp_headers.cpp -o iostream.out
$ g++ -DINCLUDE_MOST -DINCLUDE_IOSTREAM -O3 weightless_ansicpp_headers.cpp -o all.out
$ ls -latr *.out
-rwxrwxr-x  1 user1 user1 4680 Jun 14 11:45 empty.out
-rwxrwxr-x  1 user1 user1 4680 Jun 14 11:45 most.out
...
-rwxrwxr-x  1 user1 user1 5587 Jun 14 11:47 iostream.out
-rwxrwxr-x  1 user1 user1 5587 Jun 14 11:45 all.out
$ diff empty.out most.out
$ diff iostream.out all.out

From the output, we know that the header iostream is not weightless. But why can a properly designed and implemented standard header file violate this rule? The next section explains and justifies it.

3.4. A Weightful Trick

The rule of weightlessness holds as long as a header file contains only interface information. In practice, iostream violates this rule. Here's the story.

In C++, the initialization order of global objects across translation units is unspecified. On some occasions, you may need to initialize some internal static components before anybody uses them. There are a few ways to solve this problem. The easiest way is probably to add to the class a special static variable, which is guaranteed to be zero-initialized (like every object with static storage duration). Then, in every method of the class, check the variable and perform the initialization if required, as shown below:

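class NeedInit {
  static int first_time;  // Zero-initialized (static storage duration)
  static ABC_Type this_static_component_needs_to_be_initialized;
  static void init();     // Sets up the static components
public:
  void fn1(void);
  void fn2(void);
  // ...
};

int NeedInit::first_time;  // Definition in the implementation file; starts as 0

void NeedInit::fn1(void)
{
  if (!first_time) { init(); first_time = 1; }
  // ...
}

void NeedInit::fn2(void)
{
  if (!first_time) { init(); first_time = 1; }
  // ...
}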

This approach is ugly if there are many class methods, and it unavoidably adds overhead to every invocation of every method involved. An alternative approach is to introduce a tricky auxiliary static object in the header file to help initialize the class of interest:

class NeedInit {
  // ...
};

class NeedInit_Aux {
  static unsigned int count;  // Zero-initialized; defined once in the implementation file
public:
  NeedInit_Aux() {
    if (!count++) {
      // ... initialize NeedInit (only the first constructor call does this)
    }
  }
};

// One such object per including file; its constructor runs before main().
// NOT weightless!
static NeedInit_Aux NeedInit_Aux_initializer;

This approach, pioneered by Jerry Schwarz (and often called the Schwarz counter or nifty counter idiom), seems better as it requires only small changes to the original class. However, unlike the former approach, it requires a header file that is not weightless. Is this justifiable?

Let's compare these approaches analytically. Let F be the number of files that require the class NeedInit (and hence the number of files that include the header), M be the number of methods in the class NeedInit, and n be the total number of run-time invocations of these methods. Note that, for a compiled executable, n is variable, while F and M are constants. The first approach incurs O(M) = O(1) extra space (one check per method) and O(n) extra time (one check per invocation), while the second approach incurs O(F) = O(1) extra space (one static object per including file) and O(F) = O(1) extra time (one constructor call per including file, at start-up). So, the second approach outperforms the first asymptotically.

Taking various factors into consideration, the second approach is a better choice for the implementation of the iostream library. Although iostream is not weightless, the use of the trick is fully justified. Since each inclusion of iostream incurs additional space overhead, it is important not to include the header in an implementation file that doesn't need anything from it. In fact, generally speaking, you should not include any redundant header file.

Nonetheless, it is extremely rare that you have to rely on this trick. The class can usually be redesigned to avoid it. You can find more information about the trick in Section 3.11.4 of the book The Design and Evolution of C++ (by Bjarne Stroustrup), and in various articles online.
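For concreteness, here is a self-contained sketch of the counter trick with hypothetical classes Logger and LoggerInit. They are pasted into one file, with comments marking where each part would live. In the standard library, the class std::ios_base::Init plays the role of LoggerInit for the eight standard stream objects. Within a single file, globals are initialized in order of definition, so this only demonstrates the mechanics. The trick really pays off when a global like EarlyUser lives in another file.

```cpp
#include <string>

// ---- logger.h (hypothetical) ----
class Logger {
public:
    static std::string *buffer;   // component that must exist before its first use
    static void write(const std::string &s) { buffer->append(s); }
};

class LoggerInit {
    static unsigned int count;    // zero-initialized before any constructor runs
public:
    LoggerInit()  { if (count++ == 0) Logger::buffer = new std::string; }
    ~LoggerInit() { if (--count == 0) { delete Logger::buffer; Logger::buffer = nullptr; } }
};

// One initializer per including file: this line is what makes the header NOT weightless.
static LoggerInit logger_initializer;

// ---- logger.cpp (hypothetical) ----
std::string *Logger::buffer;      // static storage: starts as nullptr
unsigned int LoggerInit::count;   // static storage: starts as 0

// ---- some other file that includes logger.h ----
// A global whose constructor already uses Logger. It is safe because this file's
// logger_initializer is constructed before it (same file, defined earlier).
struct EarlyUser {
    EarlyUser() { Logger::write("constructed before main; "); }
};
static EarlyUser early_user;
```

Only the first LoggerInit constructor creates the buffer, and only the last destructor deletes it. No method of Logger needs a per-call check.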

SSH with PHP 5

There are many cool things you can do if you can use SSH from PHP. PHP 5 does not include this feature in its core distribution. So how can you do it?

One solution is to make use of the ssh2 package from PECL. The package provides several useful methods for secure remote access. Unfortunately, sometimes you may not be able to use the package due to some pragmatic problems. If you do not have privileged access to the server, and if you fail to convince the admin to install the package or the libraries required to compile it, you won't be able to use it (as well as many other PECL packages).

To solve this problem, I've written some code that requires only the following items to use SSH with PHP:

  • The SSH binary. (Normally found at /usr/bin/ssh.)
  • PHP 5. (However, the script can easily be ported to PHP 4.)
  • Strong understanding of the security issues involved! NO WARRANTY IS PROVIDED. Some security issues involved are beyond the control of the script and thus it's your responsibility to make sure that your server is properly protected!

1. Preparation

You need to prepare your server with a few things before you can make use of the script.

  1. Set Up Passwordless SSH Access. Configure the web server and the remote machine such that the server can log in to the remote machine with a key pair instead of a password. I have written a post on how to do it.
  2. Create a safe working directory for Apache. Make sure that it's safe:
    • No accounts other than yours and the one used by the Apache process should be able to access the directory. Set the permissions very carefully.
    • Don't put it anywhere under Apache's DocumentRoot, so that no public user can even peek at the directory. For extra protection, include a file .htaccess containing the single line Deny from all.
  3. Make the private key file accessible to Apache. Copy the file ~/.ssh/id_rsa to a safe place such as the working directory prepared above. Set the file ownership and permissions carefully. For even better security, encrypt the private key and modify the code to decrypt it with a user-supplied password when needed. Be warned that if the private key is leaked and you do not respond in time, your machines are at great risk!

2. The PHP Class

Here it goes.

<?php
/*
A PHP Class for using SSH
Copyright (C) 2007  http://www.vyvy.org/

This library is free software; you can redistribute it and/or
modify it under the terms of the GNU Lesser General Public
License as published by the Free Software Foundation; either
version 2.1 of the License, or (at your option) any later version.

This library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public
License along with this library; if not, write to the Free Software
Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA
*/
/* Changes:
 *   [29-Jan-07]  First release
 */

/*----------------------------------------------------------------*/
/* Configuration                                                  */
/*----------------------------------------------------------------*/
define('PATH_SSH', 'ssh');  // Path to SSH binary

/*----------------------------------------------------------------*/
/* Implementation                                                 */
/*----------------------------------------------------------------*/
define('SSH_ERROR_SSH_CONFIG', 0x1);


class vyvy_ssh
{
  private $user;
  private $host;
  private $file_ssh_key;
  private $file_ssh_config;
  private $file_known_hosts;
  private $error = 0;


  function __construct($user, $host, $dir_ssh, $file_ssh_key)
    {
      $this->user = $user;
      $this->host = $host;
      $this->file_ssh_key = $file_ssh_key;  // For better security, you may want to encrypt this file and decrypt it when needed.

      /* SSH configuration (tempnam() returns false on failure) */
      $this->file_ssh_config = tempnam($dir_ssh, 'config_');
      $this->file_known_hosts = tempnam($dir_ssh, 'hosts_');
      if (!$this->file_ssh_config || !$this->file_known_hosts) {
        $this->error |= SSH_ERROR_SSH_CONFIG;
        return;
      }
      $size = file_put_contents($this->file_ssh_config,
                                "StrictHostKeyChecking no\n"
                                . "UserKnownHostsFile {$this->file_known_hosts}\n"
                                . "Protocol 2");
      if (!$size) {
        $this->error |= SSH_ERROR_SSH_CONFIG;
      }
    }


  function __destruct()
    {
      /* Delete temporary files; escape the paths before passing them to the shell */
      foreach (array($this->file_ssh_config, $this->file_known_hosts) as $file) {
        if ($file) {
          exec('shred --remove ' . escapeshellarg($file));
        }
      }
    }


  function exec($command, $wait=true)
    {
      if ($this->error) {
        return array(false, 0);
      }

      /* Quote every argument for the local shell. The remote command is quoted
         twice: ssh hands the string  sh -c '<command>'  to the remote shell. */
      $cmd = PATH_SSH . ' -n'
           . ' -F ' . escapeshellarg($this->file_ssh_config)
           . ' -i ' . escapeshellarg($this->file_ssh_key)
           . ' ' . escapeshellarg("{$this->user}@{$this->host}")
           . ' ' . escapeshellarg('sh -c ' . escapeshellarg($command));
      if (!$wait) {
        $cmd .= ' > /dev/null &';
      }

      exec($cmd, $output, $error);

      if ($error) {
        return array(false, $error);
      }
      return array(true, $output);
    }
}
?>

3. Example

The following example lists the files of a remote machine. It's quite simple to use, isn't it?

<?php
require_once "vyvy_ssh.php";

$ssh = new vyvy_ssh('vyvy', '10.0.0.7',
                    '/home/vyvy/www_private/ssh/',
                    '/home/vyvy/www_private/id_rsa');
list($status, $output) = $ssh->exec('ls -las');

/* On failure, $output holds the exit status instead of the output lines */
$status_text = $status ? 'OK' : 'FAILED';
$output_formatted = $status
    ? htmlspecialchars(implode("\n", $output))
    : 'Exit status: ' . htmlspecialchars((string) $output);

echo <<<PAGE
<html>
  <head>
    <title>Sample Output</title>
  </head>
  <body>
<p>Status: {$status_text}</p>
<p>Output:</p>
    <pre>
{$output_formatted}
    </pre>
  </body>
</html>
PAGE;
?>