Applications of Apache Rewrite (Part 3)

sunshine · #1 IP: 218.2.67.48 2006-02-18, 10:09 PM

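Let us further assume that we have some other programs: wwwlog (which displays the access.log for a URL subtree) and wwwidx (which runs Glimpse on a URL subtree). These programs must be told which URL area to operate on. For example, a hyperlink that runs wwwidx on /u/user/foo/ looks like this:

Code:
/internal/cgi/user/wwwidx?i=/u/user/foo/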

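The drawback is that both the location of the area and the location of the CGI are hard-coded in the hyperlink. When the area is reorganized, a lot of time goes into changing the individual hyperlinks.

Solution:
The solution is to provide a special new URL format that constructs the CGI parameters automatically:

Code:
RewriteRule ^/([uge])/([^/]+)(/?.*)/\* /internal/cgi/user/wwwidx?i=/$1/$2$3/
RewriteRule ^/([uge])/([^/]+)(/?.*):log /internal/cgi/user/wwwlog?f=/$1/$2$3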

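Now the hyperlink that searches /u/user/foo/ reads only:

Code:
HREF="*"

Internally it is automatically transformed to

Code:
/internal/cgi/user/wwwidx?i=/u/user/foo/

The same approach constructs the parameters for the wwwlog CGI program when a hyperlink ending in :log is used.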
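
To make the effect of the two rules easier to follow, here is a minimal Python sketch that applies the same regular expressions to a URL path. It is only an approximation of what mod_rewrite does: the function name `rewrite` and the "first matching rule wins" loop are assumptions made for illustration, and `$1`..`$3` are written as `\1`..`\3` in Python syntax.

```python
import re

# The two rules from the text as (pattern, substitution) pairs.
RULES = [
    (re.compile(r"^/([uge])/([^/]+)(/?.*)/\*"),
     r"/internal/cgi/user/wwwidx?i=/\1/\2\3/"),
    (re.compile(r"^/([uge])/([^/]+)(/?.*):log"),
     r"/internal/cgi/user/wwwlog?f=/\1/\2\3"),
]

def rewrite(url_path):
    """Return the substitution of the first matching rule, else the path unchanged."""
    for pattern, substitution in RULES:
        match = pattern.search(url_path)
        if match:
            # Like mod_rewrite, replace the whole URL by the expanded substitution.
            return match.expand(substitution)
    return url_path
```

For example, `rewrite("/u/user/foo/*")` yields the wwwidx URL shown above, while a path outside /u/, /g/ and /e/ is left alone.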

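From Static to Dynamic
Description:
How can we transform a static page foo.html into a dynamic variant foo.cgi seamlessly, i.e. without the browser or the user noticing?

Solution:
We just rewrite the URL to the CGI script and force the correct MIME type so that it really gets run as a CGI script. This way a request to /~quux/foo.html internally leads to the invocation of /~quux/foo.cgi.

Code:
RewriteEngine on
RewriteBase /~quux/
RewriteRule ^foo\.html$ foo.cgi [T=application/x-httpd-cgi]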

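On-the-fly Content Regeneration
Description:
Here comes a rather esoteric feature: dynamically generated but statically served pages. A page should be delivered as a pure static page (read from the filesystem and just passed through), but it has to be generated dynamically by the web server if it is missing. This way CGI-generated pages are served statically until someone (or a cronjob) removes the static file, at which point the contents get refreshed.

Solution:
This is done with the following ruleset:

Code:
RewriteCond %{REQUEST_FILENAME} !-s
RewriteRule ^page\.html$ page.cgi [T=application/x-httpd-cgi,L]

Here a request to page.html leads to an internal run of page.cgi if page.html is missing or has a file size of zero. The trick is that page.cgi is an ordinary CGI script which, in addition to writing to STDOUT, also writes its output to the file page.html. Once it has run, the server sends out the contents of page.html. When the webmaster wants to force a refresh of the contents, he just removes page.html (usually done by a cronjob).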
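
The text does not show page.cgi itself, so the following Python sketch only illustrates the idea under stated assumptions: `needs_regeneration` mimics the `!-s` test, and `regenerate` stands in for the CGI script by writing the rendered page both to the cache file and to an output stream. The function names, the `render` callback and the temporary-file swap are all hypothetical choices, not part of the original ruleset.

```python
import os
import sys

def needs_regeneration(path):
    """Mirror the !-s test: true if the file is missing or has size zero."""
    try:
        return os.path.getsize(path) == 0
    except OSError:
        return True

def regenerate(path, render, out=sys.stdout):
    """Render the page, store it as `path`, and also emit it as a CGI response."""
    body = render()
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as handle:
        handle.write(body)
    os.replace(tmp_path, path)  # swap in one step so readers never see a partial file
    out.write("Content-type: text/html\n\n")
    out.write(body)
    return body
```

Deleting the cached file makes `needs_regeneration` true again, which corresponds to the cronjob-driven refresh described above.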

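Document With Autorefresh
Description:
When creating a complex web page, wouldn't it be nice if the browser automatically refreshed the page every time we write a new version from within our editor? Is that possible?

Solution:
Yes, it is. We combine the MIME multipart feature, the web server's NPH feature and the URL manipulation power of mod_rewrite. First, we establish a new URL feature: adding :refresh to any URL causes it to be refreshed every time the file is updated on the filesystem.

Code:
RewriteRule ^(/[uge]/[^/]+/?.*):refresh /internal/cgi/apache/nph-refresh?f=$1

Now, when we reference the URL

Code:
/u/foo/bar/page.html:refresh

this leads to the internal invocation of the URL

Code:
/internal/cgi/apache/nph-refresh?f=/u/foo/bar/page.html

The only missing part is the NPH-CGI script. Although one would usually say "left as an exercise to the reader", I provide it here as well.

Code: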
#!/sw/bin/perl
##
##  nph-refresh -- NPH/CGI script for auto refreshing pages
##  Copyright (c) 1997 Ralf S. Engelschall, All Rights Reserved.
##
$| = 1;

#   split the QUERY_STRING variable
@pairs = split(/&/, $ENV{'QUERY_STRING'});
foreach $pair (@pairs) {
    ($name, $value) = split(/=/, $pair);
    $name =~ tr/A-Z/a-z/;
    $name = 'QS_' . $name;
    $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
    eval "\$$name = \"$value\"";
}
$QS_s = 1 if ($QS_s eq '');
$QS_n = 3600 if ($QS_n eq '');
if ($QS_f eq '') {
    print "HTTP/1.0 200 OK\n";
    print "Content-type: text/html\n\n";
    print "<b>ERROR</b>: No file given\n";
    exit(0);
}
if (! -f $QS_f) {
    print "HTTP/1.0 200 OK\n";
    print "Content-type: text/html\n\n";
    print "<b>ERROR</b>: File $QS_f not found\n";
    exit(0);
}

sub print_http_headers_multipart_begin {
    print "HTTP/1.0 200 OK\n";
    $bound = "ThisRandomString12345";
    print "Content-type: multipart/x-mixed-replace;boundary=$bound\n";
    &print_http_headers_multipart_next;
}

sub print_http_headers_multipart_next {
    print "\n--$bound\n";
}

sub print_http_headers_multipart_end {
    print "\n--$bound--\n";
}

sub displayhtml {
    local($buffer) = @_;
    $len = length($buffer);
    print "Content-type: text/html\n";
    print "Content-length: $len\n\n";
    print $buffer;
}

sub readfile {
    local($file) = @_;
    local(*FP, $size, $buffer, $bytes);
    ($x, $x, $x, $x, $x, $x, $x, $size) = stat($file);
    $size = sprintf("%d", $size);
    open(FP, "<$file");
    $bytes = sysread(FP, $buffer, $size);
    close(FP);
    return $buffer;
}

$buffer = &readfile($QS_f);
&print_http_headers_multipart_begin;
&displayhtml($buffer);

sub mystat {
    local($file) = $_[0];
    local($time);

    ($x, $x, $x, $x, $x, $x, $x, $x, $x, $mtime) = stat($file);
    return $mtime;
}

$mtimeL = &mystat($QS_f);
$mtime = $mtime;
for ($n = 0; $n < $QS_n; $n++) {
    while (1) {
        $mtime = &mystat($QS_f);
        if ($mtime ne $mtimeL) {
            $mtimeL = $mtime;
            sleep(2);
            $buffer = &readfile($QS_f);
            &print_http_headers_multipart_next;
            &displayhtml($buffer);
            sleep(5);
            $mtimeL = &mystat($QS_f);
            last;
        }
        sleep($QS_s);
    }
}

&print_http_headers_multipart_end;

exit(0);

##EOF##
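
The part of the script that is easiest to get wrong is the multipart/x-mixed-replace framing, so here is a small Python sketch of just that piece. It builds the same byte sequence as the Perl subroutines but returns strings instead of printing them; the function names are hypothetical, and the Content-length is computed over the UTF-8 encoding, which is an assumption about how the page is stored.

```python
BOUNDARY = "ThisRandomString12345"  # same fixed boundary string as the Perl script

def multipart_next(boundary=BOUNDARY):
    """Boundary line that introduces the next version of the page."""
    return f"\n--{boundary}\n"

def multipart_begin(boundary=BOUNDARY):
    """NPH status line and top-level header, followed by the first boundary."""
    return (
        "HTTP/1.0 200 OK\n"
        f"Content-type: multipart/x-mixed-replace;boundary={boundary}\n"
        + multipart_next(boundary)
    )

def html_part(body):
    """Headers and body of one HTML part (one version of the page)."""
    length = len(body.encode("utf-8"))
    return f"Content-type: text/html\nContent-length: {length}\n\n{body}"

def multipart_end(boundary=BOUNDARY):
    """Closing boundary that ends the whole response."""
    return f"\n--{boundary}--\n"
```

A response for two versions of a page is then `multipart_begin() + html_part(v1) + multipart_next() + html_part(v2) + multipart_end()`, with the script sleeping between parts until the file's modification time changes.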
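
Mass Virtual Hosting
Description:
The <VirtualHost> feature of Apache is nice and works well when you have just a few dozen virtual hosts. But when you are an ISP and have to provide hundreds of virtual hosts, it is not the best choice.

Solution:
To provide this feature, we look up the requested host name in a rewrite map (vhost.map) and map each URL to the corresponding document root with a single ruleset in the main server:

Code: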
##
## vhost.map
##
www.vhost1.dom:80 /path/to/docroot/vhost1
www.vhost2.dom:80 /path/to/docroot/vhost2
:
www.vhostN.dom:80 /path/to/docroot/vhostN

Code:
##
## httpd.conf
##
:
# use the canonical hostname on redirects, etc.
UseCanonicalName on

:
# add the virtual host in front of the CLF-format
CustomLog /path/to/access_log "%{VHOST}e %h %l %u %t \"%r\" %>s %b"
:

# enable the rewriting engine in the main server
RewriteEngine on

# define two maps: one for fixing the URL and one which defines
# the available virtual hosts with their corresponding
# DocumentRoot.
RewriteMap lowercase int:tolower
RewriteMap vhost txt:/path/to/vhost.map

# Now do the actual virtual host mapping
# via a huge and complicated single rule:
#
# 1. make sure we don't map for common locations
RewriteCond %{REQUEST_URI} !^/commonurl1/.*
RewriteCond %{REQUEST_URI} !^/commonurl2/.*
:
RewriteCond %{REQUEST_URI} !^/commonurlN/.*
#
# 2. make sure we have a Host header, because
# currently our approach only supports
# virtual hosting through this header
RewriteCond %{HTTP_HOST} !^$
#
# 3. lowercase the hostname
RewriteCond ${lowercase:%{HTTP_HOST}|NONE} ^(.+)$
#
# 4. lookup this hostname in vhost.map and
# remember it only when it is a path
# (and not "NONE" from above)
RewriteCond ${vhost:%1} ^(/.*)$
#
# 5. finally we can map the URL to its docroot location
# and remember the virtual host for logging purposes
RewriteRule ^/(.*)$ %1/$1 [E=VHOST:${lowercase:%{HTTP_HOST}}]
:
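
The five numbered steps in the ruleset can be traced with a short Python sketch. It is a simplified model, not a reimplementation of mod_rewrite: `parse_map` reads the text map roughly the way a txt: map is read (lines starting with # are skipped, the first two whitespace-separated fields are key and value), and `map_to_docroot` and its `common_prefixes` parameter are hypothetical names standing in for the conditions.

```python
def parse_map(text):
    """Parse a text map: skip '#' lines, take key and value from each line."""
    table = {}
    for line in text.splitlines():
        if line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 2:          # lines without a value are ignored
            table[fields[0]] = fields[1]
    return table

def map_to_docroot(host_header, url_path, vhosts, common_prefixes=()):
    """Return the mapped filesystem path, or None if the ruleset does not apply."""
    if any(url_path.startswith(prefix) for prefix in common_prefixes):
        return None                   # step 1: common locations are not mapped
    if not host_header:
        return None                   # step 2: a Host header is required
    docroot = vhosts.get(host_header.lower())   # steps 3 and 4
    if docroot is None or not docroot.startswith("/"):
        return None
    if not url_path.startswith("/"):
        return None
    return docroot + "/" + url_path[1:]          # step 5: %1/$1
```

Note that the keys in vhost.map include the port (www.vhost1.dom:80), so in this sketch the Host value has to carry the same port to be found.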

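Access Restriction
Blocking of Robots
Description:
How can we block a really annoying robot from retrieving pages of a specific web area? A /robots.txt file containing entries of the "Robot Exclusion Protocol" is typically not enough to get rid of such a robot.

Solution:
We use a ruleset which forbids access to the web area /~quux/foo/arc/ (perhaps a very deep, directory-indexed area where the robot's traversal would create a big server load). We have to make sure that we block only the particular robot: just forbidding the host where the robot runs is not enough, because this would block users from that host, too. We accomplish this by also matching the User-Agent HTTP header.

Code:
RewriteCond %{HTTP_USER_AGENT} ^NameOfBadRobot.*
RewriteCond %{REMOTE_ADDR} ^123\.45\.67\.[8-9]$
RewriteRule ^/~quux/foo/arc/.+ - [F]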
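
Because the two conditions are implicitly AND-ed with the rule pattern, a request is rejected only when all three patterns match. The Python sketch below checks that logic with the same regular expressions; the function name `is_forbidden` and the idea of passing the header values as plain strings are assumptions made for illustration.

```python
import re

BAD_AGENT = re.compile(r"^NameOfBadRobot.*")
BAD_ADDR = re.compile(r"^123\.45\.67\.[8-9]$")
PROTECTED = re.compile(r"^/~quux/foo/arc/.+")

def is_forbidden(url_path, user_agent, remote_addr):
    """True when the URL, the User-Agent and the address all match (403)."""
    return bool(
        PROTECTED.search(url_path)
        and BAD_AGENT.search(user_agent)
        and BAD_ADDR.search(remote_addr)
    )
```

A normal browser coming from the same address is not affected, which is the point made in the solution text.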
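
Blocked Inline Images
Description:
Assume we have some pages with inlined GIF graphics under http://www.quux-corp.de/~quux/. These graphics are nice, so others incorporate them directly into their own pages via hyperlinks. We don't like this practice because it adds useless traffic to our server.

Solution:
While we cannot protect the images from inclusion 100%, we can at least restrict the cases where the browser sends an HTTP Referer header.

Code:
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://www.quux-corp.de/~quux/.*$ [NC]
RewriteRule .*\.gif$ - [F]

RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !.*/foo-with-gif\.html$
RewriteRule ^inlined-in-foo\.gif$ - [F]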
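
The first ruleset can be read as: block a GIF only if a Referer is present and it does not point to our own pages. A minimal Python sketch of that decision follows; it models only the first ruleset, and the function name `blocks_image` is hypothetical.

```python
import re

# [NC] in the rule corresponds to a case-insensitive match here.
OWN_PAGES = re.compile(r"^http://www.quux-corp.de/~quux/.*$", re.IGNORECASE)
GIF = re.compile(r".*\.gif$")

def blocks_image(url_path, referer):
    """True when the request for a GIF carries a foreign Referer (403)."""
    if not GIF.search(url_path):
        return False              # the rule only applies to .gif URLs
    if referer == "":
        return False              # no Referer sent: cannot decide, so allow
    return OWN_PAGES.search(referer) is None
```

Requests without a Referer header still get through, which is why the protection cannot be complete.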

Host Deny
Description:
How can we forbid a list of externally configured hosts from using our server?

Solution:
Code:
For Apache >= 1.3b6:

RewriteEngine on
RewriteMap hosts-deny txt:/path/to/hosts.deny
RewriteCond ${hosts-deny:%{REMOTE_HOST}|NOT-FOUND} !=NOT-FOUND [OR]
RewriteCond ${hosts-deny:%{REMOTE_ADDR}|NOT-FOUND} !=NOT-FOUND
RewriteRule ^/.* - [F]

For Apache <= 1.3b6:

RewriteEngine on
RewriteMap hosts-deny txt:/path/to/hosts.deny
RewriteRule ^/(.*)$ ${hosts-deny:%{REMOTE_HOST}|NOT-FOUND}/$1
RewriteRule !^NOT-FOUND/.* - [F]
RewriteRule ^NOT-FOUND/(.*)$ ${hosts-deny:%{REMOTE_ADDR}|NOT-FOUND}/$1
RewriteRule !^NOT-FOUND/.* - [F]
RewriteRule ^NOT-FOUND/(.*)$ /$1

Code:
##
## hosts.deny
##
## ATTENTION! This is a map, not a list, even when we treat it as such.
## mod_rewrite parses it for key/value pairs, so at least a
## dummy value "-" must be present for each entry.
##

193.102.180.41 -
bsdti1.sdm.de -
192.76.162.40 -
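
The comment in hosts.deny explains why every entry needs the dummy value "-". The Python sketch below illustrates this under simplifying assumptions: `load_keys` keeps only lines that have both a key and a value, and `is_denied` mirrors the two OR-ed conditions of the newer ruleset. Both function names are hypothetical.

```python
HOSTS_DENY = """\
193.102.180.41 -
bsdti1.sdm.de -
192.76.162.40 -
"""

def load_keys(text):
    """Collect the map keys; entries without a value are not usable."""
    keys = set()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and not line.startswith("#"):
            keys.add(fields[0])
    return keys

def is_denied(remote_host, remote_addr, denied):
    """Deny when either the host name or the address is found in the map."""
    return remote_host in denied or remote_addr in denied
```

A line that consists of a host name alone yields no entry in this model, which matches the warning that the file is a map and not a list.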

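Proxy Deny
Description:
How can we forbid a certain host, or even a particular user of a certain host, from using the Apache proxy?

Solution:
We first have to make sure that mod_rewrite is below(!) mod_proxy in the Configuration file when compiling the Apache web server, so that it gets called before mod_proxy. Then we configure the following for a host-dependent deny...

Code:
RewriteCond %{REMOTE_HOST} ^badhost\.mydomain\.com$
RewriteRule !^http://[^/.]\.mydomain.com.* - [F]

...and this one for a user@host-dependent deny:

Code:
RewriteCond %{REMOTE_IDENT}@%{REMOTE_HOST} ^badguy@badhost\.mydomain\.com$
RewriteRule !^http://[^/.]\.mydomain.com.* - [F]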

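Special Authentication Variant
Description:
Sometimes a very special kind of authentication is needed, for instance one that checks for a set of explicitly configured users. Only these users should receive access, and without any explicit prompting (which would occur when using Basic Auth via mod_access).

Solution:
We use a list of rewrite conditions to exclude everyone except our friends:

Code:
RewriteCond %{REMOTE_IDENT}@%{REMOTE_HOST} !^friend1@client1.quux-corp\.com$
RewriteCond %{REMOTE_IDENT}@%{REMOTE_HOST} !^friend2@client2.quux-corp\.com$
RewriteCond %{REMOTE_IDENT}@%{REMOTE_HOST} !^friend3@client3.quux-corp\.com$
RewriteRule ^/~quux/only-for-friends/ - [F]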
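
Since all three conditions are negated and AND-ed, the rule fires only when the user@host string matches none of the friends. A small Python sketch of that logic, with a hypothetical function name and the patterns copied from the rules:

```python
import re

FRIENDS = [
    re.compile(r"^friend1@client1.quux-corp\.com$"),
    re.compile(r"^friend2@client2.quux-corp\.com$"),
    re.compile(r"^friend3@client3.quux-corp\.com$"),
]

def is_forbidden(url_path, remote_ident, remote_host):
    """True when the protected area is requested by someone not on the list."""
    if not url_path.startswith("/~quux/only-for-friends/"):
        return False
    who = f"{remote_ident}@{remote_host}"
    return not any(pattern.search(who) for pattern in FRIENDS)
```

The check is only as trustworthy as the ident and host name values reported for the client, so it is best seen as a convenience filter.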

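Referer-based Deflector
Description:
How can we program a flexible URL deflector which acts on the "Referer" HTTP header and can be configured with as many referring pages as we like?

Solution:
Use the following rather tricky ruleset...

Code:
RewriteMap deflector txt:/path/to/deflector.map

RewriteCond %{HTTP_REFERER} !=""
RewriteCond ${deflector:%{HTTP_REFERER}} ^-$
RewriteRule ^.* %{HTTP_REFERER} [R,L]

RewriteCond %{HTTP_REFERER} !=""
RewriteCond ${deflector:%{HTTP_REFERER}|NOT-FOUND} !=NOT-FOUND
RewriteRule ^.* ${deflector:%{HTTP_REFERER}} [R,L]

... in conjunction with a corresponding rewrite map:

Code: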
##
## deflector.map
##

http://www.badguys.com/bad/index.html -
http://www.badguys.com/bad/index2.html -
http://www.badguys.com/bad/index3.html http://somewhere.com/

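This automatically redirects the request back to the referring page (when "-" is used as the value in the map) or to a specific URL (when a URL is given in the map as the second argument).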
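
The two cases can be expressed in a few lines of Python. This is only a sketch of the decision the ruleset makes: the map is held in a dictionary instead of a text file, and the function name `deflect` is hypothetical.

```python
DEFLECTOR_MAP = {
    "http://www.badguys.com/bad/index.html": "-",
    "http://www.badguys.com/bad/index2.html": "-",
    "http://www.badguys.com/bad/index3.html": "http://somewhere.com/",
}

def deflect(referer, table=DEFLECTOR_MAP):
    """Return the redirect target, or None when the request passes through."""
    if referer == "":
        return None                  # no Referer: neither ruleset applies
    target = table.get(referer)
    if target is None:
        return None                  # referring page is not listed in the map
    return referer if target == "-" else target
```

A listed page with "-" is sent back to itself, a listed page with a URL is sent to that URL, and everything else is served normally.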

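Other
External Rewriting Engine
Description:
A frequently asked question: how can we solve the FOO/BAR/QUUX/etc. problem when there seems to be no solution using mod_rewrite?

Solution:
Use an external rewrite map, i.e. a program which acts like a RewriteMap. It is started once when Apache starts up, receives the requested URLs on STDIN, and has to write the resulting (usually rewritten) URLs to STDOUT (in the same order!).

Code:
RewriteEngine on
RewriteMap quux-map prg:/path/to/map.quux.pl
RewriteRule ^/~quux/(.*)$ /~quux/${quux-map:$1}

Code: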
#!/path/to/perl

# disable buffered I/O which would lead
# to deadloops for the Apache server
$| = 1;

# read URLs one per line from stdin and
# generate substitution URL on stdout
while (<>) {
s|^foo/|bar/|;
print $_;
}

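This is a demonstration-only example which just rewrites all URLs of the form /~quux/foo/... to /~quux/bar/.... In practice you can program whatever you need. Note, however, that while such maps can be used by ordinary users, only the system administrator can define them.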
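
The same map program can be written in other languages as long as it answers every lookup with exactly one line and never buffers its output. The following Python sketch shows one way to do that; it is an illustrative equivalent of the Perl demo, and the function names `rewrite_key`, `serve` and `main` are assumptions.

```python
import sys

def rewrite_key(key):
    """Same demo substitution as the Perl script: a leading foo/ becomes bar/."""
    if key.startswith("foo/"):
        return "bar/" + key[len("foo/"):]
    return key

def serve(lines, out):
    """Answer each lookup key with one line, in order, flushing every time."""
    for line in lines:
        out.write(rewrite_key(line.rstrip("\n")) + "\n")
        out.flush()  # the server waits for each answer, so never buffer output

def main():
    """Entry point for use as a prg: map; call this at the end of the script."""
    serve(sys.stdin, sys.stdout)
```

Keeping the substitution in its own function makes it easy to test the mapping without starting Apache.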
