Author Archives: btwotch

compile_commands.json – independent from cmake

For another project I needed a so-called “compile_commands.json”.

So, what is that? If you compile software from source, it is very likely that you have more than one source file; your build system (e.g. cmake, autotools) takes care of that and builds the project files in the right order, with the right libraries, include paths and so on. A compile_commands.json (a “compilation database”) records, for every source file, the working directory and the exact compiler command that was used, so that other tools can reuse this information.

If you are using cmake, creating such a file is pretty easy:

$ build/ > cmake -DCMAKE_EXPORT_COMPILE_COMMANDS=1 ..

More information about cmake and compile_commands: https://cmake.org/cmake/help/latest/variable/CMAKE_EXPORT_COMPILE_COMMANDS.html

But what if you are not using cmake? Then you can use my project: https://github.com/btwotch/compile_commands

How does it work? My first try was to do a simple trick:

  1. Create an executable that is called gcc, logs how it is invoked and then calls the real gcc
  2. Put a link to this executable into a temporary directory
  3. Prepend this temporary directory to the $PATH environment variable
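A minimal sketch of the two building blocks of this trick, assuming a tab-separated log format; the function names and the log format are made up for illustration and are not taken from the project. A wrapper would call log_invocation() with its own argv and then exec the real compiler.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Step 1: append one line per compiler call to the log: the working
 * directory, then every argument, separated by tabs.
 * Returns 0 on success, -1 on error. */
int log_invocation(const char *logfile, int argc, char *const argv[])
{
    char cwd[4096];
    FILE *fp;
    int i;

    if (getcwd(cwd, sizeof(cwd)) == NULL)
        return -1;

    fp = fopen(logfile, "a");
    if (fp == NULL)
        return -1;

    fputs(cwd, fp);
    for (i = 0; i < argc; i++)
        fprintf(fp, "\t%s", argv[i]);
    fputc('\n', fp);

    return fclose(fp) == 0 ? 0 : -1;
}

/* Step 3: build the new value "dir:old_path" for $PATH.
 * Returns 0 on success, -1 if out is too small. */
int prepend_path(const char *dir, const char *old_path, char *out, size_t n)
{
    int len = snprintf(out, n, "%s:%s", dir, old_path);

    return (len < 0 || (size_t)len >= n) ? -1 : 0;
}
```

The result of prepend_path() would be passed to setenv("PATH", ..., 1) before the build is started. Arguments that contain tabs or newlines would need escaping in a real log.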

I thought this was *the* solution. Unfortunately I was wrong. To test it, I tried it out with a local checkout of a random project that uses cmake.

cmake configured the project Makefile in a way that the absolute path to the compiler is used. Then of course $PATH is not consulted anymore and my link is bypassed.

The solution!

How can I, as a user without root privileges, prevent a process from starting the compiler directly?

First, some ideas I didn’t follow up:

  • Creating an LD_PRELOAD library that hooks into the exec calls and logs the actual call. Disadvantage: it does not work with statically linked binaries.
  • Using ptrace to intercept the actual exec syscall and log that call. This also works with statically linked binaries, but ptrace is slow and the implementation is not that easy.

So, what did I do instead? Containers! Or actually: Namespaces!

  • Create a temporary directory for the compiler binaries
  • Create a new mount namespace (more information about those: http://man7.org/linux/man-pages/man7/mount_namespaces.7.html)
  • Bind mount the compiler binaries into the temporary directory; the advantage over a hard link is that this works across filesystems.
  • Bind mount my binary over the original path of the compiler (e.g. /usr/bin/gcc); it writes the log and then invokes the real compiler from the temporary directory.

The core of the implementation:
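unshare(CLONE_NEWNS|CLONE_NEWUSER|CLONE_NEWPID);
if (fork()) {
        // parent: wait for the child, which runs inside the new namespaces
        int status = -1;
        wait(&status);
        exit(WIFEXITED(status) ? WEXITSTATUS(status) : 1);
}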
mount("none", "/", NULL, MS_REC|MS_PRIVATE, NULL);
mount("none", "/proc", NULL, MS_REC|MS_PRIVATE, NULL);
char mappingBuf[512];
int setgroupFd = open( "/proc/self/setgroups", O_WRONLY);
write(setgroupFd, "deny", 4);
close(setgroupFd);
int uid_mapFd = open("/proc/self/uid_map", O_WRONLY);
snprintf(mappingBuf, 512, "0 %d 1", uid); 
write(uid_mapFd, mappingBuf, strlen(mappingBuf));
close(uid_mapFd);
// map gid 0 inside the namespace to our real gid outside
int gid_mapFd = open("/proc/self/gid_map", O_WRONLY);
snprintf(mappingBuf, 512, "0 %d 1", gid);
write(gid_mapFd, mappingBuf, strlen(mappingBuf));
close(gid_mapFd);
for (const auto &compilerBin : compilerInvocations) {
        fs::path origPath = getOriginalPath(compilerBin);
        if (origPath.empty()) {
                continue;       
        }               
        bindMount(origPath, binDir.path() / compilerBin); 
}               
for (const auto &compilerBin : compilerInvocations) {
        fs::path origPath = getOriginalPath(compilerBin);
        if (origPath.empty()) {
                continue;       
        }               
        bindMount(fs::canonical(fs::path{"ec"}), origPath);
}               

“unshare” creates the new mount namespace. We also have to create a new user namespace so that we have root permissions inside this container. The new PID namespace only applies to child processes, so we have to fork() to enter it; the parent just waits for the child and passes on its exit status.

Then “/” and “/proc” are marked as private (MS_REC|MS_PRIVATE), so that the bind mounts made below do not propagate out of the namespace.

In order to be allowed to map outside group ids into the namespace, we first have to write “deny” to /proc/self/setgroups.

The next two steps are to set the user id mapping and the group id mapping by writing to /proc/self/uid_map and /proc/self/gid_map, respectively.

In the following step all compiler binaries (gcc, g++, clang, clang++, c++, …) are bind mounted into a temporary directory.

Then the “ec” program is bind mounted over the original paths for compilers.

After that the build process, e.g. “make”, can be started.

Now all compiler invocations will be logged and in the end the log will be converted to compile_commands.json.
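The conversion step could look roughly like the following sketch. It assumes that working directory, command line and source file are already known for each logged call; json_escape() and write_entry() are hypothetical helpers for illustration, not functions from the project. The three keys “directory”, “command” and “file” are the ones the compilation database format expects.

```c
#include <stdio.h>
#include <string.h>

/* Escape a string so that it can be placed inside a JSON string literal.
 * Returns 0 on success, -1 if out is too small. */
int json_escape(const char *in, char *out, size_t n)
{
    size_t o = 0;

    if (n == 0)
        return -1;

    for (; *in != '\0'; in++) {
        unsigned char c = (unsigned char)*in;
        char tmp[8];
        size_t len;

        if (c == '"' || c == '\\')
            snprintf(tmp, sizeof(tmp), "\\%c", c);
        else if (c == '\n')
            snprintf(tmp, sizeof(tmp), "\\n");
        else if (c == '\t')
            snprintf(tmp, sizeof(tmp), "\\t");
        else if (c < 0x20)
            snprintf(tmp, sizeof(tmp), "\\u%04x", c);
        else
            snprintf(tmp, sizeof(tmp), "%c", c);

        len = strlen(tmp);
        if (o + len + 1 > n)   /* keep room for the terminating NUL */
            return -1;
        memcpy(out + o, tmp, len);
        o += len;
    }
    out[o] = '\0';

    return 0;
}

/* Write one entry of the compile_commands.json array to fp.
 * Returns 0 on success, -1 if one of the strings is too long. */
int write_entry(FILE *fp, const char *dir, const char *command, const char *file)
{
    char d[4096], c[16384], f[4096];

    if (json_escape(dir, d, sizeof(d)) != 0 ||
        json_escape(command, c, sizeof(c)) != 0 ||
        json_escape(file, f, sizeof(f)) != 0)
        return -1;

    fprintf(fp, "  {\n    \"directory\": \"%s\",\n    \"command\": \"%s\",\n"
                "    \"file\": \"%s\"\n  }", d, c, f);

    return 0;
}
```

The caller would print “[”, then the entries separated by commas, then “]”.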

That’s it!

Windows Product Key Kernel Module

For fun, I created a simple kernel module for Linux that extracts the Windows product key from the ACPI table MSDM (acpidump is your friend) and creates a sysfs object to read it.

Usage:

cat /sys/devices/platform/windows_product_key/key

> XXXX-XXXX-XXXX-XXXX

 

 

Code: https://github.com/btwotch/windows_product_key
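The module itself lives in the repository above. The following is only a userspace sketch of the extraction step, not the module's code. It assumes the usual MSDM layout: a 36-byte ACPI table header, a 20-byte licensing preamble and then a 29-character key. These offsets and the function name are assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

#define ACPI_HEADER_LEN   36  /* standard ACPI table header (assumed) */
#define MSDM_PREAMBLE_LEN 20  /* version, reserved, type, reserved, length (assumed) */
#define MSDM_KEY_LEN      29  /* five groups of five characters plus four dashes */

/* Copy the product key out of a raw MSDM table.
 * out must have room for MSDM_KEY_LEN + 1 bytes.
 * Returns 0 on success, -1 if the buffer does not look like an MSDM table. */
int msdm_extract_key(const unsigned char *table, size_t len, char *out)
{
    const size_t off = ACPI_HEADER_LEN + MSDM_PREAMBLE_LEN;

    if (len < off + MSDM_KEY_LEN)
        return -1;
    if (memcmp(table, "MSDM", 4) != 0)  /* table signature */
        return -1;

    memcpy(out, table + off, MSDM_KEY_LEN);
    out[MSDM_KEY_LEN] = '\0';

    return 0;
}
```

On Linux the raw table can usually be read (as root) from /sys/firmware/acpi/tables/MSDM and passed to this function.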

c2sqlite

Do you use vim for coding? Isn’t it sometimes annoying that you don’t have all the fancy features that some IDEs have?
c2sqlite uses libclang to store basic information about your C project in a sqlite database. Currently the following information is stored:

  • which function calls which function
  • declaration of functions
  • function parameters

Example:
./c2sqlite c2sqlite.c
This creates a file called test.db; it is a sqlite database.
You can open it with: sqlite3 test.db
Enable information about columns: .explain
Show tables: .tables
The following tables are available:

  • function_calling
  • function_declaration
  • function_param

Now show all functions that are declared in c2sqlite.c:
SELECT * FROM function_declaration WHERE file='c2sqlite.c';

name                 file        line  col
-------------------  ----------  ----  ---
db_open              c2sqlite.c  19    1
db_close             c2sqlite.c  48    1
db_begin             c2sqlite.c  54    1
db_end               c2sqlite.c  64    1
db_add_funcparam     c2sqlite.c  74    1
db_add_funccall      c2sqlite.c  97    1
db_add_funcdecl      c2sqlite.c  120   1
functionDeclVisitor  c2sqlite.c  149   1
cursorVisitor        c2sqlite.c  169   1
main                 c2sqlite.c  197   1


Besides the name of the function you also find its location (line, column).
Next table: function_calling
Let's look up which functions main is calling:
SELECT * FROM function_calling WHERE caller='main';

caller  callee                          file        line  col
------  ------------------------------  ----------  ----  ---
main    unlink                          c2sqlite.c  203   2
main    db_open                         c2sqlite.c  204   11
main    db_begin                        c2sqlite.c  207   2
main    clang_createIndex               c2sqlite.c  209   10
main    clang_parseTranslationUnit      c2sqlite.c  212   8
main    fprintf                         c2sqlite.c  215   4
main    exit                            c2sqlite.c  216   4
main    clang_getTranslationUnitCursor  c2sqlite.c  219   25
main    clang_visitChildren             c2sqlite.c  221   3
main    clang_disposeTranslationUnit    c2sqlite.c  223   3
main    clang_disposeIndex              c2sqlite.c  226   2
main    db_end                          c2sqlite.c  228   2
main    db_close                        c2sqlite.c  229   2

Last but not least: function_param
SELECT * FROM function_param WHERE function='main';
This shows us which parameters the main function has:

function  name  type            id
--------  ----  --------------  ---
main      argc  int             17
main      argv  const char *[]  114

Have a lot of fun!

Backup: GReader2Sqlite

Back up your Google Reader account into a sqlite database; all XML files received via the API are stored as well. You should run this script in an empty directory!

#!/usr/bin/perl -w

#--------Enter your credentials here-----------------
my $user= 'sjobs@gmail.com';
my $pwd = 'iPassword';



#----------------------------------------------------
use strict;
use LWP;
use XML::Simple;
use LWP::UserAgent;
use Data::Dumper;
use DBI;

my $dbh; 
my $xml;
my $st; 
my $count = 0;
my $ua = LWP::UserAgent->new;
$ua->agent("GReader Export ");

my $req = HTTP::Request->new(GET => 'https://www.google.com/accounts/ClientLogin?service=reader&Email='.$user.'&Passwd='.$pwd);
my $res = $ua->request($req);

die $res->status_line, "\n" if (not $res->is_success);
$res->content =~ m/Auth=(\S*)/;
my $auth = $1;


$req = HTTP::Request->new(GET => 'http://www.google.com/reader/api/0/subscription/list?output=xml');
$req->header(Authorization => 'GoogleLogin auth='.$auth);
$res = $ua->request($req);
die $res->status_line, "\n" if (not $res->is_success);


open (FILEHANDLE, ">subscriptions.xml");
print FILEHANDLE $res->content;
close FILEHANDLE;

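# connect(dsn, user, password, attributes); AutoCommit stays on because the script never calls commit
$dbh = DBI->connect("dbi:SQLite:dbname=greader.db", "", "", {AutoCommit => 1});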
$dbh->do('CREATE TABLE IF NOT EXISTS subscriptions (id VARCHAR PRIMARY KEY, title VARCHAR, htmlUrl VARCHAR, file VARCHAR)');
$dbh->do('CREATE TABLE IF NOT EXISTS feeds (id VARCHAR REFERENCES subscriptions(id), title VARCHAR, link VARCHAR, label VARCHAR, content VARCHAR)');
$st = $dbh->prepare("INSERT OR REPLACE INTO subscriptions (id, title, htmlUrl, file) VALUES (?, ?, ?, ?)");

my $xs = XML::Simple->new;
$xml = $xs->XMLin($res->content, KeyAttr => {});

my $id;
my $title;
my $htmlurl;
my $label;
my $content;
my $link;
my $file;
foreach my $object (ref($xml->{list}->{object}) eq 'ARRAY' ? @{ $xml->{list}->{object} } : ($xml->{list}->{object})) {
	foreach my $string (@{ $object->{string} }) {
		$id = $string->{content} if ($string->{name} eq 'id');
		$title = $string->{content} if ($string->{name} eq 'title');
		$htmlurl = $string->{content} if ($string->{name} eq 'htmlUrl');
	}
	$file = substr($htmlurl, 0, 30);
	$file =~ s/[^0-9a-z_-]+/_/gi;
	$file = $file."-".$count.".xml";
	$st->execute($id, $title, $htmlurl, $file);
	$count++;
}


$st = $dbh->prepare("SELECT id, file FROM subscriptions");
my $st_insert = $dbh->prepare("INSERT OR REPLACE INTO feeds (id, title, link, label, content) VALUES (?, ?, ?, ?, ?)");
$st->execute();
while ((my $row = $st->fetchrow_hashref())) {
	print $count--." ".$row->{id}."\n";
	$req = HTTP::Request->new(GET => 'http://www.google.com/reader/atom/'.$row->{id});
	$req->header(Authorization => 'GoogleLogin auth='.$auth);
	$res = $ua->request($req);
	next if (not $res->is_success);

	open (FILEHANDLE, ">".$row->{file});
	print FILEHANDLE $res->content;
	close FILEHANDLE;

	$xml = $xs->XMLin($res->content, KeyAttr => {});
	foreach my $entry (ref($xml->{entry}) eq 'ARRAY' ? @{ $xml->{entry} } : ($xml->{entry})) {

		open (FILEHANDLE, ">".$row->{file}.".dump");
		print FILEHANDLE Dumper($xml);
		close FILEHANDLE;

		$title = ""; $link = ""; $label = ""; $content = "";
		if (not ref($entry) eq 'HASH') {
			next;
		}
		if (ref($entry->{title}) eq 'HASH') {
			$title = $entry->{title}->{content} if (defined $entry->{title}->{content});
		} else {
			$title = $entry->{title} if (defined $entry->{title});
		}
		if(ref($entry->{link}) eq 'ARRAY'){
			$link = $entry->{link}[0]->{href} if (defined $entry->{link}[0]->{href});
		} else {
			$link = $entry->{link}->{href} if (defined $entry->{link}->{href});
		}
		if (defined $entry->{category}) {
			foreach my $l (ref($entry->{category}) eq 'ARRAY' ? @{ $entry->{category} } : ($entry->{category})) {
				$label = $label.$l->{label}." " if (ref($l) eq 'HASH' and defined $l->{label});
				chomp $label;
			}
		}
		if (ref($entry->{content}) eq 'HASH') {
			$content = $entry->{content}->{content} if (defined $entry->{content}->{content});
		} else {
			$content = $entry->{content} if (defined $entry->{content});
		}
		chomp $title; chomp $link; chomp $label; chomp $content;
		$st_insert->execute($row->{id}, $title, $link, $label, $content);
	}
}

web2img

It is really easy to save a website as a PNG using WebKit and Qt.
Alternatives:

  • wkhtmltopdf
  • webkit2pdf

My code:
Web2IMG.hpp

#include <QtWebKit>

class Web2IMG : public QWebPage {
        Q_OBJECT

        public:
                void print(QUrl &url, QString &filename);
        private:
                QString filename;
        private slots:
                void loaded();


};

Web2IMG.cpp

#include <QApplication>
#include <QtWebKit>
#include <QtGui>
#include <QSvgGenerator>
#include <QPrinter>
#include <QTimer>
#include <QByteArray>
#include <QNetworkRequest>

#include "Web2IMG.hpp"

void Web2IMG::loaded()
{
        QImage image(mainFrame()->contentsSize(), QImage::Format_ARGB32);
        QPainter painter(&image);

        setViewportSize(mainFrame()->contentsSize());

        mainFrame()->render(&painter);

        painter.end();

        image.save(filename);

        QCoreApplication::exit();
}

void Web2IMG::print(QUrl &url, QString &filename)
{
        QNetworkRequest req;

        this->filename = filename;
        req.setUrl(url);

        mainFrame()->load(req, QNetworkAccessManager::GetOperation);

        connect(this, SIGNAL(loadFinished(bool)), this, SLOT(loaded()));
}


int main(int argc, char **argv)
{
        if (argc != 3)
        {
                qDebug() << "usage: web2img <url> <filename>";
                return -1;
        }

        QUrl url = QUrl::fromEncoded(argv[1]);
        QString file = argv[2];
        QApplication app(argc, argv, true);

        class Web2IMG w2i;

        w2i.print(url, file);

        return app.exec();
}


web2img.pro

QT       +=  webkit svg network
SOURCES   =  Web2IMG.cpp
HEADERS   =  Web2IMG.hpp
CONFIG   +=  qt console

contains(CONFIG, static): {
  QTPLUGIN += qjpeg qgif qsvg qmng qico qtiff
  DEFINES  += STATIC_PLUGINS
}


Build:

  1. qmake-qt4
  2. make

FM4 podcast downloader (update)

As FM4 changed their podcast infrastructure, the old downloader doesn’t work anymore; here is a little update:

#!/usr/bin/perl -w

use strict;
use LWP::Simple;
use JSON;

my $json = JSON->new;
my $prefix = "http://loopstream01.apa.at/?channel=fm4&ua=ipad&id=";

my $urls = [
        "http://audioapi.orf.at/fm4/json/2.0/playlist/4ULMon", 
        "http://audioapi.orf.at/fm4/json/2.0/playlist/4ULTue",
        "http://audioapi.orf.at/fm4/json/2.0/playlist/4ULWed",
        "http://audioapi.orf.at/fm4/json/2.0/playlist/4ULThu",
        "http://audioapi.orf.at/fm4/json/2.0/playlist/4ULFri",
        ];


foreach my $url (@$urls) {
        my $content = get $url or next;

        my $data = $json->decode($content);

        my $filename = $data->{"streams"}[0]->{'loopStreamId'};
        my $mp3 = $prefix.$filename;
        print $mp3." -> ".$filename."\n";
        getstore ($mp3, $filename);
}

Now they use JSON instead of XML, and it’s not possible to resume a download anymore.

a simple example for a calltracer tool with valgrind

  1. Prepare the valgrind source directory as described here: http://www.valgrind.org/docs/manual/writing-tools.html#writing-tools.gettingstarted
  2. Open your “??_main.c” file, where “??” is the prefix you have chosen. Be aware that you can’t use libc functions; you have to use the replacements provided by valgrind!
  3. Create the “pre_clo_init” function:
    static void ??_pre_clo_init(void)
    {
            VG_(details_name)            ("Calltrace");
            VG_(details_version)         (NULL);
            VG_(details_description)     ("The best calltracer");
            VG_(details_copyright_author)(
                            "Copyright (C) 2002-2012, and GNU GPL'd, by btwotch!");
            VG_(details_bug_reports_to)  (VG_BUGS_TO);
    
            VG_(details_avg_translation_sizeB) ( 275 );
    
            VG_(basic_tool_funcs)        (??_post_clo_init, ??_instrument, ??_fini);
    
            /* No needs, no core events to track */
    }
    VG_DETERMINE_INTERFACE_VERSION(??_pre_clo_init)
    
                
  4. The “post_clo_init” function:
    static void ??_post_clo_init(void)
    {
    }
    

    As I don’t need to initialize anything I kept that function empty.

  5. The “fini” function:
    static void ??_fini(Int exitcode)
    {
            VG_(printf)("bye\n");
    }
    

    No cleanup is required.

  6. Now the most interesting function:
    static IRSB* ??_instrument ( VgCallbackClosure* closure, IRSB* bb, VexGuestLayout* layout, VexGuestExtents* vge, IRType gWordTy, IRType hWordTy )
    {
            int i, j;
            UInt nips;
            IRStmt *st;
            Addr ips[VG_(clo_backtrace_size)];
            Addr sps[VG_(clo_backtrace_size)];
            Addr fps[VG_(clo_backtrace_size)];
            Addr x;
    
            for (i = 0; i <  bb->stmts_used; i++)
            {
                    st = bb->stmts[i];
                    if (st->tag == Ist_IMark)
                    {
                            nips = VG_(get_StackTrace)(VG_(get_running_tid)(), ips, VG_(clo_backtrace_size), sps, fps, 0);
                            for (j = 0; j < nips; j++)
                            {
                                    if (j > 0)
                                            VG_(printf)("\t>");
                                    print_fn(ips[j]);
                            }
                            VG_(printf)(">> \n");
                    }
            }
            return bb;
    }
    
    • “IRSB* bb” is the basic block (in valgrind's intermediate representation); it contains “stmts_used” statements.
    • “Ist_IMark” is a marker statement at the start of each guest instruction; it contains the address of that instruction.
    • With VG_(get_StackTrace) we get a stacktrace with the following information:
      • “ips”: the instruction pointer of each frame
      • “sps”: the stack pointer of each frame
      • “fps”: the frame pointer of each frame
    • Then we iterate over the addresses of the stacktrace and print each of them with “print_fn”.
  7. Finally, the “print_fn” helper used above:
    static void print_fn(Addr a)
    {
            Bool named = False;
            UInt linenum;
            ThreadId tid;
            Bool dirname_available;
            char filename[1024], dirname[1024], fnname[1024];
    
            named = VG_(get_fnname)(a, fnname, 1024);
            tid = VG_(get_running_tid)();
            VG_(printf)("tid: %u|addr: %p|fnname: %s", tid, (void*)a, (named == True) ? fnname : "");
            if (VG_(get_filename_linenum)(a, filename, 1024, dirname, 1024, &dirname_available, &linenum) == True)
                    VG_(printf)("|file: %s|dir: %s|line: %u", filename, dirname, linenum);
            VG_(printf)("\n");
    }
    
    • This function looks up the name of the function via “VG_(get_fnname)” and tries to get additional information about the filename, line number and directory where that function is defined with “VG_(get_filename_linenum)”. Then it prints that information.

Now a complete example:



/*


   Copyright (C) 2002-2012 btwotch
      btwotch+vallgrindcalltrace@gmail.org

   This program is free software; you can redistribute it and/or
   modify it under the terms of the GNU General Public License as
   published by the Free Software Foundation; either version 2 of the
   License, or (at your option) any later version.

   This program is distributed in the hope that it will be useful, but
   WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
   General Public License for more details.

   You should have received a copy of the GNU General Public License
   along with this program; if not, write to the Free Software
   Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
   02111-1307, USA.

   The GNU General Public License is contained in the file COPYING.
*/

#include "pub_tool_basics.h"
#include "pub_tool_tooliface.h"
#include "pub_tool_debuginfo.h"
#include "pub_tool_libcprint.h"
#include "pub_tool_threadstate.h"
#include "pub_tool_options.h"
#include "pub_tool_stacktrace.h"
#include "pub_tool_vki.h"
#include "pub_tool_libcfile.h"


static int fd;

static void ct_post_clo_init(void)
{
}

static UInt poor_fprintf(HChar* format, ...)
{
  UInt ret;
  va_list vargs;
  char buf[8192];

  if(fd < 0)
  {
    // without a trace file there is nothing to write to
    VG_(printf)("fail, could not open tracefile\n");
    return 0;
  }

  va_start(vargs, format);
  ret = VG_(vsnprintf)(buf, 8192, format, vargs);

  //VG_(printf)("%s", buf);

  VG_(write)(fd, buf, ret);

  va_end(vargs);  
  
  return ret;
}


// TODO: Future idea: convert address in tracepath
static void print_fn(Addr a)
{
  Bool named = False;
  UInt linenum;
  ThreadId tid;
  Bool dirname_available;
  char filename[1024], dirname[1024], fnname[1024];

  named = VG_(get_fnname)(a, fnname, 1024);
  tid = VG_(get_running_tid)();
  poor_fprintf("tid: %u|addr: %p|fnname: %s", tid, (void*)a, (named == True) ? fnname : "");
  if (VG_(get_filename_linenum)(a, filename, 1024, dirname, 1024, &dirname_available, &linenum) == True)
    poor_fprintf("|file: %s|dir: %s|line: %u", filename, dirname, linenum);
  poor_fprintf("\n");
}

static IRSB* ct_instrument ( VgCallbackClosure* closure, IRSB* bb, VexGuestLayout* layout, VexGuestExtents* vge, IRType gWordTy, IRType hWordTy )
{
  int i, j;
  UInt nips;
  IRStmt *st;
  Addr ips[VG_(clo_backtrace_size)];
  Addr sps[VG_(clo_backtrace_size)];
  Addr fps[VG_(clo_backtrace_size)];
  Addr x;

  for (i = 0; i <  bb->stmts_used; i++)
  {
    st = bb->stmts[i];
    if (st->tag == Ist_IMark)
    {
      nips = VG_(get_StackTrace)(VG_(get_running_tid)(), ips, VG_(clo_backtrace_size), sps, fps, 0);
      for (j = 0; j < nips; j++)
      {
        if (j > 0)
          poor_fprintf("\t>");
        print_fn(ips[j]);
      }
      poor_fprintf(">> \n");
      poor_fprintf("\tsps: %p fps: %p\n", (void*)sps[0], (void*)fps[0]);
      for (x = sps[0]; x < fps[0] && x <= sps[0]+0x32; x+=4)
        poor_fprintf("\t%p: %x\n", (void*)x, *(int*)x);
      poor_fprintf("\n<<\n");
    }
  }


  return bb;
}

static void ct_fini(Int exitcode)
{
  VG_(close)(fd);

  VG_(printf)("log written to trace.log\n");
}

static void ct_pre_clo_init(void)
{
  VG_(details_name)            ("Calltrace");
  VG_(details_version)         (NULL);
  VG_(details_description)     ("The best calltracer");
  VG_(details_copyright_author)(
      "Copyright (C) 2002-2012, and GNU GPL'd, by btwotch!");
  VG_(details_bug_reports_to)  (VG_BUGS_TO);

  VG_(details_avg_translation_sizeB) ( 275 );


  fd =  VG_(fd_open)("trace.log", VKI_O_WRONLY|VKI_O_CREAT, VKI_S_IRUSR|VKI_S_IWUSR);

  VG_(basic_tool_funcs)        (ct_post_clo_init, ct_instrument, ct_fini);

  /* No needs, no core events to track */
}

VG_DETERMINE_INTERFACE_VERSION(ct_pre_clo_init)

This code writes the information (including a dump of the top of the stack) into a file called “trace.log”. It uses

  • “VG_(fd_open)("trace.log", VKI_O_WRONLY|VKI_O_CREAT, VKI_S_IRUSR|VKI_S_IWUSR)” to open the file
  • “VG_(close)(fd)” to close the file
  • “VG_(write)” to write into that file, wrapped by “poor_fprintf” for formatted output

Introducing ppi – linux process progress information

Problem: you copy a file (with cp) and you don’t know how far it has progressed yet.

Solution: ppi – process progress information

How it works:

  1. lookup filedescriptors of process pid in /proc/pid/fd/
  2. lookup fdinfo of filedescriptor in /proc/pid/fdinfo/fd
    • if process writes only to file, filedescriptor is not interesting
    • if position of filedescriptor is 0, filedescriptor is not interesting
    • if position of filedescriptor is the size of the file, filedescriptor is not interesting
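The rules above can be sketched as two small functions, independent of the full program below. This is only an illustration: parse_fdinfo() and progress_percent() are hypothetical names, and the sketch assumes the usual fdinfo format, in which “pos” is printed in decimal and “flags” in octal.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>

/* Parse the text of a /proc/<pid>/fdinfo/<fd> file.
 * Returns 0 and fills pos and flags if both fields were found, -1 otherwise. */
int parse_fdinfo(const char *text, long long *pos, int *flags)
{
    int have_pos = 0, have_flags = 0;
    const char *line = text;

    while (line != NULL && *line != '\0') {
        if (strncmp(line, "pos:", 4) == 0) {
            *pos = strtoll(line + 4, NULL, 10);
            have_pos = 1;
        } else if (strncmp(line, "flags:", 6) == 0) {
            *flags = (int)strtol(line + 6, NULL, 8); /* octal */
            have_flags = 1;
        }
        /* continue with the next line, if there is one */
        line = strchr(line, '\n');
        if (line != NULL)
            line++;
    }

    return (have_pos && have_flags) ? 0 : -1;
}

/* Apply the three rules from the list above.
 * Returns the progress in percent, or -1 if the descriptor is not interesting. */
int progress_percent(long long pos, long long size, int flags)
{
    if ((flags & O_ACCMODE) == O_WRONLY)  /* process only writes to the file */
        return -1;
    if (pos <= 0 || size <= 0 || pos >= size)
        return -1;

    return (int)(100 * pos / size);
}
```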

Code:

#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <dirent.h>


struct fdinfo
{
    char *path;
    size_t size;
    size_t pos;

};


int termwidth()
{
    struct winsize w;

    if (ioctl(STDOUT_FILENO, TIOCGWINSZ, &w) == -1)
        return 20; // default value
    
    return w.ws_col;
}


void print_percent_progress(char *txt, int p)
{
    int i, width = termwidth() - 5;
    int begin = strlen(txt)+3;

    printf("%s  ", txt);
    for (i = begin; i < width; i++)
    {
        if (i == begin)
        {
            printf("|");
            continue;
        }
        if (i == width-1)
        {
            printf("|");
            continue;
        }
        if ((100*(i-begin)/width) < p)
        {
            if ((100*(i+1-begin)/width) < p)
                printf("-");
            else
                printf(">");
        }
        else
            printf(" ");
    }
    if (p > 0 && p < 100)
        printf(" %d%%", p);

    fflush(stdout);
}

void pos1()
{
    printf("\x1b[1G");
}

void print_help()
{
    printf("usage: ppi <pid>\n");
}

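char* link_dereference(char *path)
{
    // lstat() does not report a usable link length for entries in /proc,
    // so read the target into a fixed-size buffer instead
    size_t bufsize = 4096;
    char *ret = calloc(1, bufsize);
    ssize_t len;

    if (ret == NULL)
        return NULL;

    len = readlink(path, ret, bufsize - 1);
    if (len < 0)
    {
        free(ret);
        return NULL;
    }

    ret[len] = '\0';

    return ret;
}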

size_t file_size(char *path)
{
    struct stat sb;

    // a size of 0 marks the file as "not interesting" for the caller
    if (stat(path, &sb) == -1)
        return 0;

    return sb.st_size;
}

// return NULL if fd is not interesting
struct fdinfo* get_fdinfo(pid_t pid, int fd)
{
    int flags = -1;
    long long pos = -1;
    size_t size;
    char fdinfo_path[64]; // large enough for any pid and fd number
    char line[1024], *path;
    FILE *fp;
    struct fdinfo* ret;

    snprintf(fdinfo_path, sizeof(fdinfo_path), "/proc/%d/fdinfo/%d", pid, fd);

    fp = fopen(fdinfo_path, "r");
    if (fp == NULL)
    {
        fprintf(stderr, "FATAL: could not open %s\n", fdinfo_path);
        return NULL;
    }

    while (fgets(line, sizeof(line), fp) != NULL)
    {
        if (!strncmp(line, "pos:\t", 5))
            pos = atoll(line+5);
        else if (!strncmp(line, "flags:\t", 7))
            flags = (int)strtol(line+7, NULL, 8); // flags are printed in octal

        if (flags != -1 && pos != -1)
            break;
    }
    fclose(fp);

    if (flags == -1 || pos == -1) // fdinfo could not be parsed
        return NULL;

    if ((flags & O_ACCMODE) == O_WRONLY) // only reads are interesting
        return NULL;

    snprintf(fdinfo_path, sizeof(fdinfo_path), "/proc/%d/fd/%d", pid, fd);
    path = link_dereference(fdinfo_path);

    if (path == NULL)
    {
        fprintf(stderr, "FATAL: could not dereference %s\n", fdinfo_path);
        return NULL;
    }
    size = file_size(path);

    if ((size_t)pos == size || pos == 0 || size == 0) // not interesting
    {
        free(path);
        return NULL;
    }

    ret = malloc(sizeof(struct fdinfo));
    if (ret == NULL)
    {
        free(path);
        return NULL;
    }
    ret->path = path;
    ret->size = size;
    ret->pos = (size_t)pos;

    return ret;
}

void dump_fdinfo(struct fdinfo* fi)
{
    printf("path: %s size: %zu pos: %zu\n", fi->path, fi->size, fi->pos);

}

void enumerate_fds(pid_t pid)
{
    int proc_fds_len, fd_path_len, fd;
    char proc_fds[64]; // large enough for "/proc/<pid>/fd" with any pid
    char *fd_path = NULL;
    DIR *d;
    struct dirent *de;
    struct stat st;
    struct fdinfo* fi;

    proc_fds_len = snprintf(proc_fds, sizeof(proc_fds), "/proc/%d/fd", pid);

    d = opendir(proc_fds);
    if (d == NULL)
    {
        fprintf(stderr, "FATAL: could not open %s\n", proc_fds);
        return;
    }

    while ((de = readdir(d)) != NULL)
    {
        fd_path_len = strlen(de->d_name)+proc_fds_len+2;
        fd_path = realloc(fd_path, fd_path_len);
        snprintf(fd_path, fd_path_len, "/proc/%d/fd/%s", pid, de->d_name);
        if (stat(fd_path, &st) == 0 && S_ISREG(st.st_mode))
        {
            fd = atoi(de->d_name);
            fi = get_fdinfo(pid, fd);
            if (fi != NULL)
            {
                print_percent_progress(fi->path, (100*fi->pos)/fi->size);
                printf("\n");
                //dump_fdinfo(fi);

                free(fi->path);
                free(fi);
            }

        }
        
        
    }

    if (fd_path != NULL)
        free(fd_path);

    closedir(d);

}


int main(int argc, char **argv)
{
    pid_t pid;

    if (argc != 2)
    {
        print_help();
        exit(1);
    }
    
    pid = atoi(argv[1]);

    enumerate_fds(pid);

    return 0;
}

Tested with:

  • mplayer
  • cp
  • md5sum
  • xz

Website downloader

This is a newer version (especially the source code) of the former blog entry at spin.
How does it work?

  • use libxml to parse html
  • scan css files for image urls
  • sorry, but no javascript :(
  • correct relative and absolute urls
  • download all these files
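The step “scan css files for image urls” could be sketched as follows. This is not the code from the listing below, which filters the data in chunks while it is being downloaded. It is a simplified version that works on a complete string, and next_css_url() is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 for characters that may surround a URL inside url(...). */
static int is_url_padding(char c)
{
    return c == ' ' || c == '\t' || c == '"' || c == '\'';
}

/* Find the next url(...) reference at or after *cursor.
 * Copies the URL without quotes or surrounding blanks into out and moves
 * *cursor behind the closing parenthesis.
 * Returns 1 if a URL was found, 0 if there are no more. */
int next_css_url(const char **cursor, char *out, size_t n)
{
    const char *start, *end;
    size_t len;

    while ((start = strstr(*cursor, "url(")) != NULL) {
        start += 4;
        end = strchr(start, ')');
        if (end == NULL)
            break;                      /* unterminated url( */
        *cursor = end + 1;

        while (start < end && is_url_padding(*start))
            start++;
        while (end > start && is_url_padding(end[-1]))
            end--;

        len = (size_t)(end - start);
        if (len == 0 || len >= n)
            continue;                   /* empty or too long: skip it */

        memcpy(out, start, len);
        out[len] = '\0';
        return 1;
    }

    *cursor += strlen(*cursor);         /* nothing left to scan */
    return 0;
}
```

Calling it in a loop yields every referenced file. The sketch only matches lower-case “url(” and does not look at @import rules.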

The source code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>
#include <libxml/parser.h>
#include <libxml/HTMLparser.h>
#include <libxml/xmlerror.h>
#include <curl/curl.h>
#include <stdarg.h>

#include "getpage.h"

#define FILELENGTH 150
#define CURL_TIMEOUT_SEC 240
#define SELECT_TIMEOUT_SEC 10
#define MAX_P_FILE_DOWNLOADS 10

#define DEBUG

static char ALPHABET[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890";

/*
Get out:
  * TODO: frames
  * TODO: javascript file riddles ;)
  * TODO: wrong content-length -> reload
  * TODO: same file in 2 different CSS files

*/

enum FILETYPE
{
  IMG = 0x1,
  STYLE = 0x2,
  SCRIPT = 0x4,
  IFRAME = 0x8,
  FRAME = 0x10,
  PDF = 0x20,
  CSS_IMG = 0x40,
  NONE = 0x80
};

struct _replace_info
{
        char *begin;
        char end;

        void (*userfunction) (void*, char*, int, bool);
  void *userdata;

        char *buffer;
        int begin_progress;
        int begin_length;
        bool inside_gap;
        int status;
};

struct _site_files
{
  char *url;
  char *url2;
  char *filename;
  enum FILETYPE ft;
  struct _site_files *next;
  FILE *fp;
  struct _replace_info *ri;
  short nth_url;
#ifdef DEBUG
  int id;
  bool done;
#endif
};

struct _site_userdata
{
        //void (*site_function)(void*, const char*, ...);
        void (*site_function)(void*, const char*, va_list);
        void *userdata;

  struct _site_files *sf;
  char *_base_url;
  bool _utf8_meta_set;
  CURL *_mhnd;
  CURL *_hnd;
};

struct _css_filter_userdata
{
  struct _site_userdata *su;
  char *url;
};

struct _css_filter_save_userdata
{
  struct _site_userdata *su;
  FILE *fp;
  char *filename;
  char *url;
  char *_css_base_url;
};

static char *_filetype_string(enum FILETYPE ft)
{
  char *txt;
  switch(ft)
  {
    case IMG: txt = "IMG"; break;
    case CSS_IMG: txt = "CSS_IMG"; break;
    case STYLE: txt = "STYLE"; break;
    case SCRIPT:  txt = "SCRIPT"; break;
    case IFRAME:  txt = "IFRAME"; break;
    case FRAME: txt = "FRAME"; break;
    case PDF:  txt = "PDF"; break;
    case NONE: txt =  "OTHER"; break;
    default: txt = "DEFAULT"; break;
  }

  return txt;
}

static void _user_function(struct _site_userdata *su, const char *fmt, ...)
{
  va_list ap;
  va_start(ap, fmt);
  su->site_function(su->userdata, fmt, ap);
  va_end(ap);
}

static char *__join_together(char *a, char *b, int len_b)
{
        int len_a = 0;
        int i;
        char *new;

        if (a != NULL)
                len_a += strlen(a);

        new = realloc(a, len_b+1+len_a);

        if (new != NULL)
        {
                for (i = 0; i < len_b; i++)
                        new[i+len_a] = b[i];
                new[len_a+len_b] = '\0';
        }

        return new;
}

// return true if inside gap -> 1
// return false if outside gap -> -1
static int inline replace_step(struct _replace_info *ri, char txt)
{
        if (txt == ri->begin[ri->begin_progress])
                ri->begin_progress++;
  else
    ri->begin_progress = 0;

        if (ri->begin_progress == ri->begin_length)
        {
                ri->begin_progress = 0;
                ri->inside_gap = true;
                return -1;
        }

        if (ri->inside_gap)
        {
                if (txt == ri->end)
                {
                        ri->inside_gap = false;
                        return -1;
                }
                else
                        return 1;
        }

        return -1;
}

static void replace(struct _replace_info *ri, char *txt, int length)
{
        int i;
        int offset = 0;
        int status_temp = -1;

        for (i = 0; i < length; i++)
        {
                status_temp = replace_step(ri, txt[i]);
                if (ri->status != status_temp)
                {
                        if (ri->buffer != NULL)
                        {
                                if (ri->status == 1)
                                {
                                        ri->userfunction(ri->userdata, ri->buffer, strlen(ri->buffer), true);
                                }
                                else if (ri->status == -1)
                                {
                                        ri->userfunction(ri->userdata, ri->buffer, strlen(ri->buffer), false);
                                }
                                free(ri->buffer);
                                ri->buffer = NULL;
                        }

                        if (ri->status == 1)
                        {
                                ri->userfunction(ri->userdata, txt+offset, i-offset, true);
                        }
                        else if (ri->status == -1)
                        {
                                ri->userfunction(ri->userdata, txt+offset, i-offset, false);
                        }

                        offset = i;
                }
                ri->status = status_temp;
        }

  if (offset != length)
        {
                if (status_temp == 1 || status_temp == -1)
                {
                        ri->userfunction(ri->userdata, txt+offset, i-offset, ri->status == 1 ? true : false);
                }
                else
                {
                        if (txt[length-1] == '\0')
                        {
                                if (ri->buffer != NULL)
                                {
                                        ri->userfunction(ri->userdata, ri->buffer, strlen(ri->buffer), false);
                                }
                                free(ri->buffer);
                                ri->buffer = NULL;
                                ri->userfunction(ri->userdata, txt+offset, length-offset, false);
                        }
                        else
                                ri->buffer = __join_together(ri->buffer, txt+offset, length-offset);
                }
        }

}

static void _set_chnd(CURL *hnd, char *url, void *cbfunction, void *userdata)
{
  // libcurl expects numeric options as long, hence the L suffixes and casts
  curl_easy_setopt(hnd, CURLOPT_INFILESIZE_LARGE, (curl_off_t)-1);
  curl_easy_setopt(hnd, CURLOPT_URL, url);
  curl_easy_setopt(hnd, CURLOPT_NOPROGRESS, 1L);
  curl_easy_setopt(hnd, CURLOPT_FAILONERROR, 0L);
  curl_easy_setopt(hnd, CURLOPT_USERAGENT, "libmessage - btwotch+libmessage@gmail.com");
  //curl_easy_setopt(hnd, CURLOPT_USERAGENT, "Mozilla/5.0 (X11; U; Linux x86_64; en-US) AppleWebKit/534.3 (KHTML, like Gecko) Chrome/6.0.472.62 Safari/534.3");
  curl_easy_setopt(hnd, CURLOPT_RESUME_FROM_LARGE, (curl_off_t)0);
  curl_easy_setopt(hnd, CURLOPT_MAXREDIRS, 50L);
  curl_easy_setopt(hnd, CURLOPT_SSLVERSION, 0L);
  curl_easy_setopt(hnd, CURLOPT_TIMECONDITION, 0L);
  curl_easy_setopt(hnd, CURLOPT_TIMEVALUE, 0L);
  curl_easy_setopt(hnd, CURLOPT_CUSTOMREQUEST, NULL);
  curl_easy_setopt(hnd, CURLOPT_CONNECTTIMEOUT, (long)CURL_TIMEOUT_SEC);
  curl_easy_setopt(hnd, CURLOPT_TIMEOUT, (long)CURL_TIMEOUT_SEC);
  curl_easy_setopt(hnd, CURLOPT_HTTPAUTH, 1L);
  curl_easy_setopt(hnd, CURLOPT_ENCODING, NULL);
  curl_easy_setopt(hnd, CURLOPT_IPRESOLVE, 0L);
  curl_easy_setopt(hnd, CURLOPT_IGNORE_CONTENT_LENGTH, 0L);
  curl_easy_setopt(hnd, CURLOPT_POSTREDIR, 0L);
  curl_easy_setopt(hnd, CURLOPT_WRITEFUNCTION, cbfunction);
  curl_easy_setopt(hnd, CURLOPT_WRITEDATA, userdata);
  curl_easy_setopt(hnd, CURLOPT_FOLLOWLOCATION, 1L);
  curl_easy_setopt(hnd, CURLOPT_NOSIGNAL, 1L);
  curl_easy_setopt(hnd, CURLOPT_AUTOREFERER, 1L);
  curl_easy_setopt(hnd, CURLOPT_ENCODING, "deflate");
  curl_easy_setopt(hnd, CURLOPT_SSL_VERIFYHOST, 1L); // TODO
}

static void _filename_gen(struct _site_files *first_sf, char *filename)
{
  int i;
  bool name_double;
  struct _site_files *sf;

  do
  {
    name_double = false;
    srand(1337^filename[0]);

    for (i = FILELENGTH/2; i < FILELENGTH; i++)
      filename[i] = ALPHABET[rand()% (strlen(ALPHABET)-1)];
    filename[FILELENGTH-1] = '\0';

    sf = first_sf;
    while (sf != NULL && sf->filename != NULL)
    {
      if (!strcasecmp(sf->filename, filename))
        name_double = true;

      sf = sf->next;
    }
  } while (name_double);
}

static char* _shrink_url(char *rurl) // remove apostrophes etc.
{
  int length;

  while (rurl[0] != '\0' && rurl[0] == ' ')
    rurl++;
  length = strlen(rurl);
  for (int i = 0; i < length/2; i++)
    if (rurl[i] == '\'' || rurl[i] == '\"')
    {
      if (rurl[i] == rurl[length-i-1])
      {
        rurl[length-i-1] = '\0';
        rurl++;
      }
    }
    else
      break;

  return rurl;
}

static void _crap_sites_aburl(char **abs_url, CURL *hnd, char *rurl, char *static_burl)
{
  int abs_urllen;

  if (!strncasecmp(rurl, "//", 2)) // gmx-hack
  {
    abs_urllen = 5+strlen(rurl)+1;
    *abs_url = malloc(abs_urllen*sizeof(char));
    snprintf(*abs_url, abs_urllen, "http:%s", rurl);
  }

}

static void _relative_aburl(char **abs_url, CURL *hnd, char *rurl, char *static_burl, short nth_url)
{
  int abs_urllen;
  int domain_end = 0, i;
  int base_len = 0;
  char *burl = NULL;

  if (strncasecmp(rurl, "http://", 7) &&
      strncasecmp(rurl, "https://", 8) &&
      strncasecmp(rurl, "ftp://", 6) &&
      strncasecmp(rurl, "file://", 7) &&
      strncasecmp(rurl, "about:", 6) &&
      strncasecmp(rurl, "javascript:", 11))
  {
    if (static_burl != NULL)
      burl = static_burl;
    else
      if (curl_easy_getinfo(hnd, CURLINFO_EFFECTIVE_URL, &burl) != CURLE_OK)
      {
        fprintf(stderr, "CURLINFO_EFFECTIVE_URL failed\n");
        exit(1);
      }

    if (!strncasecmp(burl, "http://", 7))
      domain_end = 7;
    else if (!strncasecmp(burl, "https://", 8))
      domain_end = 8;
    else if (!strncasecmp(burl, "ftp://", 6))
      domain_end = 6;
    else if (!strncasecmp(burl, "file://", 6))
      domain_end = strlen(burl);

    if (nth_url > 0)
      for (i = domain_end+1; i < strlen(burl); i++)
      {
        if (burl[i] == '/')
        {
          if (i < strlen(burl)-1)
            if (burl[i+1] == '/')
              continue;

          if (nth_url == 1)
          {
            base_len = i;
            break;
          }
          else
            nth_url--;
        }
      }

    if (nth_url == -1)
      for (i = strlen(burl); i > domain_end; i--)
        if (burl[i] == '/')
        {
          base_len = i;
          break;
        }

    if (base_len == 0)
      base_len = strlen(burl);

    abs_urllen = strlen(rurl) + strlen(burl) + 2;
    *abs_url = malloc(sizeof(char)*abs_urllen);
    snprintf(*abs_url, abs_urllen, "%.*s/%s", base_len, burl, rurl);
  }

}

static char* _absolute_url(CURL *hnd, char *rurl, char *static_burl, short nth_url)
{
  char *abs_url = NULL;

  if (nth_url == 1)
  {
    _crap_sites_aburl(&abs_url, hnd, rurl, static_burl);
    if (abs_url != NULL)
      return abs_url;
  }

  _relative_aburl(&abs_url, hnd, rurl, static_burl, nth_url);
  if (abs_url != NULL)
    return abs_url;

  if (nth_url == 1)
  {
    int abs_urllen = strlen(rurl)+1;
    abs_url = malloc(abs_urllen+1);
    strncpy(abs_url, rurl, abs_urllen);
    //abs_url = strdup(rurl);
  }

  return abs_url;

}

static char *_site_files_add(struct _site_userdata *su, char *url, char *base_url, enum FILETYPE ft)
{
  struct _site_files *sf = su->sf;
  char *newurl, *newfilename, *sec_url;
  int i;
  int url_length;
  int filename_length;

  if (url == NULL)
    return NULL;

  url_length = strlen(url)+1;

  //printf("%s  %s\n", su->_base_url, base_url);
  url = _shrink_url(url);
  newurl = _absolute_url(su->_hnd, url, (base_url != NULL) ? base_url : su->_base_url, 1);
  sec_url = _absolute_url(su->_hnd, url, (base_url != NULL) ? base_url : su->_base_url, -1);

  if (sf != NULL)
  {
    if (!strcmp(sf->url, newurl))
    {
      free(newurl);
      free(sec_url);
      return sf->filename;
    }
    while (sf->next != NULL)
    {
      sf = sf->next;
      if (!strcmp(sf->url, newurl))
      {
        free(newurl);
        free(sec_url);
        return sf->filename;
      }
    }
    sf->next = malloc(sizeof(struct _site_files));
    sf = sf->next;
    sf->ri = NULL;
    sf->next = NULL;
  }
  else
  {
    sf = malloc(sizeof(struct _site_files));
    sf->ri = NULL;
    su->sf = sf;
    sf->next = NULL;
  }

  sf->filename = NULL;
  sf->ft = ft;
  sf->url = newurl;
  sf->url2 = sec_url;

  filename_length = strlen(newurl)+1;
  if (filename_length > FILELENGTH)
    filename_length = FILELENGTH;

  newfilename = malloc(sizeof(char)*(filename_length));
  strncpy(newfilename, sf->url, filename_length);
  if (filename_length == FILELENGTH)
    _filename_gen(sf, newfilename);

  sf->filename = newfilename;

  for (i = 0; i < strlen(sf->filename); i++)
  {
    if (sf->filename[i] == '/')
      sf->filename[i] = '_';
    else if (sf->filename[i] == '?')
      sf->filename[i] = '_';
    else if (sf->filename[i] == '#')
      sf->filename[i] = '_';
    else if (sf->filename[i] == '@')
      sf->filename[i] = '_';
    else if (sf->filename[i] == '%')
      sf->filename[i] = '_';
    else if (sf->filename[i] == ':')
      sf->filename[i] = '_';
    else if (sf->filename[i] == ' ')
      sf->filename[i] = '_';
  }

  return sf->filename;
}

void _save_file_css_save(void *userdata, char *gap, int length, bool gapped)
{
  struct _css_filter_save_userdata *cfsu = (struct _css_filter_save_userdata*) userdata;
  char *filename;

  if (cfsu->fp == NULL)
  { // first call of this func.
    cfsu->fp = fopen(cfsu->filename, "w");
    cfsu->url = NULL;
  }

  if (gapped)
    cfsu->url = __join_together(cfsu->url, gap, length);
  else if (!gapped && cfsu->url != NULL)
  {
    filename = _site_files_add(cfsu->su, cfsu->url, cfsu->_css_base_url, CSS_IMG);
    fprintf(cfsu->fp, "%s", filename);
    free(cfsu->url);
    cfsu->url = NULL;
    fprintf(cfsu->fp, "%.*s", length, gap);
  }
  else
    fprintf(cfsu->fp, "%.*s", length, gap);

}

size_t _save_file_css(char *txt, size_t size, size_t nmemb, struct _site_files *sf) // feed the replacer!
{
  if (size == 0 && nmemb == 0 && sf->fp != NULL)
  {
    fclose(sf->fp);
  }
  else if (sf->fp == NULL)
      sf->fp=fopen(sf->filename, "w");

  if (sf->fp == NULL)
  {
    perror("fopen");
    return 0;
  }

  replace(sf->ri, txt, size*nmemb);

  return size*nmemb;
}

size_t _save_file(char *txt, size_t size, size_t nmemb, struct _site_files *sf)
{
  int i;

  if (size == 0 && nmemb == 0 && sf->fp != NULL)
  {
    fclose(sf->fp);
  }
  else if (sf->fp == NULL)
      sf->fp=fopen(sf->filename, "w");

  if (sf->fp == NULL)
  {
    perror("fopen");
    return 0;
  }

  for (i = 0; i < size*nmemb; i++)
    fputc(txt[i], sf->fp);

  return size*nmemb;
}

static void _set_css_ri(struct _replace_info *ri, void *userdata, void *userfunction)
{
  ri->begin = "url(";
  ri->end = ')';

  ri->userfunction = userfunction;
  ri->userdata = userdata;

  ri->buffer = NULL;

  ri->begin_progress = 0;
  ri->begin_length = 4;
  ri->inside_gap = false;

  ri->status = -1;
}

static int _add_download_files(struct _site_files *sf, struct _site_userdata *su, CURL *mhnd, short nth_url)
{
  struct _css_filter_save_userdata *cfsu;
  CURL *hnd;

  sf->fp = NULL;
  sf->nth_url = nth_url;

#ifdef DEBUG
  static int id;
  sf->id = id++;
  fprintf(stderr, "Download (id: %d, %s) %s -> %s\n", id, _filetype_string(sf->ft), sf->url, sf->filename);
#endif

  if (sf->ft == STYLE)
  {
    sf->ri = malloc(sizeof(struct _replace_info));
    cfsu = malloc(sizeof(struct _css_filter_save_userdata));
    cfsu->_css_base_url = sf->url;
    cfsu->fp = NULL;
    cfsu->filename = sf->filename;
    cfsu->su = su;
    _set_css_ri(sf->ri, cfsu, _save_file_css_save);
    if (nth_url == 1)
    {
      hnd = curl_easy_init();
      _set_chnd(hnd, sf->url, _save_file_css, sf);
    }
    else if (nth_url == 2)
    {
      if (sf->url2 != NULL)
      {
        hnd = curl_easy_init();
        _set_chnd(hnd, sf->url2, _save_file_css, sf);
      }
      else
        return -1;
    }
    else
      return -1;
  }
  else
  {
    if (nth_url == 1)
    {
      hnd = curl_easy_init();
      _set_chnd(hnd, sf->url, _save_file, sf);
    }
    else if (nth_url == 2)
    {
      if (sf->url2 != NULL)
      {
        hnd = curl_easy_init();
        _set_chnd(hnd, sf->url2, _save_file, sf);
      }
      else
        return -1;
    }
    else
      return -1;
  }

  curl_easy_setopt(hnd, CURLOPT_PRIVATE, sf);
  curl_multi_add_handle(mhnd, hnd);

#ifdef DEBUG
  sf->done = false;
#endif

  return 1;
}

static void _download_files(struct _site_userdata *su)
{
  int handles = 1, msgs_in_queue, maxfd;
  int iteration = 0;
  int downloads = 0; // current downloads
  char *curlinfo_private;
  CURL *mhnd;
  CURLMsg *cmsg;
  struct _site_files *first_sf = su->sf;
  struct _site_files *sf = first_sf;
  struct _site_files *tmp_sf;
  struct timeval timeout;
  fd_set fdread, fdwrite, fderr;
  char *burl;
#ifdef DEBUG
  char *ip;
#endif
  long response_code;

  if (sf == NULL)
    return;
  mhnd = curl_multi_init();

  if (_add_download_files(sf, su, mhnd, 1) > 0)
    downloads++;
  sf = sf->next;

  while (CURLM_CALL_MULTI_PERFORM == curl_multi_perform(mhnd, &handles) && handles != 0);

  do
  {
    iteration++;
    FD_ZERO(&fdread);
    FD_ZERO(&fdwrite);
    FD_ZERO(&fderr);
    timeout.tv_sec = SELECT_TIMEOUT_SEC;
    timeout.tv_usec = 0;
    curl_multi_fdset(mhnd, &fdread, &fdwrite, &fderr, &maxfd);
    switch(select(maxfd+1, &fdread, &fdwrite, &fderr, &timeout))
    {
      case -1:
#ifdef DEBUG
        fprintf(stderr, "select bad :(\n");
        perror("!!! select failed ");
        while ((cmsg = curl_multi_info_read(mhnd, &msgs_in_queue)) != NULL)
        {
          if (cmsg->data.result != 0)
          {
            curl_easy_getinfo(cmsg->easy_handle, CURLINFO_PRIMARY_IP, &ip);
            fprintf(stderr, "ip: %s url: %s result: %d", ip, burl, cmsg->data.result);
            if (cmsg->data.result == 7)
              fprintf(stderr, " (couldn't connect)");
            fprintf(stderr, "\n");
          }
        }
        fprintf(stderr, "-----------\n");
#endif
      default:
        while ((cmsg = curl_multi_info_read(mhnd, &msgs_in_queue)) != NULL)
          if (cmsg->msg == CURLMSG_DONE)
          {
            curl_easy_getinfo(cmsg->easy_handle, CURLINFO_PRIVATE, &curlinfo_private);
            tmp_sf = (struct _site_files*)curlinfo_private;
            if (tmp_sf->ft == CSS_IMG)
              _save_file_css(NULL, 0, 0, tmp_sf);
            else
              _save_file(NULL, 0, 0, tmp_sf);

            downloads--;
#ifdef DEBUG
            tmp_sf->done = true;
#endif

            curl_easy_getinfo(cmsg->easy_handle, CURLINFO_EFFECTIVE_URL, &burl);
            curl_easy_getinfo(cmsg->easy_handle, CURLINFO_RESPONSE_CODE, &response_code);
            if (response_code >= 400)
            {
              if (tmp_sf->nth_url == 2)
              {
                fprintf(stderr, "Failed (%ld): %s -> %s (%s)   ", response_code, burl, tmp_sf->filename, _filetype_string(tmp_sf->ft));
                fprintf(stderr, "second url: %s\n", tmp_sf->url2);
              }
              else
              {
                if (_add_download_files(tmp_sf, su, mhnd, 2) > 0)
                  downloads++;
              }
            }

            curl_easy_cleanup(cmsg->easy_handle);
          }
        do
        {
          // download 1st file
          if (iteration == 1 && sf != NULL && downloads < MAX_P_FILE_DOWNLOADS)
          {
            if (_add_download_files(sf, su, mhnd, 1) > 0)
              downloads++;
          }

          while (sf != NULL && sf->next != NULL && downloads < MAX_P_FILE_DOWNLOADS) // sf->next must not be NULL as we are adding to the list ;)
          {
            if (_add_download_files(sf->next, su, mhnd, 1) > 0)
              downloads++;
            sf = sf->next;
          }
        } while (CURLM_CALL_MULTI_PERFORM == curl_multi_perform(mhnd, &handles) && handles != 0);

        break;

    }
  } while(handles != 0);

  while ((cmsg = curl_multi_info_read(mhnd, &msgs_in_queue)) != NULL)
  {
    curl_easy_getinfo(cmsg->easy_handle, CURLINFO_PRIVATE, &curlinfo_private);
    tmp_sf = (struct _site_files*)curlinfo_private;
    if (tmp_sf->ft == CSS_IMG)
      _save_file_css(NULL, 0, 0, tmp_sf);
    else
      _save_file(NULL, 0, 0, tmp_sf);
    downloads--;

    curl_easy_getinfo(cmsg->easy_handle, CURLINFO_EFFECTIVE_URL, &burl);
    curl_easy_getinfo(cmsg->easy_handle, CURLINFO_RESPONSE_CODE, &response_code);
    if (response_code >= 400)
    {
      if (tmp_sf->nth_url == 2)
      {
        fprintf(stderr, "Failed (%ld): %s -> %s (%s)   ", response_code, burl, tmp_sf->filename, _filetype_string(tmp_sf->ft));
        fprintf(stderr, "second url: %s\n", tmp_sf->url2);
      }
      else
      {
        if (_add_download_files(tmp_sf, su, mhnd, 2) > 0)
          downloads++;
      }
    }

    curl_easy_cleanup(cmsg->easy_handle);
#ifdef DEBUG
    tmp_sf->done = true;
#endif
  }

  sf = first_sf;
  while (sf != NULL)
  {
#ifdef DEBUG
    printf("id: %d url: %s url2: %s done: %d\n", sf->id, sf->url, sf->url2, sf->done);
#endif
    free(sf->url);
    free(sf->url2);
    free(sf->filename);
    if (sf->ri != NULL)
    {
      free(sf->ri->userdata);
      free(sf->ri);
    }
    tmp_sf = sf;
    sf = sf->next;
    free(tmp_sf);
  }

#ifdef DEBUG
  if (downloads != 0)
  {
    printf("!!downloads: %d (%s)\n", downloads, su->_base_url);
    exit(-1);
  }
#endif
  curl_multi_cleanup(mhnd);

}

void _css_filter(void *userdata, char *gap, int length, bool gapped)
{
  struct _css_filter_userdata *cfu = (struct _css_filter_userdata*) userdata;
  char *filename;

  if (gapped)
    cfu->url = __join_together(cfu->url, gap, length);
  else if (!gapped && cfu->url != NULL)
  {
    filename = _site_files_add(cfu->su, cfu->url, NULL, CSS_IMG);
    _user_function(cfu->su, "%s", filename);
    free(cfu->url);
    cfu->url = NULL;
    _user_function(cfu->su, "%.*s", length, gap);
  }
  else
    _user_function(cfu->su, "%.*s", length, gap);
}

static void _getpage_startElementSAX (void * userData, const xmlChar * name, const xmlChar ** atts)
{
  int i, j;
  char *n = (char*)name;
  char *filename, *url;
  struct _site_userdata *su = userData;
  struct _css_filter_userdata cfu;
  struct _replace_info ri;

  _user_function(su, "<%s", n);

  if (atts != NULL)
    for (i = 0; atts[i] != NULL; i+=2)
    {
      filename = NULL;

      if (!strncasecmp(n, "img", 4) && !strncasecmp((char*)atts[i], "src", 4))
      {
        filename = _site_files_add(su, (char*)atts[i+1], NULL, IMG);
        _user_function(su, " src=\"file:%s\"", filename);
      }
      else if (!strncasecmp(n, "input", 6) && !strncasecmp((char*)atts[i], "src", 4))
      {
        filename = _site_files_add(su, (char*)atts[i+1], NULL, IMG);
        _user_function(su, " src=\"file:%s\"", filename);
      }
      else if (!strncasecmp(n, "script", 7) && !strncasecmp((char*)atts[i], "src", 4))
      {
        filename = _site_files_add(su, (char*)atts[i+1], NULL, SCRIPT);
        _user_function(su, " src=\"file:%s\"", filename);
      }
      else if (!strncasecmp(n, "iframe", 7) && !strncasecmp((char*)atts[i], "src", 4))
      {
        filename = _site_files_add(su, (char*)atts[i+1], NULL, IFRAME);
        _user_function(su, " src=\"file:%s\"", filename);
      }
      else if (!strncasecmp((char*)atts[i], "style", 6))
      {
        cfu.su = su;
        cfu.url = NULL;
        _set_css_ri(&ri, &cfu, _css_filter);
        _user_function(su, " style=\"");
        replace(&ri, (char*)atts[i+1], strlen((char*)atts[i+1]));
        if (cfu.url != NULL)
          free(cfu.url);
        _user_function(su, "\"");
        filename = (void*)-1;
      }
      else if (!strncasecmp(n, "link", 5) && !strncasecmp((char*)atts[i], "href", 5))
      {
        for (j = 0; atts[j] != NULL; j+=2)
          if (!strncasecmp((char*)atts[j], "rel", 4))
          {
            if (!strncasecmp((char*)atts[j+1], "stylesheet", 11))
            {
              filename = _site_files_add(su, (char*)atts[i+1], NULL, STYLE);
              _user_function(su, " href=\"file:%s\"", filename);
            }
            else if (!strncasecmp((char*)atts[j+1], "icon", 5))
            {
              filename = _site_files_add(su, (char*)atts[i+1], NULL, IMG);
              _user_function(su, " href=\"file:%s\"", filename);
            }
            else if (!strncasecmp((char*)atts[j+1], "shortcut icon", 14))
            {
              filename = _site_files_add(su, (char*)atts[i+1], NULL, IMG);
              _user_function(su, " href=\"file:%s\"", filename);
            }
          }
      }
      else if (!strncasecmp(n, "a", 2) && !strncasecmp((char*)atts[i], "href", 5))
      {
        url = _absolute_url(su->_hnd, (char*)atts[i+1], su->_base_url, 1);
        _user_function(su, " href=\"%s\"", url);
        free(url);
        filename = (void*)-1;
      }
      else if (!strncasecmp(n, "base", 5) && !strncasecmp((char*)atts[i], "href", 5))
      {
        _user_function(su, " href=\".\"");
        filename = (void*)-1;
      }
      else if (!strncasecmp(n, "form", 5) && !strncasecmp((char*)atts[i], "action", 7))
      {
        url = _absolute_url(su->_hnd, (char*)atts[i+1], su->_base_url, 1);
        _user_function(su, " action=\"%s\"", url);
        free(url);
        filename = (void*)-1;
      }
      else if (!strncasecmp(n, "meta", 5) && !strncasecmp((char*)atts[i], "http-equiv", 8) && !strncasecmp((char*)atts[i+1], "Content-Type", 13))
      {
        su->_utf8_meta_set = true;
        _user_function(su, " http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\"");
        //filename = (void*)-1;
        break;
      }

      if (filename == NULL)
        _user_function(su, " %s=\"%s\"", (char*)atts[i], (char*)atts[i+1]);

    }

  _user_function(su, ">");

}

static void _getpage_endElementSAX (void * userData, const xmlChar * name)
{
  char *n = (char*)name;
  struct _site_userdata *su = userData;

  if (!strncasecmp("head", n, 5) && !su->_utf8_meta_set)
    _user_function(su, "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\"/></head>\n");
  else if (strncasecmp("br", n, 3) && strncasecmp("img", n, 4) && strncasecmp("meta", n, 5) && strncasecmp("link", n, 5) && strncasecmp("input", n, 5))
    _user_function(su, "</%s>\n", n);
}

static void _getpage_charDataSAX (void * userData, const xmlChar * buffer, int length)
{
  struct _site_userdata *su = userData;
  _user_function(su, "%.*s", length, buffer);
}

static size_t _chunk_parse(void *ptr, size_t size, size_t nmemb, xmlParserCtxtPtr ctxt)
{
  char *txt = ptr;
#ifdef DEBUG
  FILE *fp = fopen("bare.txt", "a+");

  fprintf(fp, "%.*s", (int)(size*nmemb), txt);
  fclose(fp);
#endif
  htmlParseChunk(ctxt, txt, size*nmemb, 0);

  return nmemb*size;
}

void getpage(char *url, void *site_function, void *userdata)
{
  struct _site_userdata su;
  su.site_function = site_function;
  su.userdata = userdata;
  su.sf = NULL;
  su._utf8_meta_set = false;
  su._base_url = NULL;
  CURLcode ret;

  htmlSAXHandler hsh;
  htmlParserCtxtPtr ctxt;

#ifdef DEBUG
  remove("bare.txt");
#endif

  memset(&hsh, 0, sizeof(htmlSAXHandler));

  hsh.startElement = _getpage_startElementSAX;
  hsh.endElement = _getpage_endElementSAX;
  hsh.characters = _getpage_charDataSAX;

  ctxt = htmlCreatePushParserCtxt(&hsh, &su, NULL, 0, NULL, XML_CHAR_ENCODING_UTF8);
  htmlCtxtUseOptions(ctxt, HTML_PARSE_RECOVER);

  curl_global_init(CURL_GLOBAL_ALL);
  su._hnd = curl_easy_init();
  _set_chnd(su._hnd, url, _chunk_parse, ctxt);
  ret = curl_easy_perform(su._hnd);

  htmlParseChunk(ctxt, NULL, 0, 1);
  htmlFreeParserCtxt(ctxt);

  curl_easy_getinfo(su._hnd, CURLINFO_EFFECTIVE_URL, &su._base_url);

#ifdef DEBUG
  double val;
  if (curl_easy_getinfo(su._hnd, CURLINFO_SPEED_DOWNLOAD, &val) == CURLE_OK)
    printf("Average download speed: %0.3f kbyte/sec.\n", val / 1024);
#endif

  fprintf(stderr, "Downloading files ...\n");
  _download_files(&su);
  curl_easy_cleanup(su._hnd);
  // curl_global_cleanup();
}

(save it as getpage.c)
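
The CSS rewriting above is driven by a small character-by-character state machine (replace_step with "url(" as begin marker and ')' as end marker). Here is a stripped-down sketch of that idea, in case the full version is hard to follow. It is not the code from getpage.c: the names gap_scanner, gap_scanner_init, gap_scanner_step and first_gap are made up for illustration, and it only extracts the first url(...) argument from a complete string instead of handling chunked input.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified scanner state (not struct _replace_info). */
struct gap_scanner {
    const char *begin;   /* start marker, e.g. "url(" */
    size_t begin_len;    /* strlen(begin) */
    size_t progress;     /* marker characters matched so far */
    char end;            /* end marker, e.g. ')' */
    bool inside;         /* true while between begin and end */
};

static void gap_scanner_init(struct gap_scanner *gs, const char *begin, char end)
{
    gs->begin = begin;
    gs->begin_len = strlen(begin);
    gs->progress = 0;
    gs->end = end;
    gs->inside = false;
}

/* Feed one character; returns true if it belongs to the gap itself. */
static bool gap_scanner_step(struct gap_scanner *gs, char c)
{
    if (gs->inside) {
        if (c == gs->end) {
            gs->inside = false;
            return false;
        }
        return true;
    }

    if (c == gs->begin[gs->progress])
        gs->progress++;
    else
        gs->progress = (c == gs->begin[0]) ? 1 : 0;

    if (gs->progress == gs->begin_len) {
        gs->progress = 0;
        gs->inside = true;
    }
    return false;
}

/* Copy the first url(...) argument of css into out; returns its length. */
size_t first_gap(const char *css, char *out, size_t out_size)
{
    struct gap_scanner gs;
    size_t n = 0;
    bool seen = false;

    gap_scanner_init(&gs, "url(", ')');
    for (; *css != '\0'; css++) {
        if (gap_scanner_step(&gs, *css)) {
            seen = true;
            if (n + 1 < out_size)
                out[n++] = *css;
        } else if (seen) {
            break; /* the first gap just ended */
        }
    }
    if (out_size > 0)
        out[n] = '\0';
    return n;
}
```

Calling first_gap on "a { background: url(img/bg.png) no-repeat; }" fills the output buffer with img/bg.png. The real replace() additionally has to buffer partial matches, because curl hands over the stylesheet in chunks of arbitrary size.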

To use that library:

#include <stdio.h>
#include <stdarg.h>

#include "getpage.h"

void site_function(void *userdata, const char* format, va_list ap)
{
  FILE *fp = userdata;

  vfprintf(fp, format, ap);
  fflush(fp);
}

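int main(int argc, char **argv)
{
  FILE *fp;

  // both the URL and the output file are required
  if (argc < 3)
  {
    fprintf(stderr, "usage: %s <url> <output file>\n", argv[0]);
    return 1;
  }

  fp = fopen(argv[2], "w");
  if (fp == NULL)
  {
    perror("fopen");
    return 1;
  }

  getpage(argv[1], site_function, fp);

  fclose(fp);
  return 0;
}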

(save that as getpagetest.c)
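
The callback does not have to write to a file. As a rough sketch, here is a variant that collects the rewritten page in memory instead. It assumes getpage calls the callback exactly like site_function above (userdata, format string, va_list); struct page_buffer, buffer_site_function and buffer_append are hypothetical names and not part of getpage.h.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer; not part of getpage.h. */
struct page_buffer {
    char *data;   /* NUL-terminated once something was appended, else NULL */
    size_t len;   /* bytes stored, excluding the terminator */
};

/* Same signature as site_function in getpagetest.c. */
void buffer_site_function(void *userdata, const char *format, va_list ap)
{
    struct page_buffer *pb = userdata;
    va_list ap_copy;
    int needed;
    char *grown;

    /* First pass: measure. vsnprintf consumes its va_list, so use a copy. */
    va_copy(ap_copy, ap);
    needed = vsnprintf(NULL, 0, format, ap_copy);
    va_end(ap_copy);
    if (needed < 0)
        return;

    grown = realloc(pb->data, pb->len + (size_t)needed + 1);
    if (grown == NULL)
        return; /* keep the old contents if allocation fails */
    pb->data = grown;

    /* Second pass: format directly behind the existing contents. */
    vsnprintf(pb->data + pb->len, (size_t)needed + 1, format, ap);
    pb->len += (size_t)needed;
}

/* Variadic helper, only needed to drive the callback by hand. */
void buffer_append(struct page_buffer *pb, const char *format, ...)
{
    va_list ap;

    va_start(ap, format);
    buffer_site_function(pb, format, ap);
    va_end(ap);
}
```

With a zero-initialised struct page_buffer pb, the call would be getpage(url, buffer_site_function, &pb); afterwards pb.data holds the page and has to be released with free().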

And now the Makefile (the command lines must be indented with tabs, not spaces!):

CC=/usr/bin/colorgcc
CFLAGS=-O2 -ggdb -Wall

getpagetest: getpage.o getpagetest.c
	$(CC) $(CFLAGS) -std=c99 -I /usr/include/libxml2/ -o getpagetest getpagetest.c getpage.o -lxml2 -lcurl

getpage.o: getpage.c
	$(CC) $(CFLAGS) -std=c99 -fPIC -I /usr/include/libxml2/ -c -o getpage.o getpage.c

Usage:

 make && ./getpagetest heise.de index.html && chromium --proxy-server=localhost:1 index.html