From kragen@dnaco.net Fri Jul 24 14:22:00 1998 -0400
Date: Fri, 24 Jul 1998 14:21:59 -0400 (EDT)
From: Kragen <kragen@dnaco.net>
To: jsdy@tux.org
cc: Kragen <kragen@pobox.com>, alan@lxorguk.ukuu.org.uk, jneves@rnl.ist.utl.pt, 
    astor@guardian.no, mitch@execpc.com, fjcfma@rnl.ist.utl.pt
Subject: Re: chroot issue (was: Dynamic linker issues... )
In-Reply-To: <199807241743.NAA09258@gwyn.tux.org>
Message-ID: <Pine.GSU.4.02.9807241354240.13267-100000@picard.dnaco.net>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

On Fri, 24 Jul 1998, Joseph S D Yao wrote:
> Kragen spoke:
> > It would be worthwhile to implement a "jail" mechanism that could
> > restrain any malicious process.  chroot() is a start; the filesystem is
> > the biggest communication channel between processes.
> 
> You want to implement a true "virtual machine".  Right?

Well, I don't really care to implement interrupts, page tables, and all
that stuff.  I just want to securely partition things.

I think it might be possible to restrain a process by ptrace()ing it
with PTRACE_SYSCALL, and then sending it signals if it tries to do
something illegal.  It might even be reasonable to signal it to
interrupt the syscall, then overwrite the syscall's return value (on
i386, the saved eax register) to make it appear as if the syscall
failed.  I'm not sure how severe a performance impact this would have.
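
A rough model of that idea in Python, with no real ptrace() involved:
the list of (name, return value) pairs stands in for the stops a tracer
would see, and the FORBIDDEN set and supervise() function are
hypothetical names made up for this sketch.  It only shows the
bookkeeping of making a forbidden call look as if it had failed.

```python
import errno

# Hypothetical set of calls the supervisor treats as illegal in this sketch.
FORBIDDEN = {"open", "execve"}


def supervise(trace, forbidden=FORBIDDEN):
    """Return the (name, return value) pairs the jailed process would see.

    `trace` is a list of (syscall_name, real_return_value) pairs standing
    in for the syscall stops a real tracer would observe.
    """
    seen = []
    for name, real_ret in trace:
        if name in forbidden:
            # Stand-in for overwriting the return value: the process sees
            # -EPERM, as if the kernel had refused the call.
            seen.append((name, -errno.EPERM))
        else:
            seen.append((name, real_ret))
    return seen


print(supervise([("getpid", 42), ("open", 3)]))
```

A real supervisor would also have to stop the forbidden call from taking
effect, which this model does not attempt to show.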

I went through the list of syscalls in entry.S.  Something like a
fourth of them (40 or so on 2.0.30) look like they should be OK to
leave unrestricted.

Most programs use only a relatively small number of syscalls.  strace
ls, for example, reports _exit, brk, close, execve, fcntl, fstat,
getdents, getpid, ioctl, lseek, mmap, mprotect, munmap, open,
personality, time, and write.  Of these, only execve and open would
need any checking, and if chroot were to be used to do part of the
jailing, even they wouldn't need checking.  (Unless personality() can
do things I'm not aware of.  Restricting it to PER_LINUX might make
sense most of the time.)  (You might conceivably want to restrict
time().)

Running gs to do something simple uses the following syscalls:
_exit access bind brk close connect execve fcntl fstat getpeername
getpid getrlimit getsockname gettimeofday ioctl mmap mprotect munmap
open personality read recvfrom select send sendto setsockopt shutdown
socket stat uname write writev

Of these, restricting access(), open(), execve(), and stat() would
effectively restrict the filesystem, restricting bind(), connect(),
socket(), sendto(), and recvfrom() would effectively restrict the
network, and restricting uname() and gettimeofday() would effectively
prevent any other communication with the outside world.  _exit, brk,
close, fcntl, fstat, getpeername, getpid, getrlimit, getsockname,
ioctl, mmap, mprotect, munmap, read, select, setsockopt, shutdown,
write, and writev could safely remain unrestricted.
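
The grouping above can be written down as a small table and checked
against the gs list.  This is only a sketch: the group names and the
classify() helper are made up for illustration, and the syscall names
are copied from the lists in this message.

```python
# Syscalls reported for the simple gs run, as listed in this message.
GS_SYSCALLS = """
_exit access bind brk close connect execve fcntl fstat getpeername
getpid getrlimit getsockname gettimeofday ioctl mmap mprotect munmap
open personality read recvfrom select send sendto setsockopt shutdown
socket stat uname write writev
""".split()

# Hypothetical group names; membership follows the paragraph above.
GROUPS = {
    "filesystem": {"access", "open", "execve", "stat"},
    "network": {"bind", "connect", "socket", "sendto", "recvfrom"},
    "other": {"uname", "gettimeofday"},
    "unrestricted": {
        "_exit", "brk", "close", "fcntl", "fstat", "getpeername", "getpid",
        "getrlimit", "getsockname", "ioctl", "mmap", "mprotect", "munmap",
        "read", "select", "setsockopt", "shutdown", "write", "writev",
    },
}


def classify(name):
    """Return the group a syscall falls into, or "unclassified"."""
    for group, members in GROUPS.items():
        if name in members:
            return group
    return "unclassified"


unclassified = sorted(n for n in GS_SYSCALLS if classify(n) == "unclassified")
print(unclassified)  # ['personality', 'send']
```

The two names that fall through, personality and send, are the ones the
paragraph does not place in any group, so a jail built from this table
would need an explicit decision about them.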

I guess what I'm trying to say is that this looks feasible -- forbid
most syscalls, unconditionally allow 20-40, and put access checks on
the rest, and you have an effective jail.
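
As a minimal sketch of that three-way split, assuming a hypothetical
jail directory /jail and a purely lexical path check (the ALLOW set,
CHECKS table, and function names are illustrative, not an existing
interface):

```python
import posixpath

JAIL_ROOT = "/jail"  # hypothetical jail directory, chosen for illustration

# Calls allowed unconditionally; a short sample, not the full 20-40.
ALLOW = {"_exit", "brk", "close", "read", "write", "getpid", "mmap", "munmap"}


def path_in_jail(path, root=JAIL_ROOT):
    """Lexical check that an absolute path stays under the jail root.

    Collapses "." and ".." but does not resolve symlinks, so it is not a
    complete access check on its own.  Relative paths are rejected.
    """
    full = posixpath.normpath(path)
    return full == root or full.startswith(root + "/")


# Calls that are allowed only if their argument passes an access check.
CHECKS = {
    "open": path_in_jail,
    "execve": path_in_jail,
    "access": path_in_jail,
    "stat": path_in_jail,
}


def verdict(name, arg=None):
    """Return "allow" or "deny" for a syscall name and its path argument."""
    if name in ALLOW:
        return "allow"
    check = CHECKS.get(name)
    if check is not None and arg is not None and check(arg):
        return "allow"
    return "deny"  # everything not listed is forbidden by default


print(verdict("read"))
print(verdict("open", "/jail/tmp/x"))
print(verdict("open", "/jail/../etc/passwd"))
print(verdict("ptrace"))
```

Denying by default means a syscall nobody thought about stays
forbidden until someone adds it to one of the two tables.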

Kragen


