How to create new primitives for Smalltalk

This text originally comes from http://www.create.ucsb.edu/squeak/DIYSqPrims.html

The Do-It-Yourself Guide to Squeak Primitives

by Stephen Travis Pope (stp@create.ucsb.edu) 4/14/98

Introduction
1: Be sure you Really Need a Primitive
2: Design the Smalltalk Interface
3: Design the C Interface
4: Write the Smalltalk Prim-Calling Method
5: Write the Smalltalk Primitive Method
6: Write the Smalltalk "Glue" Method
7: Add an Entry to the Interpreter's Primitive Table
8: Regenerate the Interpreter
9: Write the C Code
10: Add Function Prototype(s) to the Squeak Header file
11: Add Your C File to the VM Make/Project File
12: Recompile the VM
13: Test It
Notes
Epilog: Wouldn't it be nice if....

Introduction

This outline describes how to extend Squeak with your own hand-written primitives. It's a bit terse (but you'd better be a pretty advanced Smalltalk and C programmer before attempting this anyway :-) ). The document walks you through the 13 easy steps (well, at least 8 of them are easy) of creating the Smalltalk and C sides of the primitive interface, and making a new virtual machine with your extended primitives.

There are many aspects of building general-purpose interfaces between Squeak and external facilities that are outside of the scope of a basic 9-page introduction, and are left out here.

Comments are invited; please send them to stp@create.ucsb.edu.

1: Be sure you Really Need a Primitive

Generally, there are several reasons to hand-code primitives. In the example I give below, I need to access an OS-level driver for MIDI input. There's just no way I can write this in Smalltalk. For performance-optimization primitives (i.e., where the prim's body is written in Smalltalk and translated to C for performance reasons only), look at John Maloney's sound synthesis classes (AbstractSound and its subclasses) for examples of how to write low-level Smalltalk code for translation into C. (I don't really go into this here.)

Please note that if lots of us start writing random and not-really-well-motivated primitives we won't be able to share any code at all any more. The namespace of primitives is limited; there is no formal mechanism for managing that space with multiple primitive-writers; and merging two virtual machines with different primitive extensions can be a *real* pain. Do not do this lightly.

2: Design the Smalltalk Interface

The first step in the coding is to determine what the Smalltalk side of the primitive should look like. This means designing the signature (i.e., the receiver, arguments, and return value) of the high-level method that expands into the primitive call.

For the purposes of this example, I'll take a method from the Siren/Squeak MIDI I/O interface. This is the input primitive that reads a MIDI data packet from the OS-level driver. The details are moot for this presentation.

I have a class MIDIPacket that has inst. vars. as shown in the following definition.

	Object subclass: #MIDIPacket
		instanceVariableNames: 'length time flags data '
		...

The first three inst. vars. are integers, the last is a ByteArray (which is pre-allocated to 3 bytes--the max size of normal MIDI messages [system exclusive packets are handled specially]).

The primitive will live in class PrimMIDIPort and will take a MIDIPacket and pass it down to the VM, which will fill it in with data read from the MIDI driver. The primitive returns the number of bytes read (the length inst. var. of the packet). Since the primitive does not use the state of its receiver, it could be put in just about any class. The argument is the important component.

So, the primitive method will look like,

    PrimMIDIPort >> primReadPacket: packet data: data

I pass the packet object and the data byte array separately for simplicity of the C code and for flexibility (in case I decide to split them into two Smalltalk objects in the future). It's also easier to decompose an object in Smalltalk than it is in C.

3: Design the C Interface

The next step is to design the interface for the C-side of the primitive, and to write a C function prototype for it. For the Siren VM C header file, I did:

	int sqReadMIDIPacket(int MIDIpacket, int dataBuffer);
		// Read input into a MIDI packet. (prim. 614)
		// 'MIDIpacket' is interpreted as (MIDIPacket *) and is
		// written into with the time-stamp, flags, and length.
		// 'dataBuffer' is interpreted as (unsigned char *) and 
		// gets the MIDI message data.
		// Answer the number of data bytes in the packet.

Note that all arguments are passed as ints; you can cast them into 'whatever' at will in the C code (see below). (This is yet another reason to write really good comments in your interfaces.)
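As a rough sketch of that casting (not the Siren code): the struct layout and the names DemoPacket and demoReadPacket are made up for illustration, and intptr_t stands in for the plain int of the 32-bit VM so that the example also compiles on 64-bit hosts.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout, for illustration only; not the real Siren packet. */
typedef struct {
    int length;
    int time;
    int flags;
} DemoPacket;

/* Both arguments arrive as plain integers and are cast to what they really are.
   intptr_t is an assumption standing in for the 'int' used by the 32-bit VM. */
int demoReadPacket(intptr_t packetArg, intptr_t dataArg) {
    DemoPacket *packet = (DemoPacket *)packetArg;
    unsigned char *data = (unsigned char *)dataArg;

    assert(packet != 0 && data != 0);
    data[0] = 0x90;              /* pretend we received a 3-byte message */
    data[1] = 60;
    data[2] = 100;
    packet->length = 3;
    packet->time = 0;
    packet->flags = 0;
    return packet->length;       /* answer the number of data bytes */
}
```

A caller would pass (intptr_t)&somePacket and (intptr_t)someBuffer; the comments in the prototype are the only place where the real types are recorded, which is why they matter.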

Most of my primitives return integers (with negative values for common error conditions; I apologize, I just programmed in C for too long in my youth) and fail only in extreme situations. This is a personal preference--I tend to pass the error return values up to higher levels of code to handle. Other designers might always want to have a failure handler right in the method that called the prim--see the discussion below. It would be easier if Squeak had well-integrated exception handling and raised a system exception on primitive failure so that the calling method could decide whether to use the primitive method's failure code or not.

4: Write the Smalltalk Prim-Calling Method

This is the high-level method that will call the direct primitive method. It is generally part of one of your "application" classes. In class PrimMIDIPort, instance side, "primitives" protocol, I have the following,

	get: packet
		"Read the data from the receiver into the argument (a MIDIPacket)."

		| len | "reads packet header and data, answers amt. of data read"
		len := self primReadPacket: packet data: packet data.
		len >= 0
			ifFalse: [...What to do on bad return value rather than failure...].
		^len

In Siren, this is called by a read loop that's triggered by a semaphore coming up from the VM, but that's outside of the scope here.

The actual primitive methods generally have names that start with "prim" as shown above.

5: Write the Smalltalk Primitive Method

This is the actual link to the VM primitive. You need to pick an unassigned primitive number (in the Interpreter class's PrimitiveTable); I found that 614 was free (I had already used 610-613 for useless drivel). In class PrimMIDIPort, I added the following method,

	primReadPacket: packet data: data
		"Read a packet from the MIDI driver."
		"Write data into the arguments; answer the number of bytes read."

		<primitive: 614>
		self error: 'MIDI read failed.'

The "<primitive: XXX>" construct is a primitive call--it's Smalltalk's way of "trapping" into the VM. The body of the method is the primitive. The primitive number (614) is an index into the table of all primitives that's in the Interpreter class.

If the primitive returns successfully, the statements that follow the primitive call will not be executed. On the other hand, if the primitive fails, the Smalltalk code that follows the primitive call will be executed. This is quite handy for cases where you want to try a Smalltalk implementation (e.g., a good number of primitives fail if the arguments are not of the default types), or re-try the primitive with different arguments (e.g., coerce one of the arguments and re-send the method).

The return value from the primitive (actually, the thing left on the top of the stack by the glue code--see below) will be the return value of this method.

6: Write the Smalltalk "Glue" Method

OK, this is where it gets a bit more complicated. The Squeak Interpreter class is written in Smalltalk, but all of its instance methods get translated to C and form the core of the Squeak virtual machine (that huge file named "interp.c" on most platforms). There are class methods in this class that create the primitive index table (where element 614 will point to our code), and the instance methods whose names correspond to the names given in the primitive table are the actual bodies of the primitives. These "glue" methods typically unpack the arguments from the stack, call the actual C code of the primitive, and handle the return values.

There is a great deal of flexibility here, and interested readers are encouraged to read and analyze more of the Interpreter primitive methods (for example the sound I/O or network interface methods). Remember that these are all translated to C, so they cannot use all the language features of Smalltalk. (I'd give you the details of the Smalltalk-to-C translator if I understood them.)

The example that follows demonstrates the basic flow of the several stages in a typical glue method:

1) unpack the argument(s) from the stack;
2) test the arguments for validity (optional);
3) call the C function that implements the primitive (optional);
4) pop the arguments (and possibly the receiver) off of the stack; and
5) push the return value onto the stack.

I have annotated the method below with these stages (in parentheses). Also note that I generally include both the Smalltalk method header and C function prototype as comments in this method; this makes debugging it much easier. In the Interpreter (and/or DynamicInterpreter) class, we have to write,

	primitiveReadMIDIPacket
		"Read a message (a MIDIPacket) from the MIDI interface."
		"ST: PrimMIDIPort primReadPacket: packet data: data"
		"C: int sqReadMIDIPacket (int packet, int data);"

		| packet data answer |
	"Get the arguments"
(1)		 data := self stackValue: 0.
(1)		 packet := self stackValue: 1.
	"Make sure that 'data' is byte-like"
(2)		 self success: (self isBytes: data).
	"Call the primitive"
		successFlag
(3)		 ifTrue: [answer := self cCode: 'sqReadMIDIPacket (packet, data + 4)'].
	"Pop the args and rcvr object"
		successFlag
(4)			 ifTrue: [self pop: 3.
	"Answer the number of data bytes read"
(5)				 self push: (self integerObjectOf: answer)]

For (1), note that the arguments end up on the stack in reverse order: they are pushed first to last, so the last argument is stack(0), the next-to-last is stack(1), etc. There are methods (in ObjectMemory, the superclass of Interpreter) that allow you to get integers and other kinds of things from the stack with automatic conversion. (Look at the other primitive methods in class Interpreter for lots of examples.) Since both of the arguments here are pointers, I use stackValue:.

Step (2) is a simple example of type-checking on primitive arguments. The success: message sets the primitive success/fail flag based on whether the second argument is byte-like (e.g., a ByteArray). The method/function success: is used in the Smalltalk glue code and in C primitive implementations to signal primitive success or failure; to fail, pass false to it, as the test in step 2 does when the check fails.

Step (3) uses the message "cCode: aString"; it takes a string of C code as its argument (here, a call to our function), and it is here that we actually call our C-language primitive. Note that I must use the actual variable names packet and data in the string. The "data + 4" is there because the argument is a ByteArray object but the C code casts it to (unsigned char *); 4 is the size of the object header in bytes, so I skip it to pass the base address of the byte array's actual (char *) data. This is a hard-coded special value that implies that I know the object header is 32 bits.
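To make the header-skipping arithmetic concrete, here is a small sketch with a mock byte object: a 4-byte header followed by the data bytes. The names (demoFirstByte, demoMakeByteObject, DEMO_HEADER_BYTES) and the header value are invented for the example, and the real object format has more to it than this.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { DEMO_HEADER_BYTES = 4 };  /* assumed 32-bit object header, as in the text */

/* Given the address of a byte object, answer the address of its first data byte. */
static unsigned char *demoFirstByte(unsigned char *object) {
    assert(object != NULL);
    return object + DEMO_HEADER_BYTES;
}

/* Build a mock 3-byte object in 'storage' (at least 7 bytes):
   one header word followed by the three data bytes. */
static void demoMakeByteObject(unsigned char *storage, uint32_t header,
                               const unsigned char bytes[3]) {
    memcpy(storage, &header, sizeof header);
    memcpy(storage + DEMO_HEADER_BYTES, bytes, 3);
}
```

If the header size ever changes, every hard-coded "+ 4" has to change with it, which is a good argument for keeping the constant in one named place.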

In step (4), we pop the two arguments *and* the receiver object (a PrimMIDIPort instance) off of the stack if the primitive succeeded.

Step (5) pushes the C function's return value onto the stack as an integer. There are other coercion functions in ObjectMemory that can be found used in other primitive methods in class Interpreter.
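The stack conventions in steps (1), (4), and (5) can be tried out on a toy model. This is only a sketch: the demo* names are hypothetical, the stack holds plain ints rather than object pointers, and the addition stands in for the real C call.

```c
#include <assert.h>

/* Toy operand stack; it models the conventions only, not the VM's code. */
enum { DEMO_STACK_MAX = 16 };
static int demoStack[DEMO_STACK_MAX];
static int demoDepth = 0;            /* number of items currently on the stack */

static void demoPush(int value) {
    assert(demoDepth < DEMO_STACK_MAX);
    demoStack[demoDepth++] = value;
}

static void demoPop(int count) {
    assert(count >= 0 && count <= demoDepth);
    demoDepth -= count;
}

/* Offset 0 is the top of the stack, offset 1 the item below it, and so on. */
static int demoStackValue(int offset) {
    assert(offset >= 0 && offset < demoDepth);
    return demoStack[demoDepth - 1 - offset];
}

/* Glue for a hypothetical two-argument primitive. On entry the stack holds
   receiver, arg1, arg2 (arg2 on top); on exit it holds only the result. */
static void demoGlue(void) {
    int arg2 = demoStackValue(0);    /* last argument */
    int arg1 = demoStackValue(1);    /* next-to-last argument */
    int answer = arg1 + arg2;        /* stand-in for the real C function */
    demoPop(3);                      /* two arguments plus the receiver */
    demoPush(answer);
}
```

Pushing a receiver and two arguments and then calling demoGlue() leaves exactly one item on the stack, which is what the calling method sees as its return value.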

I have not discussed data sharing between glue code and primitives, but there are some nifty and flexible facilities for it. (You can actually declare a temporary variable or argument to the glue method with the exact format it will have in C.) Look at John Maloney's sound primitives, or browse senders of var:declareC: as used in Interpreter >> primitiveSoundGetRecordingSampleRate (or pay me a really fat consulting fee to tell you about it :-) .

The garbage collector can move objects between primitive calls. As long as your glue code and the primitive it calls do not allocate Smalltalk objects, the collector will not run while they are active, so the objects you pass to C are safe for the duration of the primitive. If you want to pass an object pointer down to C code and have it held onto across primitive calls, however, you have to register it with the garbage collector as special so it will not be moved. (I've generated gigabytes of core dumps over the years with a whole array of VMs by forgetting this.) Look at the senders of SystemDictionary >> registerExternalObject: for places that do this.

The glue code method is translated to C when you generate a new interp.c file (see below), so keep in mind that you can't just send arbitrary Smalltalk messages from here. Look at the other primitive glue code methods in Interpreter (or DynamicInterpreter) for more examples.

7: Add an Entry to the Interpreter's Primitive Table

Look at Interpreter class's initializePrimitiveTable method; edit it in-place or add your own init. method. (Be sure to use the same prim. number you used in step 5 above.)

	...
	(614 primitiveReadMIDIPacket)
	...

The init. method is called automagically when you regenerate the interpreter, so you don't have to do that now.

Although there is no formal method for registering primitive numbers, Ward Cunningham's Wiki server does have a page for "voluntary" reservations (see http://c2.com:8080/PrimitiveNumberRegistry). I strongly recommend that you coordinate with other developers by looking here and telling the world what numbers you're using.
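For readers who think in C, the idea of a numbered primitive table, and of two developers colliding on the same number, can be sketched as an array of function pointers. Everything here (the demo* names, the table size, the stub's return value) is invented for illustration; it is not a description of how the generated interp.c actually dispatches.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*DemoPrimitive)(void);

enum { DEMO_TABLE_SIZE = 700 };      /* arbitrary size chosen for the sketch */
static DemoPrimitive demoTable[DEMO_TABLE_SIZE];  /* zeroed: all unassigned */

static int demoPrimitiveReadMIDIPacket(void) { return 3; }  /* stub body */

/* Install a primitive; answers 0 if the number is out of range or taken. */
static int demoRegister(int number, DemoPrimitive fn) {
    assert(fn != NULL);
    if (number < 0 || number >= DEMO_TABLE_SIZE || demoTable[number] != NULL)
        return 0;
    demoTable[number] = fn;
    return 1;
}

/* Answers -1 when nothing is installed under that number. */
static int demoDispatch(int number) {
    if (number < 0 || number >= DEMO_TABLE_SIZE || demoTable[number] == NULL)
        return -1;
    return demoTable[number]();
}
```

The refusal to overwrite an occupied slot is the point of the sketch: the real table gives you no such protection, which is why the voluntary registry matters.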

8: Regenerate the Interpreter

This is where you translate the Interpreter class's instance methods to C, typically with,

	Interpreter translate: 'interp.c' doInlining: true.

This'll take a while, and will create a file named "interp.c" in the same directory as the virtual image. If you haven't already done so, you also need to write out all the other VM sources by executing,

	InterpreterSupportCode writeMacSourceFiles

or whatever is appropriate on your platform.

If you're new to VM-generation, you should definitely make sure you can re-create the default Squeak VM for your platform before you try adding new primitives. Test the development project or makefile and platform-specific source files first, then add your new primitive code.

9: Write the C Code

Now we have to actually write the C code for the primitive's implementation. In my case, I'm just taking some data out of a data structure I maintain in the VM (it's updated asynchronously from the MIDI driver call-back routines) and copying it into the objects I passed down as arguments to the primitive call. This is a good example for our tutorial because you don't need to understand (or care about) the domain, and it demonstrates Smalltalk/C data-passing. The source (somewhat simplified) for this primitive looks like the following.

	/***************************************************************
	 * sqReadMIDIPacket -- Sent in response to the input semaphore
	 * This is to up-load MIDI messages into the arguments (a MIDIPacket
	 * and its data ByteArray -- both passed as ints and cast here).
	 *	ST: PrimMIDIPort primReadPacket: packet data: data
	 */
	int sqReadMIDIPacket(int ipacket, int idata) {
					// The ipacket object is defined as:
					//	 Object subclass: #MIDIPacket
					//		 instanceVariableNames: 'length time flags data '
					// idata is a byte array (+4 to skip the object header)
		sqMIDIEvent *outPtr;
		unsigned char *cdata;
		int len, i;
		unsigned char *pdata = (unsigned char *)idata;

		success(true);			// Set the success flag to true.
		if (itemsInInQ == 0)		// Return immediately if there is no input.
			return (0);
		if (sqCallback == 0)		// Answer an error code if input is off.
		//	success(false);		// Set the success flag to false to fail.
			return (-1);
						// Get a pointer to the MIDI event structure.
		outPtr = &sqMIDIInQ[itemsInInQ2++];
						// Print a message for debugging--yes,
						// you can use printf() in the VM!
		if(debug_mode) printf("%x %x %x\n", 
			outPtr->data[0], outPtr->data[1], outPtr->data[2]);

		len = outPtr->len;		// Copy the response fields.
						// Copy the driver data into the packet.
						// (Inst vars are 1-based.)
		instVarAtPutInt(ipacket, 1, len);   // Copy length, time, flags.
		instVarAtPutInt(ipacket, 2, (int)(outPtr->timeStamp));
		instVarAtPutInt(ipacket, 3, (int)(outPtr->flags));

		cdata = &(outPtr->data[0]);	// Copy MIDI message bytes into the packet.
		for (i=0; i<len; i++)
			*pdata++ = *cdata++;
			
		return (len);			// Answer len.
	} // End of fcn

Most of this should be pretty obvious to the seasoned C programmer. The cast of the idata argument from int to (unsigned char *) will work because it's actually a ByteArray (+ 4) in Smalltalk. The instVarAtPutInt() macro is defined as,

	#define longAtput(i, val)   (*((int *) (i)) = (val))
	#define instVarAtPutInt(obj, slot, val) \
		longAtput(((char *)(obj) + ((slot) << 2)), ((((int)(val)) << 1) | 1))

This is nasty, but allows you to stuff 31-bit integers into SmallInteger instance variables with abandon. If you look into interp.c, there are more useful macros for primitive writers that would help you if you need to write floats, etc.
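The tagging that the macro performs can be seen in isolation in the following sketch. It assumes 32-bit words and a one-word header, as the text does; the demo* function names and the mock object (an int32_t array whose word 0 is the header) are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Tag a value as a SmallInteger: same bit pattern as (value << 1) | 1. */
static int32_t demoTag(int32_t value) {
    /* The value must fit in 31 bits. */
    assert(value >= -1073741824 && value <= 1073741823);
    return value * 2 + 1;
}

/* Recover the value from a tagged word (the low bit must be set). */
static int32_t demoUntag(int32_t oop) {
    assert(oop & 1);
    return (oop - 1) / 2;
}

/* Mock object: word 0 is the header, words 1..n are the instance variables,
   so slot n lives at byte offset n << 2. */
static void demoInstVarAtPutInt(int32_t *object, int slot, int32_t value) {
    assert(object != 0 && slot >= 1);
    object[slot] = demoTag(value);
}
```

For instance, storing 3 into slot 1 writes the word 7 right after the header, and demoUntag(7) gives 3 back. Anything that does not fit in 31 bits cannot be stored this way.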

It's outside of the scope of this introduction to go into the details of object unpacking in C, but given the above macros, Dan Ingalls's notes on the Squeak object format (from their OOPSLA paper), and a good debugger, you can pretty much do anything. As I stated above, however, it's my opinion that it's easier to unpack objects in Smalltalk, so I have lots of primitives that take several arguments that are the components of one top-level Squeak object.

The last line of the function returns an integer to the glue code, which pushes it onto the stack explicitly after popping the arguments and receiver object.

Note that I also use printf() for debugging. On a Mac, printf() from the VM pops up an output window for the messages. I use the following macros for debugging primitives,

	Boolean debug_mode = true;	// Enable/Disable printfs (see macros below)
					// Debugging macros
	#define dprint1(str)			if(debug_mode) printf(str)
	#define dprint2(str, val)		if(debug_mode) printf(str, val)
	#define dprint3(str, v1, v2)		if(debug_mode) printf(str, v1, v2)
	etc...

(The same could be done with #ifdef, of course.)
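One possible shape for the #ifdef variant, as a sketch only: DEMO_DEBUG, DPRINT, and demoCopy are made-up names, and the variadic macro needs a C99 compiler, which is an assumption about your tool chain.

```c
#include <assert.h>
#include <stdio.h>

/* Define DEMO_DEBUG (e.g. with -DDEMO_DEBUG) to compile the messages in;
   without it, every DPRINT disappears at compile time. */
#ifdef DEMO_DEBUG
#define DPRINT(...) printf(__VA_ARGS__)
#else
#define DPRINT(...) ((void)0)
#endif

/* Copy 'len' bytes, tracing each one when debugging is compiled in. */
static int demoCopy(unsigned char *dst, const unsigned char *src, int len) {
    int i;
    assert(len >= 0);
    for (i = 0; i < len; i++) {
        DPRINT("byte %d = %x\n", i, (unsigned)src[i]);
        dst[i] = src[i];
    }
    return len;
}
```

The trade-off against a run-time flag is that you must recompile to turn the messages on, but a release build carries no trace of them.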

10: Add Function Prototype(s) to the Squeak Header file

In order for the primitive call (in interp.c) to work, you need to provide a function prototype (at least if you use a C compiler that requires them, which you should). In the main Squeak header file--sq.h--I added,

	/* MIDI Prims */		// Added by STP
	#include "OMS.h"		// OMS definitions and structs
	#include <MIDI.h>		// Apple MIDI Libraries
	#include "sqMIDI.h"		// Squeak MIDI Structs and Prims

and in my package's header file--sqMIDI.h--I have,

	int sqReadMIDIPacket(int MIDIpacket, int dataBuffer);

Note that I have to include another header file for the OMS libraries, and to include the Apple MIDI library. This would not be necessary for a simpler primitive that had less baggage of its own.

11: Add Your C File to the VM Make/Project File

So, we're almost done. Depending on your platform, you'll have to add your new C file to the VM makefile or import it into the VM build project with the appropriate C development tool (e.g., CodeWarrior). Depending on the complexity of your C code, you might also have to add additional libraries to the linker command (or import them to the interactive development tool's library list).

12: Recompile the VM

Now either say "make" or use the appropriate compiler/linker tool to rebuild the VM. Make sure it recompiles the (new) interp.c file as well as your primitive C code, and that you link with any additional libraries required by your C code.

13: Test It

If all of the above steps worked, you should now have a new virtual machine that includes your C primitive! You can start it with a virtual image that contains the Smalltalk side of your primitive and test it out. If this is your first foray into adding primitives, I strongly suggest that you start with a really trivial primitive (e.g., one that squares its argument or some such nonsense) to run through the process from start to finish. If you're so experienced that you don't need to, then why are you reading this note? (If you're an experienced Smalltalk programmer with lots of free time, please contact me immediately!)
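The C side of such a trivial squaring primitive might look something like the sketch below. The name demoSquare and its calling convention (status as the return value, result through a pointer) are my own choices for illustration; the one real constraint it shows is that the answer has to fit in a 31-bit SmallInteger.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C side of a "square" primitive.
   Answers 1 and stores x * x in *result on success; answers 0 if the
   square would not fit in a 31-bit SmallInteger (maximum 1073741823). */
static int demoSquare(int32_t x, int32_t *result) {
    assert(result != 0);
    if (x < -32767 || x > 32767)
        return 0;                /* 32768 * 32768 is already out of range */
    *result = x * x;
    return 1;
}
```

The glue method would treat a 0 return as a reason to fail the primitive, so that the Smalltalk fallback code (which can answer a large integer) takes over.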

Notes

First off, the entire preceding discussion is inexcusably C-centric. You can, of course, write primitives in any language that adheres to C's calling convention and can be cross-linked with C on your host platform (so why not use FORTRAN, Pascal, or Ada?).

There's another whole note yet to be written about debugging primitives, but on most platforms you can simply use the debugger to put breakpoints in the C primitive methods and single-step through them (Smalltalk will be frozen all the while, of course).

There is really no net (in terms of memory protection or "safe" primitives) here; it's quite easy to corrupt Smalltalk's heap or other memory with C, and to end up with a system that crashes unpredictably some time after you call your primitive. Be really careful about memory and stack management. Also remember the note above that objects can be moved by the garbage collector between primitive calls, so if you ever pass a pointer to the VM to hold onto, you have to register it in Squeak as being external.

You can also trigger Smalltalk semaphores from C primitives; see John Maloney's SoundPlayer class or Siren's PrimMIDIPort for examples. This is by far the best way to implement "call-backs" from C to Smalltalk--have the Smalltalk application class pass down a semaphore to the VM and then start a loop process that waits for the semaphore and handles it asynchronously. (If you're really clever, you can even create events and post them in Squeak's event input queue.)

For more examples: See the socket primitives for a simple interface to an external API (that passes structures around and coerces between Smalltalk objects and C structs); see the sound player primitives for examples of asynchronous I/O; see the AbstractSound classes for examples of automatically generated primitives.

Epilog: Wouldn't it be nice if....

Primitives were called by name.
Glue code was generated automagically from the Smalltalk and C function prototypes.
Primitives could raise exceptions instead of failing.
We didn't need primitives at all!

Many thanks to Reinier van Loon (R.L.J.M.W.van.Loon@inter.nl.net) for the initial HTML translation of this text.

Comments are invited.

Stephen Travis Pope
Center for Research in Electronic Art Technology (CREATE)
Department of Music, Univ. of California, Santa Barbara (UCSB)
stp@create.ucsb.edu, http://www.create.ucsb.edu/~stp/
