The Perpetual Notion: 201502

20150217

MappedByteBuffer.hurray()!: Programming the Linux Framebuffer in Java & VMFlexArray Explained

I recently travelled to Belgium to participate in FOSDEM. This year, I gave two presentations:

MappedByteBuffer.hurray()!: Programming the Linux FrameBuffer in Java & VMFlexArray Explained. See here.
Internet of #allthethings: Using GNURadio Companion to Interact with an IEEE 802.15.4 Network. See here.

This post focuses on the former, specifically on VMFlexArray.

Currently, there is no Java Virtual Machine in existence that allows a developer to reference off-heap memory regions as Java arrays, e.g. via byte[] or int[]. That is, all Java arrays that the VM deals with must be contiguously allocated at instantiation time.

By that I mean that when the VM instantiates an integer array object, e.g. int[] x = new int[ length ], it typically allocates memory (careful now, I'm going to use some C terminology here) for an object struct (2 uintptr_t in JamVM), which represents the instance of the int[] object, followed by 1 uintptr_t, which represents the length of the int[] object, followed by exactly length uintptr_t items (on a 32-bit machine) or length / 2 uintptr_t items (on a 64-bit machine) to represent the data.

VMFlexArray

VMFlexArrays are slightly different. For the same case as above, where a new int[] is allocated on the Java heap, the VM would allocate an object struct (2 uintptr_t in JamVM) which represents the instance of the int[] object, followed by 1 uintptr_t to represent the length of the int[] object, followed by 1 uintptr_t to point to the int[] data, followed by the data itself.

What makes VMFlexArrays different, and what makes them flexible (and arguably way better than what most JVMs use today), is that they include that extra uintptr_t to point to the data, which could exist anywhere in virtual memory. That means, obviously, that VMFlexArrays can point to contiguous data that the JVM would allocate for a regular array, but it also means that they can point to an arbitrary location, and still cooperate with the garbage collector. Indeed, the object lifecycle remains unchanged for VMFlexArrays, provided that the garbage collector does not release the data region with free(3) when the VMFlexArray pointer does not point to the next contiguous memory address.
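To make the two layouts concrete, here is a toy Java model that only counts machine words and compares addresses. The class and method names are made up for illustration, rounding odd lengths up to a whole word is my assumption, and reading "the next contiguous memory address" as "the word right after the pointer field" is an interpretation of the text rather than JamVM code.

```java
// Toy model only: counts machine words (uintptr_t) and compares addresses.
// It does not touch real VM memory and is not taken from JamVM.
final class ArrayLayoutSketch {
    static final int OBJECT_STRUCT_WORDS = 2; // JamVM figure quoted in the text
    static final int LENGTH_WORDS = 1;        // array length
    static final int DATA_POINTER_WORDS = 1;  // the extra VMFlexArray field

    // Words needed to hold `length` ints (assumption: odd lengths round up).
    static long intDataWords(long length, int wordBytes) {
        return (length * Integer.BYTES + wordBytes - 1) / wordBytes;
    }

    // Conventional layout: object struct, length, then the data itself.
    static long conventionalWords(long length, int wordBytes) {
        return OBJECT_STRUCT_WORDS + LENGTH_WORDS + intDataWords(length, wordBytes);
    }

    // VMFlexArray layout with heap data: one extra word for the data pointer.
    static long flexArrayWords(long length, int wordBytes) {
        return OBJECT_STRUCT_WORDS + LENGTH_WORDS + DATA_POINTER_WORDS
                + intDataWords(length, wordBytes);
    }

    // The collector may release the data only when it sits right after the
    // pointer field; anything else is treated as foreign (off-heap) memory.
    static boolean collectorOwnsData(long pointerFieldAddress, long dataAddress, int wordBytes) {
        return dataAddress == pointerFieldAddress + wordBytes;
    }

    public static void main(String[] args) {
        System.out.println("int[10], 32-bit, conventional: " + conventionalWords(10, 4) + " words");
        System.out.println("int[10], 64-bit, conventional: " + conventionalWords(10, 8) + " words");
        System.out.println("int[10], 64-bit, flex array:   " + flexArrayWords(10, 8) + " words");
    }
}
```

The cost of the flexibility is a single extra word per array; when the data lives off-heap, only the four header words are allocated on the Java heap.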

VMFlexArray is a solution I came up with that allows one to integrate off-heap memory regions into the Java Virtual Machine, e.g. memory allocated by an external native thread using malloc(3), or pages mapped from a device such as /dev/video0 using mmap(2).
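For comparison, this is roughly what stock Java offers for mmap(2)-style memory today. The sketch maps a scratch file (a stand-in for a device node; the class and method names are hypothetical) and asks whether a byte[] is reachable. On OpenJDK a mapped buffer is a direct buffer, so I expect hasArray() to report false.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MappedBufferDemo {
    // Maps `size` bytes of a scratch file and reports whether a byte[] backs the mapping.
    static boolean mappingExposesArray(int size) {
        try {
            Path scratch = Files.createTempFile("mapped-demo", ".bin");
            try (FileChannel ch = FileChannel.open(scratch,
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                // READ_WRITE mapping grows the empty scratch file to `size` bytes.
                MappedByteBuffer region = ch.map(FileChannel.MapMode.READ_WRITE, 0, size);
                region.put(0, (byte) 42);   // element access through the buffer API works
                return region.hasArray();   // but there is no byte[] to hand out
            } finally {
                Files.deleteIfExists(scratch);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("mapped buffer hasArray(): " + mappingExposesArray(4096));
    }
}
```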

Buffer Views

Perhaps the aspect of VMFlexArrays that I found most useful, that I somehow forgot to mention during my talk, is that they rather trivially allow the following code snippet to work as expected. Specifically, an IntBuffer derived from a ByteBuffer with a backing array should be able to provide an int[] backing array view of the same virtual memory.
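A minimal sketch of that kind of snippet follows. It is a reconstruction using only standard java.nio calls, not necessarily the exact code from the talk, and the class and helper names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.IntBuffer;

final class IntViewDemo {
    // Wraps a byte[] and views the same memory as ints (big-endian by default).
    static IntBuffer intView(byte[] bytes) {
        return ByteBuffer.wrap(bytes).asIntBuffer();
    }

    // Returns true when the view refuses to hand out an int[] backing array.
    static boolean arrayCallFails(IntBuffer view) {
        try {
            view.array();
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        byte[] bytes = new byte[16];
        IntBuffer view = intView(bytes);
        view.put(0, 0x01020304);  // writes through to bytes[0..3]
        System.out.println("bytes[0] after put: " + bytes[0]);
        System.out.println("view.hasArray():    " + view.hasArray());
        System.out.println("view.array() fails: " + arrayCallFails(view));
    }
}
```

The view shares memory with the byte[], so element access works, but the int[] itself is never exposed.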

Currently this code, which should work pretty seamlessly, fails miserably.

ByteBuffers allow themselves to be viewed as IntBuffers or LongBuffers or ShortBuffers. Pretty brilliant! Well... if it worked it would be brilliant. The fact is, as shown above, one cannot wrap a byte[] into a ByteBuffer, view it as an IntBuffer, and then call IntBuffer.array() to get an int[] view of the original byte[]. That would make the NIO API complete, in my opinion, and this feature is sadly lacking.

With VMFlexArrays, that problem is solved.

I've even used this code to memory map the Linux FrameBuffer and animate a bunch of bouncing balls :-) It works quite well.
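For readers who want to try something similar on a stock JVM, here is a rough sketch using a plain MappedByteBuffer viewed as an IntBuffer. It is not the VMFlexArray code from the talk. It assumes 32 bits per pixel and no padding between lines, all names are hypothetical, and whether a given JDK will map a device node such as /dev/fb0 this way is something I have not verified here, so the demo runs against a scratch file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteOrder;
import java.nio.IntBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class FramebufferSketch {
    // Maps width*height 32-bit pixels from `file` and views them as ints.
    static IntBuffer mapPixels(Path file, int width, int height) {
        long bytes = (long) width * height * Integer.BYTES;
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Pixels are stored in the machine's native byte order.
            return ch.map(FileChannel.MapMode.READ_WRITE, 0, bytes)
                     .order(ByteOrder.nativeOrder())
                     .asIntBuffer();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Fills a w-by-h rectangle whose top-left corner is (x, y).
    static void fillRect(IntBuffer pixels, int width, int x, int y, int w, int h, int colour) {
        for (int row = y; row < y + h; row++) {
            for (int col = x; col < x + w; col++) {
                pixels.put(row * width + col, colour);
            }
        }
    }

    // Scratch file standing in for the framebuffer device.
    static Path scratchFile() {
        try {
            Path f = Files.createTempFile("fb-sketch", ".raw");
            f.toFile().deleteOnExit();
            return f;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        IntBuffer pixels = mapPixels(scratchFile(), 8, 4);
        fillRect(pixels, 8, 2, 1, 3, 2, 0xFF00FF00);
        System.out.println("pixel (2,1) = 0x" + Integer.toHexString(pixels.get(1 * 8 + 2)));
    }
}
```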

There's even a massive speedup associated with access to the underlying byte[] from a ByteBuffer and even more so viewing the ByteBuffer as an IntBuffer, with access to the underlying int[].

I am definitely interested in getting these changes into OpenJDK, and I feel that the community at large would benefit greatly from them. As my time is rather limited these days, I would prefer to mentor a student to make these changes in Google Summer of Code 2015, if OpenJDK were a mentoring organization. Otherwise, I would be open to mentoring a student under the umbrella of JamVM or GNU Classpath as a mentoring organization.

20150208

Internet of #allthethings: Using GNURadio Companion to Interact with an IEEE 802.15.4 Network

I recently travelled to Belgium to participate in FOSDEM. This year, I gave two presentations:

MappedByteBuffer.hurray()! Programming the Linux Framebuffer in Java & VMFlexArray Explained. See here.
Internet of #allthethings: Using GNURadio Companion to Interact with an IEEE 802.15.4 Network. See here.

This post focuses on the latter.

The gist of my talk was essentially that we have all of the tools available for us to quickly prototype all sorts of 802.15.4 devices. All that is needed is to integrate the following:

FreakZ: A BSD-licensed ZigBee stack (for non-commercial purposes)

Easily modified to communicate with a GNURadio device via UDP (github)
Note: this stack is not certified.

GNURadio

a great suite of tools to interact with Software Defined Radio (SDR) transceivers

GNURadio IEEE 802.15.4 Out Of Tree (OOT) module

gr-ieee-802_15_4 is available today
based on work originally from UCLA
unofficially meets all of the mandatory requirements for the IEEE 802.15.4 PHY layer
meets some of the mandatory requirements for the IEEE 802.15.4 MAC layer
lacking mandatory MAC features such as

Beacon Management
Receive Beacons
Channel Access Mechanism

Carrier Sense Multiple Access with Collision Avoidance (CSMA-CA)

ACK Delivery
Security
Orphan Scanning
Store One Transaction

The primary barrier to entry for developers & researchers is most likely going to be the cost of an SDR. Even after buying an SDR that is capable of sampling at a sufficiently high rate around 2.4 GHz, one still needs to invest a small amount in other 802.15.4 equipment such as ZigBee-enabled thermostats, light bulbs, or gateways (I am only aware of ZigBee consumer products in the IEEE 802.15.4 market today).

To help would-be developers overcome that hurdle, I have simply used my USRP B200 to record real-world 802.15.4 traffic produced by an EM370 in NodeTest mode. This should facilitate offline signal processing using e.g. GNURadio (see the File Source block), Matlab, Octave, or any other programming language. The block diagram for doing so is depicted below. I have intentionally given all of my variables self-explanatory names.

Keep in mind that the files are rather large (433 MB compressed with LZMA2), as the samples are complex-float32 and I have intentionally oversampled by a factor of 4 (8M samples per second) to better facilitate SDR receiver design. You may find them here.
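For anyone processing the recordings outside of GNURadio, here is a small Java sketch of the arithmetic and the decoding. It assumes each complex-float32 sample is stored as an interleaved pair of 32-bit floats (I, then Q) in little-endian order, which is what I would expect from a recording made on an x86 host but is an assumption here; the class and method names are hypothetical. The 4x figure lines up with the 2 Mchip/s chip rate of the 2450 MHz O-QPSK PHY.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class IqFileSketch {
    static final int BYTES_PER_SAMPLE = 2 * Float.BYTES; // one float32 each for I and Q

    // Raw data rate of a complex-float32 stream.
    static long bytesPerSecond(long samplesPerSecond) {
        return samplesPerSecond * BYTES_PER_SAMPLE;
    }

    // Decodes interleaved I,Q floats (assumed little-endian) without moving `raw`'s position.
    static float[] decode(ByteBuffer raw) {
        ByteBuffer view = raw.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        float[] iq = new float[view.remaining() / Float.BYTES];
        view.asFloatBuffer().get(iq);
        return iq;
    }

    public static void main(String[] args) {
        System.out.println("bytes per second at 8 Msps: " + bytesPerSecond(8_000_000L));
        byte[] oneSample = {0, 0, (byte) 0x80, 0x3F, 0, 0, (byte) 0x80, (byte) 0xBF};
        float[] iq = decode(ByteBuffer.wrap(oneSample));
        System.out.println("I = " + iq[0] + ", Q = " + iq[1]);
    }
}
```

Because decode takes any ByteBuffer, it can be fed from a byte[] read in chunks or from a MappedByteBuffer over the decompressed file.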

The files are listed and described below:

ieee802154-channel14-txtone-complex-float32.dat

simply recording a tone at 2420 MHz in the presence of noise
note: there is a slight frequency offset which will need to be corrected

ieee802154-channel14-txstream-complex-float32.dat

a continuous random stream of valid channel symbols

ieee802154-channel14-tx-complex-float32.dat

a stream containing intermittent & full IEEE 802.15.4 frames
frames are sent once every 25500 us
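The frequency offset mentioned for the tone recording can be estimated in a few lines. The sketch below averages the phase step between consecutive samples, a common textbook estimator for a single tone; it is not something prescribed by the recordings themselves. The names are hypothetical, and the 12.5 kHz value in main is a synthetic test input, not the actual offset in the file.

```java
final class ToneOffsetSketch {
    // Estimates the tone frequency (Hz) from interleaved I,Q samples by
    // averaging x[n+1] * conj(x[n]) and taking the angle of the result.
    static double estimateOffsetHz(float[] iq, double sampleRate) {
        double accRe = 0.0, accIm = 0.0;
        for (int n = 0; n + 3 < iq.length; n += 2) {
            double re0 = iq[n], im0 = iq[n + 1];
            double re1 = iq[n + 2], im1 = iq[n + 3];
            accRe += re1 * re0 + im1 * im0;
            accIm += im1 * re0 - re1 * im0;
        }
        return Math.atan2(accIm, accRe) * sampleRate / (2 * Math.PI);
    }

    // Synthetic unit-amplitude tone, used here only to exercise the estimator.
    static float[] tone(double freqHz, double sampleRate, int count) {
        float[] iq = new float[2 * count];
        for (int n = 0; n < count; n++) {
            double phase = 2 * Math.PI * freqHz * n / sampleRate;
            iq[2 * n] = (float) Math.cos(phase);
            iq[2 * n + 1] = (float) Math.sin(phase);
        }
        return iq;
    }

    public static void main(String[] args) {
        double estimate = estimateOffsetHz(tone(12_500.0, 8e6, 4096), 8e6);
        System.out.println("estimated offset in Hz: " + estimate);
    }
}
```

The estimate is unambiguous only for offsets below half the sample rate, which is not a concern at 8M samples per second.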

A Few Notes About the Current State of IEEE 802.15.4 in GNURadio

All of the open-source PHY implementations assume that Symbol and Timing Recovery (STR) is already performed. This is fine for simulation (depicted below).

Indeed, clock recovery, frequency offset compensation, and phase offset compensation are often the most complicated part of real-world wireless receiver architectures. Without frequency compensation, the constellation diagram appears to move around the unit circle, as shown below.
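The rotation comes from each sample being multiplied by a phasor that advances a little with every sample, so the correction is to multiply by the opposite phasor. A minimal Java sketch, assuming the offset is already known and constant (names are hypothetical, and this is not a GNURadio block):

```java
final class DerotateSketch {
    // Multiplies sample n by exp(-j * 2*pi * offsetHz * n / sampleRate).
    // `iq` holds interleaved I,Q values; the input array is left untouched.
    static float[] derotate(float[] iq, double offsetHz, double sampleRate) {
        float[] out = new float[iq.length];
        double step = -2 * Math.PI * offsetHz / sampleRate;
        for (int n = 0; 2 * n + 1 < iq.length; n++) {
            double c = Math.cos(step * n), s = Math.sin(step * n);
            double re = iq[2 * n], im = iq[2 * n + 1];
            out[2 * n] = (float) (re * c - im * s);
            out[2 * n + 1] = (float) (re * s + im * c);
        }
        return out;
    }

    public static void main(String[] args) {
        // A constant point at 1+0j, corrected for an offset of a quarter of the
        // sample rate, steps around the unit circle a quarter turn per sample.
        float[] out = derotate(new float[] {1, 0, 1, 0, 1, 0, 1, 0}, 2_000_000.0, 8_000_000.0);
        for (int n = 0; n < 4; n++) {
            System.out.println(out[2 * n] + " " + out[2 * n + 1]);
        }
    }
}
```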

For those who would like to get started quickly, you may follow the PSK Symbol Recovery tutorial. Use a Polyphase Clock Sync block, followed by the "blind" Constant Modulus Algorithm (CMA) equalizer, followed by a Costas Loop. The Polyphase Clock Sync takes the taps for the matched filter, the filter length, and the number of samples per channel symbol as arguments, and returns a configurable number of samples per symbol with a (somewhat) frequency-corrected clock. The CMA equalizer then forces all of the samples onto the unit circle, and finally the Costas Loop corrects the phase of the signal. This approach works well enough, but it has some associated complexity: the clock recovery, frequency compensation, and phase compensation all work independently. It has been suggested that using the Least Mean Squares Decision-Directed (LMS-DD) equalizer could improve performance. The Polyphase Clock Sync block can be more accurate, at the cost of additional processing time in the receive chain due to increased complexity.
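To give a feel for the last stage of that chain, here is a toy first-order Costas loop for plain QPSK in Java. It is a textbook form, not GNURadio's implementation; it assumes ideal, noise-free symbols at one sample per symbol with a constant phase offset, and all names are hypothetical. Note also that the 2450 MHz PHY uses O-QPSK, so a real receiver needs more care than this.

```java
final class CostasLoopSketch {
    // Runs a first-order QPSK Costas loop over interleaved I,Q symbols and
    // returns the final phase estimate in radians.
    static double trackPhase(double[] iq, double gain) {
        double phase = 0.0;
        for (int n = 0; 2 * n + 1 < iq.length; n++) {
            // Rotate the incoming symbol back by the current estimate.
            double c = Math.cos(-phase), s = Math.sin(-phase);
            double re = iq[2 * n] * c - iq[2 * n + 1] * s;
            double im = iq[2 * n] * s + iq[2 * n + 1] * c;
            // QPSK phase detector: zero when the symbol sits on a diagonal.
            double error = Math.signum(re) * im - Math.signum(im) * re;
            phase += gain * error;
        }
        return phase;
    }

    // Ideal QPSK symbols cycling through all four points, rotated by offsetRad.
    static double[] rotatedQpsk(int count, double offsetRad) {
        double[] iq = new double[2 * count];
        for (int n = 0; n < count; n++) {
            double angle = Math.PI / 4 + (Math.PI / 2) * (n % 4) + offsetRad;
            iq[2 * n] = Math.cos(angle);
            iq[2 * n + 1] = Math.sin(angle);
        }
        return iq;
    }

    public static void main(String[] args) {
        double estimate = trackPhase(rotatedQpsk(400, 0.3), 0.1);
        System.out.println("phase estimate in radians: " + estimate);
    }
}
```

Like any QPSK Costas loop, this one locks to the nearest multiple of 90 degrees, so an offset larger than 45 degrees settles on a rotated constellation.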

However, since we already know the preamble of an IEEE 802.15.4 packet in the 2450 MHz ISM band, we are at liberty to implement a more sophisticated coherent architecture in our receiver. Specifically, due to the known preamble, we may use a Correlate and Sync block.
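The underlying idea is a sliding correlation against the known sequence. The Java sketch below is not the Correlate and Sync block itself, just a brute-force illustration; the names are hypothetical, and the reference pattern used in main is an arbitrary stand-in rather than the real 802.15.4 preamble chips.

```java
final class PreambleCorrelationSketch {
    // Returns the sample index k that maximises |sum_n x[k+n] * conj(p[n])|^2.
    // Both arrays hold interleaved I,Q values.
    static int bestOffset(float[] iq, float[] pattern) {
        int samples = iq.length / 2, taps = pattern.length / 2;
        int best = -1;
        double bestPower = -1.0;
        for (int k = 0; k + taps <= samples; k++) {
            double re = 0.0, im = 0.0;
            for (int n = 0; n < taps; n++) {
                double xr = iq[2 * (k + n)], xi = iq[2 * (k + n) + 1];
                double pr = pattern[2 * n], pi = pattern[2 * n + 1];
                re += xr * pr + xi * pi;   // real part of x * conj(p)
                im += xi * pr - xr * pi;   // imaginary part of x * conj(p)
            }
            double power = re * re + im * im;
            if (power > bestPower) {
                bestPower = power;
                best = k;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Stand-in pattern of eight +1/-1 samples, NOT the 802.15.4 preamble.
        float[] pattern = {1, 0, -1, 0, 1, 0, 1, 0, -1, 0, -1, 0, 1, 0, -1, 0};
        float[] stream = new float[2 * 64];
        for (int n = 0; n < 8; n++) {
            // Embed the pattern at sample 37, rotated by 90 degrees.
            stream[2 * (37 + n) + 1] = pattern[2 * n];
        }
        System.out.println("pattern found at sample " + bestOffset(stream, pattern));
    }
}
```

Using the magnitude of the complex correlation makes the peak insensitive to a constant phase rotation, and the angle of the correlation at the peak gives an initial phase estimate for coherent demodulation.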

I will be doing a bit more experimenting in the coming days and will post an update once available.
