A Physical Unclonable Function for any FPGA
- Theory of Operation
- Top Entity
A physical unclonable function (PUF) provides a digital fingerprint that can be used as a unique identifier for security-related applications like authentication. The fpga_puf hardware module provides an unclonable 96-bit unique identifier (ID) that is defined by the target chip's semiconductor characteristics. It is implemented in a technology-independent way that does not use any device-specific macros, primitives or attributes, so it can be implemented on any FPGA (verified using Intel Quartus Prime, Lattice Radiant and Xilinx Vivado).
- technology-, vendor- and platform-agnostic implementation
- easy to use
- tiny hardware footprint (less than 200 LUTs)
- secure and reliable
Theory of Operation
Each bit of the 96-bit PUF ID is generated by an individual PUF cell. Each cell consists of an asynchronous element that is based on a simple ring oscillator. The oscillator is constructed from a single inverter that negates its own output signal. This feedback loop is interrupted by a latch. If the latch is open (= transparent), the oscillator starts oscillating. If the latch is closed, it stores the last state of the oscillator (high or low). The latch also provides a reset to bring each cell into a defined state.
The frequency of each oscillator is defined by the mapping of the logic cell and the corresponding routing. Both factors are constant for a specific bitstream. Furthermore, the frequency is also randomly (but constantly for a given setup) "tuned" by the chip's semiconductor characteristics, which result from tiny production fluctuations (for example, the capacitance and thus the delay of a routing wire is affected by variations in oxide thickness). The frequency of each oscillator is therefore considered to be fixed. Hence, letting an oscillator run for several periods within a fixed time window will always end in the same state (oscillator output is high or low). Temperature drift disturbs all oscillators in (nearly) the same way and has to be compensated computationally by the post-processing.
Since fpga_puf does not use any device-specific attributes or primitives, there is a risk that the synthesis toolchain will remove the asynchronous PUF cells or collapse them into a single PUF cell ("optimize away"). Therefore, the design uses a shift register to control the reset and the latch open/close phase of each cell individually and distributed over time. This concept is based on the NEORV32 TRNG and allows a platform-independent implementation.
Whenever a new sampling of the PUF ID is started, the process is sequenced by a 96+1 bit wide shift register. A single '1' is applied to the least significant bit and travels through the whole shift register chain during operation. The PUF cells' control signals (reset and latch control) are connected to this shift register. Shift register bit i controls the reset of PUF cell i, and the following bit i+1 controls the latch of the same cell. Thus, a cell's reset is active for one clock cycle. In the next clock cycle the cell's latch is opened for exactly one clock cycle, allowing the oscillator to run. In the cycle after that the latch is closed again and has captured the oscillator's last state.
When the single one-bit reaches the most significant bit (bit 96) of the shift register the sampling process is completed and the latch states are sampled into a register that provides the obtained raw PUF ID.
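The control sequence described above can be modeled in a few lines of C. This is only a behavioral sketch, not the VHDL from the repository: the cell count is reduced to 8, all names (`puf_ctrl_t`, `cell_reset`, `cell_latch`, ...) are made up for illustration, and it assumes that a cell's latch control is taken from the shift register bit right after the one driving its reset, which yields the "reset, then open for one cycle, then close" timing.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_CELLS 8u /* ASSUMPTION: reduced from 96 to keep the model small */

/* One-hot control shift register with NUM_CELLS+1 bits. */
typedef struct {
    uint32_t sreg;
} puf_ctrl_t;

/* Start a sampling run: a single '1' is placed in the least significant bit. */
static void puf_ctrl_start(puf_ctrl_t *c) {
    c->sreg = 1u;
}

/* One clock cycle: the single '1' moves one position towards the MSB. */
static void puf_ctrl_clock(puf_ctrl_t *c) {
    c->sreg = (c->sreg << 1) & ((1u << (NUM_CELLS + 1u)) - 1u);
}

/* Reset of cell i is driven by shift register bit i. */
static int cell_reset(const puf_ctrl_t *c, unsigned i) {
    assert(i < NUM_CELLS);
    return (int)((c->sreg >> i) & 1u);
}

/* Latch-open of cell i is driven by shift register bit i+1. */
static int cell_latch(const puf_ctrl_t *c, unsigned i) {
    assert(i < NUM_CELLS);
    return (int)((c->sreg >> (i + 1u)) & 1u);
}

/* The run is complete once the '1' has reached the most significant bit. */
static int puf_ctrl_done(const puf_ctrl_t *c) {
    return (int)((c->sreg >> NUM_CELLS) & 1u);
}
```

Stepping the model shows that at any time at most one cell is in reset and at most one (the previous) cell has its latch open, so the cells are exercised one after another.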
The following figure is taken from the Intel Quartus Prime technology viewer showing a single PUF cell (cell #44). The oscillator as well as the latch are mapped to a single LUT3 element in the FPGA. The output of this asynchronous element is sampled by a simple register. Hence, a single PUF cell only requires one LUT and one FF.
- DATAA is the latch reset (high-active)
- DATAC is the latch control (open when set)
- DATAD is the feedback of the inverter
Some bits of the raw PUF ID might be quite noisy, which complicates defining a stable ID that does not change over time and over a broad range of operating conditions (e.g. temperature). A lot(!) of different approaches can be found in the literature. One promising approach is the use of error-correction codes to "stabilize" an initially determined ID.
However, these concepts are out of (my) scope, so I used a simple "averaging" concept for this example. My approach samples the raw PUF ID several times (for example 4096 times) and checks how often each bit of the ID is set across all sampled IDs. If an ID bit is set in more than half of the samples it is considered to be a static 1, otherwise it is considered to be a static 0. Since a few bits of the raw ID might be quite noisy, a hysteresis is used to eliminate those bits from the final PUF ID: only if a bit is set or cleared more often than a certain threshold is it considered "stable" in the final ID. The lower and upper hysteresis thresholds define an uncertainty band between them. For the final ID, all bits that fall into this uncertainty band are masked to be always zero.
My post-processing concept can be found in sw/main.c. The source file provides more detailed comments.
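As a rough illustration of the averaging idea, here is a minimal C sketch. It is not the code from sw/main.c; the type `puf_id_t`, the function names and the threshold parameters `lower` and `upper` are assumptions made for this example. In addition to the final ID it returns a mask of the bits that fell into the uncertainty band, which is an extra output of this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define ID_BITS 96u

/* 96-bit ID as three 32-bit words; w[0] holds bits 31:0 (assumed layout). */
typedef struct {
    uint32_t w[3];
} puf_id_t;

/* Add one raw ID to the per-bit counters (count[i] = how often bit i was 1). */
static void puf_accumulate(uint32_t count[ID_BITS], const puf_id_t *raw) {
    for (unsigned i = 0; i < ID_BITS; i++) {
        count[i] += (raw->w[i / 32u] >> (i % 32u)) & 1u;
    }
}

/*
 * Turn the counters into a final ID.
 *   count[i] >= upper : stable 1
 *   count[i] <= lower : stable 0
 *   otherwise         : uncertain, masked to 0 in the ID and flagged in 'uncertain'
 */
static void puf_finalize(const uint32_t count[ID_BITS], uint32_t lower, uint32_t upper,
                         puf_id_t *id, puf_id_t *uncertain) {
    assert(lower < upper);
    for (unsigned k = 0; k < 3u; k++) {
        id->w[k] = 0;
        uncertain->w[k] = 0;
    }
    for (unsigned i = 0; i < ID_BITS; i++) {
        uint32_t bit = 1u << (i % 32u);
        if (count[i] >= upper) {
            id->w[i / 32u] |= bit;
        } else if (count[i] > lower) {
            uncertain->w[i / 32u] |= bit;
        }
    }
}
```

Because both stable-0 bits and uncertain bits end up as zero in the final ID, the separate `uncertain` mask is the only way to tell how many bits actually contribute to the fingerprint.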
Top Entity
The top entity of the PUF IP module is rtl/fpga_puf.vhd. It can be directly instantiated without the need for any special libraries.
entity fpga_puf is
  port (
    clk_i  : in  std_ulogic; -- global clock line
    rstn_i : in  std_ulogic; -- SYNC reset, low-active
    trig_i : in  std_ulogic; -- set high for one clock to trigger ID sampling
    busy_o : out std_ulogic; -- busy when set (sampling ID)
    id_o   : out std_ulogic_vector(95 downto 0) -- PUF ID (valid after sampling is done)
  );
end fpga_puf;
To evaluate this concept and the quality/reliability of the PUF IDs I am using the NEORV32 as processor platform. The fpga_puf IP module is added to the processor's "Custom Functions Subsystem (CFS)", which is a blank template for implementing custom application-specific co-processors.
The setup is synthesized for different FPGAs using different toolchains (tested with Intel Quartus Prime, Lattice Radiant and Xilinx Vivado). A specific bitstream is generated and programmed into several FPGAs of the same type to check for chip-specific ID variations. On one chip the ID is generated several times to check if it is "stable over time" and thus reliable.
The application-specific PUF-wrapping CFS can be found in rtl/neorv32_cfs.vhd. Make sure to use this CFS file instead of the default NEORV32 CFS source file when reproducing this setup.
The CFS uses four memory-mapped registers to interface the PUF ID module:
|CFS register address (C access macro)||Access||Function|
||PUF ID bits 31:0|
||PUF ID bits 63:32|
||PUF ID bits 95:64|
Bit 0 of the control register (PUF_CTRL_EN) controls the synchronous reset signal of the fpga_puf module (rstn_i). Setting this bit activates the module; clearing it puts the module into reset state.
Bit 1 of the control register (PUF_CTRL_SAMPLE) is used to start the sampling of the ID. Writing a one to it sets the trigger signal (trig_i) high for one cycle. Reading this control register bit returns the busy state of the fpga_puf module (busy_o).
The software driver sources are located in sw/fpga_puf_neorv32_cfs.c. The header file sw/fpga_puf_neorv32_cfs.h provides the function prototypes, a PUF ID data type and also the NEORV32 CFS register mappings and bit definitions.
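The register protocol described above (enable, trigger, poll busy, read the three ID words) can be sketched as follows. This is not the driver from sw/fpga_puf_neorv32_cfs.c: the register indices, the `puf_bus_t` abstraction and all function names are assumptions for illustration, and only the two control bit positions are taken from the description. A small mock stands in for the hardware so the sketch can be tried on a host PC; its ID words are placeholders, not real PUF data.

```c
#include <assert.h>
#include <stdint.h>

/* Control register bits as described in the text. */
#define PUF_CTRL_EN     0u /* bit 0: 1 = module active, 0 = module in reset */
#define PUF_CTRL_SAMPLE 1u /* bit 1: write 1 = trigger, read = busy flag */

/* ASSUMPTION: register indices are invented for this sketch. */
#define REG_CTRL   0u
#define REG_ID_LO  1u /* PUF ID bits 31:0  */
#define REG_ID_MID 2u /* PUF ID bits 63:32 */
#define REG_ID_HI  3u /* PUF ID bits 95:64 */

/* Register access is abstracted so that the sketch also runs without hardware. */
typedef struct {
    uint32_t (*read)(unsigned reg);
    void (*write)(unsigned reg, uint32_t value);
} puf_bus_t;

/* Enable the module, trigger one sampling run, wait until done, read the raw ID. */
static void puf_sample_raw_id(const puf_bus_t *bus, uint32_t id[3]) {
    assert(bus != 0 && id != 0);
    bus->write(REG_CTRL, 1u << PUF_CTRL_EN);
    bus->write(REG_CTRL, (1u << PUF_CTRL_EN) | (1u << PUF_CTRL_SAMPLE));
    while (bus->read(REG_CTRL) & (1u << PUF_CTRL_SAMPLE)) {
        /* busy: sampling still in progress */
    }
    id[0] = bus->read(REG_ID_LO);
    id[1] = bus->read(REG_ID_MID);
    id[2] = bus->read(REG_ID_HI);
}

/* --- Host-side mock of the hardware (placeholder values only) --- */
static uint32_t mock_enabled;
static uint32_t mock_busy_polls; /* number of reads that still report busy */
static const uint32_t mock_id[3] = { 0x11111111u, 0x22222222u, 0x33333333u };

static uint32_t mock_read(unsigned reg) {
    if (reg == REG_CTRL) {
        uint32_t busy = (mock_busy_polls > 0u) ? 1u : 0u;
        if (busy) {
            mock_busy_polls--;
        }
        return (mock_enabled << PUF_CTRL_EN) | (busy << PUF_CTRL_SAMPLE);
    }
    if (reg >= REG_ID_LO && reg <= REG_ID_HI) {
        return mock_id[reg - REG_ID_LO];
    }
    return 0u;
}

static void mock_write(unsigned reg, uint32_t value) {
    if (reg != REG_CTRL) {
        return;
    }
    mock_enabled = (value >> PUF_CTRL_EN) & 1u;
    if (mock_enabled && ((value >> PUF_CTRL_SAMPLE) & 1u)) {
        mock_busy_polls = 3u; /* pretend that sampling takes three polls */
    }
}
```

On the real target the two callbacks would be replaced by accesses to the memory-mapped CFS registers using the macros from the header file.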
The PUF test program (sw/main.c) is used to sample the chip's PUF ID and transmit it via UART to a terminal program:
Physical Unclonable Functions
Test PUF implemented as NEORV32 Custom Functions Subsystem (CFS)
Press any key to start PUF test (8 runs with 4096 samples each).
Starting test...
Run 0 ID: 0x37c0480063021011988c0095
Run 1 ID: 0x37c0480063021011988c0095
Run 2 ID: 0x37c0480063021011988c0095
Run 3 ID: 0x37c0480063021011988c0095
Run 4 ID: 0x37c0480063021011988c0095
Run 5 ID: 0x37c0480063021011988c0095
Run 6 ID: 0x37c0480063021011988c0095
Run 7 ID: 0x37c0480063021011988c0095
Test completed.
sw/neorv32_exe.bin (compiled for a minimal rv32 NEORV32 CPU)
So far I have tested the PUF module on three Lattice iCE40 UltraPlus FPGAs and four Intel Cyclone IV FPGAs (shout-out to @emb4fun - thank you for your help!) and just one Xilinx Artix-7 FPGA. The ID was generated 8 times on each chip to "check" if it is reproducible (drawback: just in a very short time window...). The specific FPGA types are shown in section Hardware Utilization.
|FPGA||PUF ID ("Fingerprint Key")|
|Lattice iCE40 UltraPlus - 1||
|Lattice iCE40 UltraPlus - 2||
|Lattice iCE40 UltraPlus - 3||
|Intel Cyclone IV - 1||
|Intel Cyclone IV - 2||
|Intel Cyclone IV - 3||
|Intel Cyclone IV - 4||
|Xilinx Artix-7 - 1||
So far, the IDs are unique for each tested FPGA!
A long-term test is used to sample and check raw IDs (not the post-processed ones) for stability over time. Note that my setup is operated under fairly stable environmental conditions.
The test generates an initial ID I right at the beginning of execution. In an endless loop, two consecutive IDs A and B are sampled (a single "run"). To evaluate the "instability", the Hamming distance (the number of bits that are not identical across two samples) is computed. The test computes the relative Hamming distance of the two consecutive samples (H(A,B)) and also the relative Hamming distance between the initial sample and sample A of the current run (H(I,A)). Furthermore, the maximum of both distances is tracked over all runs (H_max(A,B) and H_max(I,A)).
To identify noisy bits, an "accumulated bit-change mask" F is computed. This mask is computed for every run by XOR-ing the obtained samples A and B with the initial ID I. The mask from a run is OR-ed with the mask from the previous run to accumulate the noisy bits over time.
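The metrics of a single run can be sketched in C as shown below. This is an illustration rather than the test program itself; the type `puf_id_t` and the function names are assumptions. The log excerpt in this section reports the distances as plain bit counts, so the sketch returns absolute counts instead of normalized values.

```c
#include <assert.h>
#include <stdint.h>

/* 96-bit ID as three 32-bit words; w[0] holds bits 31:0 (assumed layout). */
typedef struct {
    uint32_t w[3];
} puf_id_t;

/* Number of set bits in a 32-bit word. */
static unsigned popcount32(uint32_t x) {
    unsigned n = 0;
    while (x != 0u) {
        x &= x - 1u; /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Number of set bits in a 96-bit value (e.g. number of noisy bits in F). */
static unsigned puf_popcount(const puf_id_t *x) {
    return popcount32(x->w[0]) + popcount32(x->w[1]) + popcount32(x->w[2]);
}

/* Hamming distance: number of bit positions in which a and b differ. */
static unsigned puf_hamming(const puf_id_t *a, const puf_id_t *b) {
    unsigned d = 0;
    for (unsigned k = 0; k < 3u; k++) {
        d += popcount32(a->w[k] ^ b->w[k]);
    }
    return d;
}

/* Accumulate the bit-change mask: F |= (A xor I) | (B xor I). */
static void puf_update_mask(puf_id_t *f, const puf_id_t *init,
                            const puf_id_t *a, const puf_id_t *b) {
    assert(f != 0 && init != 0 && a != 0 && b != 0);
    for (unsigned k = 0; k < 3u; k++) {
        f->w[k] |= (a->w[k] ^ init->w[k]) | (b->w[k] ^ init->w[k]);
    }
}
```

Feeding the values of run 43373 from the log into these functions reproduces H(A,B)=3, H(I,A)=19 and 33 set bits in F.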
Cut-out of the test log:
...
Run 43373: I=0x9fae9dd83029bc7156cbe37b, A=0xbfbe3bfc9423beb156cae31b, B=0xbfbe3bfcb021beb156cae31b - F=0x6038be24ac1e83c080214860 (33) - H(A,B)=3, H_max(A,B)=9 - H(I,A)=19, H_max(I,A)=23
Run 43374: I=0x9fae9dd83029bc7156cbe37b, A=0xbfbe3bfc9435beb156eae31b, B=0xbfbe3bfcb431beb156eae31b - F=0x6038be24ac1e83c080214860 (33) - H(A,B)=2, H_max(A,B)=9 - H(I,A)=21, H_max(I,A)=23
Run 43375: I=0x9fae9dd83029bc7156cbe37b, A=0xbfbe3bfc9425beb156caeb1b, B=0xbfbe3bdcb421beb156cae31b - F=0x6038be24ac1e83c080214860 (33) - H(A,B)=4, H_max(A,B)=9 - H(I,A)=20, H_max(I,A)=23
Run 43376: I=0x9fae9dd83029bc7156cbe37b, A=0xbfbe3bfc9425beb156eaeb1b, B=0xbfbe3bfcb021beb156eae31b - F=0x6038be24ac1e83c080214860 (33) - H(A,B)=4, H_max(A,B)=9 - H(I,A)=21, H_max(I,A)=23
Run 43377: I=0x9fae9dd83029bc7156cbe37b, A=0xbfbe3bfc9021beb156caeb1b, B=0xbfbe3bdc9425beb156cae31b - F=0x6038be24ac1e83c080214860 (33) - H(A,B)=4, H_max(A,B)=9 - H(I,A)=18, H_max(I,A)=23
...
This very first test was run for approx. half an hour at ~20 runs per second. It shows a maximum Hamming distance of 23 bits, while 33 bits (F) tend to be noisy compared to the initial ID I, which was sampled once right at the beginning of the test. The number of noisy bits increases slowly over time, probably because of increasing chip temperature. The rate of increase declines over time, so there might be a saturation at some point. The temperature of the PUF cells matters most because it affects the oscillator frequencies. The PUF cells heat up due to the continuous PUF operation.
The PUF from this test setup provides 96-33=63 bits that seem to be stable over the observed time. This also means that the usable key space is reduced (63-bit instead of 96-bit).
Hardware Utilization
Mapping results for the custom functions subsystem (CFS), the fpga_puf module and a single PUF cell.
|Lattice iCE40 UltraPlus
||Logic Cells||Logic Registers|
|neorv32_cfs_inst_true.neorv32_cfs_inst||171 (71)||232 (35)|
|-fpga_puf_inst||100 (4)||197 (101)|
|--fpga_puf_cell_inst.fpga_puf_cell_inst_i||1 (1)||1 (1)|
|Intel Cyclone IV
||Logic Cells||Logic Registers|
|neorv32_cfs:\neorv32_cfs_inst_true:neorv32_cfs_inst||241 (43)||232 (35)|
|-fpga_puf:fpga_puf_inst||198 (102)||197 (101)|
|--fpga_puf_cell:\fpga_puf_cell_inst:0:fpga_puf_cell_inst_i||1 (1)||1 (1)|
|Xilinx Artix-7
||Logic Cells||Logic Registers|
|--fpga_puf_cell_inst.fpga_puf_cell_inst_i (fpga_puf_cell)||1||2 (Latch + FF)|
The PUF IDs are unique for each FPGA and can be successfully implemented on different FPGAs (Xilinx, Intel, Lattice).
The raw PUF ID is only partly reproducible because some bits tend to be quite noisy (see results above). A better post-processing algorithm using error-correction codes should be able to compensate for that. This is work in progress.
The latest test showed that a certain number of bits are quite noisy, so they cannot be used to determine the PUF ID. Obviously, this reduces the maximum key space (for example, only 63 bits of the 96-bit ID are usable). To circumvent this, the PUF can be made arbitrarily wide (for example providing a 256-bit raw ID). Of course this will also introduce additional unusable noisy bits, but we expect that the percentage of noisy bits within the PUF ID is a device-specific constant. Each additional PUF ID bit adds 2 FFs (one for the shift register and one for the PUF cell) and 1 LUT. For each additional bit the ID sampling time is increased by one clock cycle.
- test more FPGAs
- check stability in a controlled environment (e.g. temperature chamber)
- evaluate more sophisticated post-processing algorithms
- sample more data from more long-term tests
- faulty bits over time (and temperature)
- Hamming distance over time (and temperature)
- to be continued...