BLOCKCHAIN AND E2E CHAIN OF TRUST

Ethereum signatures and transactions using a hardware wallet

Understanding Ethereum signatures and transactions by creating security-focused software

Klaudiusz Dembler
DAC.digital Technology Blog
11 min read · Jun 10, 2020

Photo by Markus Spiske on Unsplash

Blockchain systems offer their users unprecedented ownership over assets (coins, tokens, etc.). As long as the user’s private key is kept secure, they are the only entity technically able to control their assets. However, this unique property is a double-edged sword: with greater control comes greater responsibility. The lack of any central authority in blockchain systems means that if a malicious actor gains access to a private key, there’s nothing stopping them from irreversibly stealing the assets it controls. That’s why keeping private keys secure is an absolute necessity when working with blockchain-based solutions. In this article, we’ll explain how Ethereum transactions are constructed and how signatures using a private key work. We’ll also explore how a private key can be kept safe using a hardware wallet.

Digital signatures

Before we get to know how to keep private keys secure, let’s take a look at how they’re actually used for authentication. Every Ethereum transaction, apart from its actual payload, contains a signature — a cryptographic proof that the transaction was created by the private key holder. A signature is unique for each (payload, private key) combination. Let’s take a look at an example unserialized Ethereum transaction:

{
"to": "0x4d6bb4ed029b33cf25d0810b029bd8b1a6bcab7b",
"gas": 21000,
"gasPrice": 10000000000,
"nonce": 175,
"value": 1,
"data": "",
"r": 952479282704265150732560892...,
"s": 314120978227539870043404053...,
"v": 120
}

The signature is represented by the last 3 values: r, s and v. Everything else is the actual transaction payload. We’ll get to what these values represent later.

Node- and locally-signed transactions

Even though every valid Ethereum transaction must contain a signature (r, s and v values), you don’t have to provide it yourself. It can be automatically generated by your library of choice (e.g. web3.py) or the Ethereum node itself. This choice represents the distinction between two methods of sending a transaction to a node.

The first, most straightforward approach is to let the node do the signing (generating the signature). This way you don’t need to worry too much about key management: the private key is controlled by the node and it generates a signature for every transaction you send to it. However, the ease of use is also what makes this approach problematic. First off, you can use it only with nodes that you control yourself. As we’ve mentioned earlier, private key security is crucial, which means we can’t ever share the key with any third party (e.g. hosted node services like Infura). The second problem is security-related as well: if the node will sign any transaction you send to it, anyone with access to your node will be able to sign any transaction.

These issues can be solved using the second signing approach — generating signatures as part of the transaction creation. With this method, you’d send a “raw” transaction to a node — one with an already generated signature. This way, the node acts merely as a transaction broadcaster — it doesn’t need to control any private key and can be safely shared between multiple users. However, the code that creates a transaction must also generate a signature in some way — the easiest approach is to provide your web3 library with the private key and let it sign the transaction:

Sending a transaction using a local private key

Publishing raw transactions allows us to alleviate some problems associated with node-based signing. However, the given example is far from being secure. As you’ve probably noticed, it holds the private key string as a code variable. This approach is very insecure and should never be used. It also represents a larger problem present in both node-based and local signing: the private key needs to be stored somewhere so that it can be used for signing. Think about it for a second. The key must be physically present on some storage and accessible by the software generating the signature. By being accessible, it’s prone to theft, in one way or another. You can use various techniques, like encryption, to try to “hide” your private key. However, if someone can get their hands on the encrypted file and passphrase, they will effectively gain access to your assets. So how can we generate signatures without the risk of private key theft?

Hardware wallets

What if we locked the private key in a box and made sure it’s not physically able to leave that box or be read? And what if that box could somehow generate a signature based on that private key without ever revealing it? Change that box into a device and you’ve got yourself a hardware wallet. Hardware wallets are devices constructed specifically for keeping private keys secure. Their design focuses on making sure the private key is not able to leave them in any way. They usually come with a built-in CSPRNG¹ that lets them generate safe private keys on the device itself, so the keys never exist outside of it. They also feature some kind of communication interface to transport the generated signatures.

Blockchain Security 2Go

Hardware wallets come in many different variants to suit different needs, but the most popular ones connect via USB or Bluetooth. The one we’ve used for this article is Blockchain Security 2Go by Infineon and it comes in the form of an NFC-equipped smart card. Along with a compatible reader, it can become a security key that conveniently fits inside your wallet.

Blockchain Security 2Go smart card on top of an NFC reader

The Blockchain Security 2Go Python library comes with a handy CLI tool that allows communication with the wallet. Let’s set up everything we need to try that out. We’ve used a uTrust 4701 F smart card reader, but any PC/SC-compatible one should do the trick. We’ve done the testing on a Raspberry Pi running a fresh Raspbian, but any other Linux distro should work (we’ve also managed to run it on macOS), as long as you install the correct dependencies.

Firstly, let’s install the blocksec2go package dependencies and the package itself.

$ sudo apt-get install -y swig python3-pip libpcsclite-dev pcscd
$ pip3 install blocksec2go

Once everything is installed, we can use the CLI tool to interact with Blockchain Security 2Go. The card comes with a number of pre-generated key pairs ready to use. Let’s get some info about the first one of them.

$ blocksec2go get_key_info 1
Remaining signatures with card: 1000000
Remaining signatures with key 1: 100000
Public key (hex, encoded according to SEC1): 04da2bb6a42df116824d7ccb91b09e3f8fa2c0c94a829791468dd10cc63b9c2f9cd8ead4b201a07acc30c4a9b23d77ec9c43f7206a039418b20ae89d395308e76f

What we see here is a response generated by the card. It informs us about current usage limitation caps² and gives us the public key of the first key pair stored on the device. However, the way this public key looks may be confusing. In the context of Ethereum or Bitcoin, we’ve come to associate public keys with addresses, the ones we use to actually interact with other accounts. What does this public key string actually represent and how do we get a usable address out of it?

Elliptic Curve Cryptography

In order to understand transactions and signatures, we need to go a little bit deeper and look at the cryptography behind them. Ethereum (like Bitcoin and many other blockchains) uses Elliptic Curve Cryptography for most of its cryptographic needs. ECC, for short, is a system of public-key cryptography based on, guess what, elliptic curves. Fully explaining ECC is not a trivial task and that’s not our goal here, but if you want a deeper dive, take a look at this thorough blog post by Cloudflare.

An example (simplified) elliptic curve, sourced from Cloudflare blog post linked above

What’s important to understand in the context of signatures is what private and public keys actually are. The private key in ECC is just an integer whose size depends on the specific curve (for Ethereum that’s secp256k1, so the private key is 256 bits long). Pick any number between 1 and n - 1, where n is the order of the curve (a value just below 2²⁵⁶), and you’ve got yourself a valid private key. Once you’ve got a private key, you can derive its public key. To do that, an operation called point multiplication is used, where you take a starting point defined by the specific curve (called the generator point) and multiply it (with said ECC point multiplication) by the private key. The result of this multiplication is some point (so a pair of x and y coordinates) on the elliptic curve and that’s the public key.
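To make the point multiplication above a bit more concrete, here is a minimal pure-Python sketch of deriving a public key on secp256k1. The constants are the curve's public parameters; the helper names (`point_add`, `point_mul`, `public_key`) are our own, and the code is neither constant-time nor hardened, so treat it as an illustration rather than something to sign real transactions with.

```python
# secp256k1 domain parameters (public constants of the curve y^2 = x^3 + 7).
P = 2**256 - 2**32 - 977  # prime of the underlying field
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141  # curve order
G = (  # generator point
    0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
    0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8,
)


def point_add(a, b):
    """Add two curve points; None stands for the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    (x1, y1), (x2, y2) = a, b
    if x1 == x2 and (y1 + y2) % P == 0:
        return None  # a point plus its mirror image
    if a == b:
        slope = 3 * x1 * x1 * pow(2 * y1, -1, P) % P  # tangent line
    else:
        slope = (y2 - y1) * pow(x2 - x1, -1, P) % P  # line through both points
    x3 = (slope * slope - x1 - x2) % P
    return x3, (slope * (x1 - x3) - y1) % P


def point_mul(k, point):
    """Multiply a point by an integer using double-and-add."""
    result = None
    while k:
        if k & 1:
            result = point_add(result, point)
        point = point_add(point, point)
        k >>= 1
    return result


def public_key(private_key):
    """Derive the public key (x, y) for an integer private key."""
    if not 1 <= private_key < N:
        raise ValueError("private key must be between 1 and N - 1")
    return point_mul(private_key, G)


# An obviously insecure demo key, used only to show the shape of the result.
x, y = public_key(0x1234)
print(hex(x), hex(y))
```

Requires Python 3.8+ for the modular inverse via `pow(value, -1, P)`. On a hardware wallet this whole computation happens inside the device, which is the point of using one.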

If we take a look back at the public key returned by the blocksec2go CLI, we now know what to look for. As indicated by the command output, our public key is encoded according to SEC1, an ECC standard. It describes a format for public keys: the first byte (04) is a prefix for an uncompressed public key (with explicit x and y coordinates), the next 32 bytes (da2bb6a42df116824d7ccb91b09e3f8fa2c0c94a829791468dd10cc63b9c2f9c) are the x coordinate and the following 32 bytes (d8ead4b201a07acc30c4a9b23d77ec9c43f7206a039418b20ae89d395308e76f) are the y coordinate.
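The byte layout described above can be split with a few lines of Python. This is a small sketch using only the standard library; `parse_sec1_uncompressed` is a name we made up for illustration, and it handles only the uncompressed (04-prefixed) form shown in the CLI output.

```python
def parse_sec1_uncompressed(public_key_hex):
    """Split an uncompressed SEC1 public key into integer x and y coordinates."""
    raw = bytes.fromhex(public_key_hex)
    if len(raw) != 65 or raw[0] != 0x04:
        raise ValueError("expected a 65-byte key starting with the 04 prefix")
    x = int.from_bytes(raw[1:33], "big")   # bytes 1..32
    y = int.from_bytes(raw[33:65], "big")  # bytes 33..64
    return x, y


# Public key of key pair 1, copied from the get_key_info output above.
CARD_PUBLIC_KEY = (
    "04"
    "da2bb6a42df116824d7ccb91b09e3f8fa2c0c94a829791468dd10cc63b9c2f9c"
    "d8ead4b201a07acc30c4a9b23d77ec9c43f7206a039418b20ae89d395308e76f"
)

x, y = parse_sec1_uncompressed(CARD_PUBLIC_KEY)
print("x =", hex(x))
print("y =", hex(y))
```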

Once we know how to interpret the public key, it’s time to generate an address. This is no longer based on any ECC specification and is blockchain-specific. For Ethereum, an address is generated by hashing the concatenated x and y public-key coordinates with Ethereum’s hashing function (Keccak-256), and taking the last 20 bytes. Let’s summarize our progress so far with a snippet of code generating an Ethereum address of a keypair present on our hardware wallet:

Generating an Ethereum address based on a public key
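What follows is a standalone sketch of that derivation rather than the article's original snippet. Python's `hashlib` ships SHA3-256 but not the original Keccak-256 padding that Ethereum uses, so the sketch carries its own small pure-Python Keccak. The function names (`keccak`, `keccak256`, `ethereum_address`) are ours, and in real code you would more likely reach for a maintained Keccak implementation.

```python
def _rol64(value, shift):
    """Rotate a 64-bit lane left."""
    shift %= 64
    return ((value << shift) | (value >> (64 - shift))) & 0xFFFFFFFFFFFFFFFF


def _keccak_f1600(lanes):
    """Apply the 24-round Keccak-f[1600] permutation to a 5x5 array of lanes."""
    lfsr = 1
    for _ in range(24):
        # theta: mix column parities into every lane
        c = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4]
             for x in range(5)]
        d = [c[(x + 4) % 5] ^ _rol64(c[(x + 1) % 5], 1) for x in range(5)]
        lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho and pi: rotate lanes and move them to new positions
        x, y = 1, 0
        current = lanes[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            current, lanes[x][y] = lanes[x][y], _rol64(current, (t + 1) * (t + 2) // 2)
        # chi: non-linear step along each row
        for y in range(5):
            row = [lanes[x][y] for x in range(5)]
            for x in range(5):
                lanes[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        # iota: round constant generated by a small LFSR
        for j in range(7):
            lfsr = ((lfsr << 1) ^ ((lfsr >> 7) * 0x71)) % 256
            if lfsr & 2:
                lanes[0][0] ^= 1 << ((1 << j) - 1)
    return lanes


def keccak(data, suffix, rate_bytes=136, out_len=32):
    """Sponge with a 1088-bit rate; suffix 0x01 is Keccak, 0x06 is SHA-3."""
    padded = bytearray(data) + bytes([suffix])
    padded += bytes(-len(padded) % rate_bytes)
    padded[-1] ^= 0x80
    lanes = [[0] * 5 for _ in range(5)]
    for offset in range(0, len(padded), rate_bytes):
        block = padded[offset:offset + rate_bytes]
        for i in range(rate_bytes // 8):
            lanes[i % 5][i // 5] ^= int.from_bytes(block[8 * i:8 * i + 8], "little")
        lanes = _keccak_f1600(lanes)
    state = b"".join(lanes[i % 5][i // 5].to_bytes(8, "little") for i in range(25))
    return state[:out_len]


def keccak256(data):
    return keccak(data, 0x01)


def ethereum_address(public_key_hex):
    """Hash x || y of an uncompressed SEC1 key and keep the last 20 bytes."""
    raw = bytes.fromhex(public_key_hex)
    if len(raw) != 65 or raw[0] != 0x04:
        raise ValueError("expected a 65-byte key starting with the 04 prefix")
    return "0x" + keccak256(raw[1:])[-20:].hex()


# Public key of key pair 1, copied from the get_key_info output above.
CARD_PUBLIC_KEY = (
    "04da2bb6a42df116824d7ccb91b09e3f8fa2c0c94a829791468dd10cc63b9c2f9c"
    "d8ead4b201a07acc30c4a9b23d77ec9c43f7206a039418b20ae89d395308e76f"
)
print(ethereum_address(CARD_PUBLIC_KEY))
```

The result is the plain lowercase address; the mixed-case checksum form that wallets display is a separate encoding step not covered here.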

Transaction signatures

Now that we know the basics of ECC and we’ve successfully parsed a public key, it’s time to get to the signatures. The signature scheme used by Ethereum is the Elliptic Curve Digital Signature Algorithm, or ECDSA for short. As mentioned earlier, a digital signature takes as input the message to be signed (payload) and the private key. Again, we’re not going to dive deep into the internal workings of ECDSA; for that, please refer to another Cloudflare blog post. What you need to know though is that the result of ECDSA signing is a signature consisting of 2 integers, r and s (remember those?).

As part of payload preparation, the message is hashed, so that ECDSA can always operate on payloads of a fixed size. For a valid Ethereum transaction, we must sign the Keccak-256 hash of an encoded transaction payload (the fields describing the transaction). Let’s use the following code snippet to generate a hash ready for signing:

Generating a transaction hash ready for signing
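As a rough illustration of the encoding step, the sketch below RLP-encodes the unsigned fields of the example transaction in the order Ethereum expects for an EIP-155 legacy transaction: nonce, gasPrice, gas, to, value, data, chain ID, 0, 0. The Keccak-256 hash of these bytes is what gets signed. The helper names are ours, and chain ID 42 is an assumption inferred from "v": 120 in the example transaction. We don't know the exact inputs behind the article's hash, so no hash value is claimed here.

```python
def int_to_bytes(value):
    """Big-endian bytes without leading zeros; 0 becomes the empty string."""
    return value.to_bytes((value.bit_length() + 7) // 8, "big")


def _length_prefix(length, offset):
    """RLP header for a string (offset 0x80) or a list (offset 0xC0)."""
    if length < 56:
        return bytes([offset + length])
    length_bytes = int_to_bytes(length)
    return bytes([offset + 55 + len(length_bytes)]) + length_bytes


def rlp_encode(item):
    """RLP-encode an int, a bytes object, or a (nested) list of those."""
    if isinstance(item, int):
        item = int_to_bytes(item)
    if isinstance(item, bytes):
        if len(item) == 1 and item[0] < 0x80:
            return item  # a single small byte encodes as itself
        return _length_prefix(len(item), 0x80) + item
    payload = b"".join(rlp_encode(child) for child in item)
    return _length_prefix(len(payload), 0xC0) + payload


def hex_to_bytes(text):
    return bytes.fromhex(text[2:] if text.startswith("0x") else text)


def unsigned_payload(tx, chain_id):
    """Bytes to be hashed with Keccak-256 before signing (EIP-155 layout)."""
    return rlp_encode([
        tx["nonce"], tx["gasPrice"], tx["gas"],
        hex_to_bytes(tx["to"]), tx["value"], hex_to_bytes(tx["data"]),
        chain_id, 0, 0,
    ])


example_tx = {
    "to": "0x4d6bb4ed029b33cf25d0810b029bd8b1a6bcab7b",
    "gas": 21000,
    "gasPrice": 10000000000,
    "nonce": 175,
    "value": 1,
    "data": "",
}
CHAIN_ID = 42  # assumption, see the note above

encoded = unsigned_payload(example_tx, CHAIN_ID)
print(encoded.hex())
```

Note that the signature fields r, s and v are not part of what gets hashed; the chain ID and the two zeros take their place in the signing payload.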

When you run the snippet, the console output will be 0x8cd381f8d4e62d1e7aa8db775cc02c178700e749c0922dd71acbc14b5e006eef. That’s the hash of our transaction that we will need to generate a signature for. Let’s do that with the blocksec2go CLI using the first keypair:

$ blocksec2go generate_signature 1 8cd381f8d4e62d1e7aa8db775cc02c178700e749c0922dd71acbc14b5e006eef
Remaining signatures with card: 999999
Remaining signatures with key 1: 99999
Signature (hex): 3044022063a01ee35e4946f924202a2f1c5110047d4830a02f792fbe6bdfdfbf1fb513cf022016d1d8f95f9d69a9243c2e392dfebeb8ad9a2c8e693f161b5d29bdb802a9f0ee

Once again, the first part is the usage caps that we don’t really care about. What we do care about is the second part, the signature itself. We know that the ECDSA signature consists of two integer values, r and s. Let’s take a look at the Blockchain Security 2Go User Manual² for a tip on how to decode this hex value. We can find our answer on page 21: the signature is encoded using the ASN.1 DER scheme:

The encoding scheme for Blockchain Security 2Go signature

To save us some time, let’s use an ECDSA library that can decode the DER signature for us:

Decoding the DER signature using ecdsa Python library
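For readers who want to see what such a decoder does under the hood, here is a hand-rolled sketch that needs no third-party package. It assumes the short-form DER layout seen in the card's output (a SEQUENCE holding two INTEGERs, all lengths below 128 bytes); `decode_der_signature` is our own name, not part of any library.

```python
def decode_der_signature(signature_hex):
    """Extract r and s from a DER-encoded ECDSA signature (short-form lengths)."""
    der = bytes.fromhex(signature_hex)
    # 0x30 marks a SEQUENCE; the next byte is the length of everything after it.
    if len(der) < 2 or der[0] != 0x30 or der[1] != len(der) - 2:
        raise ValueError("not a short-form DER SEQUENCE")
    values, offset = [], 2
    for _ in range(2):
        # 0x02 marks an INTEGER, followed by its length and big-endian bytes.
        if der[offset] != 0x02:
            raise ValueError("expected a DER INTEGER")
        length = der[offset + 1]
        start = offset + 2
        values.append(int.from_bytes(der[start:start + length], "big"))
        offset = start + length
    if offset != len(der):
        raise ValueError("unexpected trailing bytes")
    r, s = values
    return r, s


# Signature copied from the generate_signature output above.
CARD_SIGNATURE = (
    "3044"
    "0220" "63a01ee35e4946f924202a2f1c5110047d4830a02f792fbe6bdfdfbf1fb513cf"
    "0220" "16d1d8f95f9d69a9243c2e392dfebeb8ad9a2c8e693f161b5d29bdb802a9f0ee"
)

r, s = decode_der_signature(CARD_SIGNATURE)
print("r =", r)
print("s =", s)
```

DER integers are signed, so a value whose top bit is set carries an extra leading 00 byte and is 33 bytes long; reading the bytes as an unsigned big-endian number handles that case as well.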

We’ve successfully generated and decoded a signature for our transaction. We have both the r (45061880722335711026363991897050432608008727756979791100216191598749809972175) and s (10321651205677593303984547613005630911318297897278933552522297942124840743150) values. However, let’s take one more look at an example unserialized transaction from the beginning of the article:

{
"to": "0x4d6bb4ed029b33cf25d0810b029bd8b1a6bcab7b",
"gas": 21000,
"gasPrice": 10000000000,
"nonce": 175,
"value": 1,
"data": "",
"r": 952479282704265150732560892...,
"s": 314120978227539870043404053...,
"v": 120
}

As we’ve mentioned earlier and can see now, the transaction signature consists of 3 values: r, s, and v. We decoded r and s from the ECDSA signature. However, what’s v and how do we get our hands on it?

ECDSA Recovery

One more thing that you may notice when looking at an example transaction is that it’s also lacking one field we could expect to find there: from. These fields, once serialized, are the only content of an Ethereum transaction. There’s no additional information. However, the network must somehow know who the transaction sender is, to validate funds transfer, for example. To do that, a process called ECDSA Recovery is used. It allows recovering the signer’s public key when supplied with the signature and the original message. Thanks to that, the sender doesn’t have to be explicitly mentioned in the transaction and can be recovered directly from the signature.

ECDSA Recovery comes with a quirk, though. The r value of a signature is only the x coordinate of a curve point used during signing, and recovery needs that full point. One thing about ECC curves we haven’t mentioned is that they’re symmetrical about the x-axis. That is, for any given x value, there are two possible y values, at the same distance from the x-axis. So after performing ECDSA Recovery we are left with 2 possible public keys that could have created the signature. Both of them pass signature verification, so the signature alone can’t tell us which one belongs to the signer. To avoid this ambiguity, Ethereum uses a separate value, v, to indicate which of the two public keys is the sender. Historically, v would be either 27 or 28, indicating the first or second public key from the ECDSA Recovery result respectively. However, starting at Ethereum block 2,675,000 with the introduction of EIP-155³, v can also indicate the specific Ethereum chain the transaction is intended for, with the value of CHAIN_ID * 2 + 35 or CHAIN_ID * 2 + 36.
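The arithmetic behind v is small enough to show directly. In this sketch, `recovery_id` is 0 or 1 (which of the two candidate keys is the signer), and `encode_v` and `decode_v` are hypothetical helper names of ours.

```python
def encode_v(recovery_id, chain_id=None):
    """Build v from the recovery id, with or without EIP-155 chain protection."""
    if recovery_id not in (0, 1):
        raise ValueError("recovery id must be 0 or 1")
    if chain_id is None:
        return 27 + recovery_id  # pre-EIP-155 form
    return chain_id * 2 + 35 + recovery_id


def decode_v(v):
    """Return (recovery_id, chain_id); chain_id is None for the legacy form."""
    if v in (27, 28):
        return v - 27, None
    if v >= 35:
        return (v - 35) % 2, (v - 35) // 2
    raise ValueError("unsupported v value")


# The example transaction in this article carries "v": 120.
print(decode_v(120))
```

Applied to the example transaction, v = 120 decodes to recovery id 1 on chain ID 42, since 42 * 2 + 36 = 120.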

Down to business

All right, we now have all the puzzle pieces necessary to create and broadcast an Ethereum transaction using a hardware wallet. We know what to do to get an ECDSA signature and how to calculate the v value. Let’s put this knowledge to use. The following code snippet summarizes everything we’ve learned. Please note that it omits some of the used functions for the sake of clarity.

Creating, signing and publishing an Ethereum transaction with a hardware wallet
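The step that ties everything together is picking the right v for a signature the card produced. The sketch below is our own standalone illustration of one way to do it, not the article's original snippet: recover both candidate public keys from (hash, r, s) and keep the one that matches the public key reported by the card. All function names are hypothetical, and the code ignores the extremely rare case where the signing point's x coordinate is not smaller than the curve order.

```python
# secp256k1 domain parameters (public constants of the curve y^2 = x^3 + 7).
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (
    0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
    0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8,
)


def point_add(a, b):
    """Add two curve points; None stands for the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    (x1, y1), (x2, y2) = a, b
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if a == b:
        slope = 3 * x1 * x1 * pow(2 * y1, -1, P) % P
    else:
        slope = (y2 - y1) * pow(x2 - x1, -1, P) % P
    x3 = (slope * slope - x1 - x2) % P
    return x3, (slope * (x1 - x3) - y1) % P


def point_mul(k, point):
    """Multiply a point by an integer using double-and-add."""
    result = None
    while k:
        if k & 1:
            result = point_add(result, point)
        point = point_add(point, point)
        k >>= 1
    return result


def recover_public_key(msg_hash, r, s, recovery_id):
    """Recover one candidate public key; recovery_id picks the parity of R's y."""
    y_squared = (pow(r, 3, P) + 7) % P
    y = pow(y_squared, (P + 1) // 4, P)  # square root, valid since P % 4 == 3
    if y * y % P != y_squared:
        raise ValueError("r is not the x coordinate of a curve point")
    if y % 2 != recovery_id:
        y = P - y  # take the mirrored point instead
    z = int.from_bytes(msg_hash, "big")
    r_inv = pow(r, -1, N)
    # Q = r^-1 * (s * R - z * G)
    return point_add(point_mul(-z * r_inv % N, G), point_mul(s * r_inv % N, (r, y)))


def find_v(msg_hash, r, s, expected_public_key, chain_id):
    """Return the EIP-155 v for the candidate matching the card's public key."""
    for recovery_id in (0, 1):
        if recover_public_key(msg_hash, r, s, recovery_id) == expected_public_key:
            return chain_id * 2 + 35 + recovery_id
    raise ValueError("signature does not match the expected public key")
```

In the flow described in this article, `msg_hash` would be the 32-byte transaction hash, `r` and `s` the values decoded from the card's DER signature, and `expected_public_key` the (x, y) pair parsed from `get_key_info`. Ethereum also rejects transaction signatures whose s is above N / 2 (EIP-2), so a high s coming back from a card would need to be replaced by N - s before running this search.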

Summary

In a space where the loss of a private key means irrevocable loss of the asset it controls, we should do everything we can to ensure the highest level of security, and hardware wallets offer a level that is hard to match by any other solution. Despite their advantages, hardware wallets require a deeper level of understanding when it comes to cryptography and signatures. We hope this article can serve as a knowledge base for anyone looking to understand Ethereum a little better. We’ve used a specific hardware wallet as an example, but the information in this article should be universal to any solution.

We’ve also created a Python library that allows easy interaction with Blockchain Security 2Go, available at https://github.com/rexs-io/blocksec2go-ethereum. It uses the same methods mentioned in this article and can serve as a base for interaction with any hardware wallet.

Have fun with it and let us know if you have any questions in the comments!


I enjoy understanding things. Writing about blockchain and frontend development