Operational Code Authentication

This document was originally written and prepared by Jeff Lawson. Please submit comments and suggestions regarding this document's content directly to him.

Background

Although distributed.net has long published almost all of the source code to its client applications, clients compiled from the public source cannot interact with the networked keyservers and can only be used to benchmark and test the contest checker cores. distributed.net is frequently criticized for this practice; however, the people making these accusations rarely understand the full technical implications behind our policy. In reality, distributed.net fully understands the importance and advantages of being open source and is committed to eventually accomplishing that goal. The following list gives some of the major reasons that currently prevent distributed.net from publishing full source code for a fully functional client:

  • there exist people who would modify the client to purposely report unchecked blocks as completed, for the purpose of inflating their statistics.
  • others may want to ruin the project for everyone by intentionally opening up the possibility that the solution key is bypassed entirely.
  • the project could additionally be spoiled by someone attempting to overwhelm the keyservers with fake solutions or fake completions.
  • although unlikely, it is possible that home-made binaries would contain coding mistakes or compilation errors. Admittedly, binary-only clients released by distributed.net could also contain errors, but the chances of such an error remaining hidden for long are virtually zero, and once we discover such an error, we can confidently exclude blocks reported by the specific binary versions of the client that are known to be defective.

Quite truthfully, releasing binary-only clients still does not completely eliminate the possibility of sabotage, since it is relatively easy for any knowledgeable person to disassemble or patch binaries. This is in fact quite a trivial task, so we urge you not to try. Indeed, security through obscurity is not actually secure at all, and we do not claim it to be.

It would be easy for distributed.net to rewrite the simple data obfuscation that is currently done for all buffer files and network traffic and replace it with something more complex (such as public-key encryption or a real block cipher for data packets). However, doing so would be effectively useless and provide no additional benefit, since it adds no actual mathematical security in our usage model (see the sections below for explanations of why). Any significant architectural changes that distributed.net adopts in the future must provide improvements in this area, or else there is no gain in adopting them. Until distributed.net can identify a protocol that provides a secure implementation (one that allows the source code to be fully public without endangering our ability to detect and discard results from fraudulent clients), at least a portion of distributed.net's client source code must remain closed to distribution.

The possibilities listed above are not merely unlikely scenarios. In fact, they have all already occurred in some limited form or another. Our original "v1" clients were readily available in full source code form, and there were many instances of individuals intentionally modifying the client code to attempt to nullify the entire contest or to gain astronomical stats placement for themselves. Intentional spamming of the current "v2" clients does exist, but it is typically restricted to resubmitting the same blocks, checked once but submitted many times, which we can automatically detect and ignore.

There are currently a few technical issues that prevent the existence of a fully open-source distributed.net while still allowing us to verify/ensure the operational correctness of all clients. The major issues are highlighted here, along with our evaluations of them. Most other remaining issues are classifiable as a subset of one of these problems. We strongly suspect that none of these problems is solvable in a fully secure manner.

Operational Code Authentication

We have programs that perform a known behavior (test all possible keys within a sequence), and produce output that is dependent upon the known behavior (out buffer files). It would be desirable to be able to determine if the output of a given instance of our program that was running on an "untrusted host" was in fact generated by an instance of our program with the same behavior.

When running on an "untrusted host", it cannot be guaranteed that a given program is running without modifications that alter its expected behavior in a way that produces valid-but-inaccurate output.

For distributed.net's clients, valid-but-inaccurate output would mean that the client would claim to have completed the work assigned to it, when it has in fact not done so at all. It would be desirable to have tolerance of a limited scope of modifications to the client, such as allowing cosmetic or functional changes that do not alter the work-completion expectation.

Unfortunately, since the code is running on an untrusted host, you cannot rely upon safety measures implemented entirely on the client side, since the behavior of such checking can be altered or completely bypassed by modifying the code. Examples of such checks include (see the sketch after this list):

  • Runtime checksum computation of the executing binary. These computations can be modified such that the computation is not actually done and a pre-computed constant is used instead (determined by observing the behavior of a valid client). Even if you assume that the checksum verification cannot be bypassed (which is not possible to ensure), nothing prevents a clever individual from dynamically patching the binary in-memory once the checksums have been done. Note that this category also includes cryptographic code-signing and signature verification techniques.
  • Binary obfuscation. There are many techniques to discourage reverse-engineering attempts on binaries that are done at runtime, such as self-decompression, self-decryption, self-checksum verification, anti-debugger roadblocks, and others.
  • Operational self-checks. Attempts by the client to actively "test" execution paths by comparing the output of known test cases are ineffectual, since it is possible to do any of the following: alter the self-check harness to ignore the test results or not perform the tests at all; alter the self-check harness to unknowingly call duplicated copies of the affected functions that have not been modified; or alter the affected functions to detect self-check invocations and return valid test results.
  • Cryptographic signing of transmitted data. This method attempts to distinguish an "official" version from non-official versions ("official" versions contain an embedded "official" private key for the data signing). Although this permits identification on the remote end, the distributed binaries must contain a private key, which is admittedly not going to be very private anymore. Nothing prevents tampering with an official binary to make it unknowingly sign invalid data that it produces. Also, nothing prevents the private key from being manually extracted from an official binary (via reverse engineering) and inserted into a non-official binary. Obfuscation attempts to hide the private key are inadequate, since an official binary eventually must derive the private key during its operations.
  • A combination of the above, such as using the result of a self-check as a seed/key to decrypt a final portion of the code at runtime, will also be ineffective. There are many other possibilities as well.
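
To make the futility concrete, here is a minimal sketch (in Python, not distributed.net's actual client code) of the first technique above, a runtime self-checksum, together with comments on why it is trivially defeated; the embedded checksum constant is hypothetical:

```python
# A minimal sketch of a runtime self-checksum on an untrusted host.
import sys
import zlib

EXPECTED_CRC = 0x1234ABCD  # hypothetical constant embedded at build time

def self_checksum_ok() -> bool:
    """CRC32 the running executable and compare to the embedded constant."""
    with open(sys.argv[0], "rb") as f:
        observed = zlib.crc32(f.read()) & 0xFFFFFFFF
    return observed == EXPECTED_CRC

print("checksum ok:", self_checksum_ok())

# An attacker never needs to forge a matching binary: they can run a
# pristine client once, record the value `observed` takes, and patch this
# function to `return True` (or patch EXPECTED_CRC to whatever their
# modified binary produces). Patching the code in-memory after the check
# has already passed defeats it just as easily.
```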

The open-source networked game "Netrek" allows for trusted binary-only distribution of "RSA-blessed" clients so that cheating behavior can be eliminated or reduced. Although anyone can compile a fully operational client from the publicly distributed source, it will not be RSA-blessed, and some public Netrek servers may be configured to disallow unblessed clients. What differentiates a blessed client is a compile-time directive and the inclusion of a private RSA (or sometimes PGP) key within the compiled binary. The corresponding public half of the key must then be given to each Netrek server operator so that it can be added to his server's list of explicitly allowed clients. Server operators accept public keys based on their perceived trustworthiness of the person compiling the client to keep the private key confidential. The generated public keys have an embedded expiration date (typically 6 months or so) that the server verifies on client connect, after which point expired clients are rejected. Each new client revision that an author compiles and releases uses an independent private/public key pair. Because each revision's keys are independent, when a specific client version is found to be compromised, that version's public key can be manually removed from the explicit-allow list on all servers. Note that because private keys must be embedded within the blessed client binaries, it is a simple task to extract a key with a debugger/disassembler so that it can be reused in the compilation of a cheating client. Likewise, nothing prevents tampering with a blessed binary to make it perform cheating actions, which also requires a debugger/disassembler. The short-duration expiration limits the significance of any single compromised private key and makes it impractical to repeat such disassembly work for every expiration period.
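
For illustration, a rough sketch of the server-side half of such a blessing scheme, assuming one key pair per released client build; the build names, key fingerprints, and dates below are invented:

```python
# Sketch of a Netrek-style "blessed client" allowlist with expiry.
from datetime import date

# Per-build public-key fingerprints, each with its embedded expiry date.
BLESSED_CLIENTS = {
    "cow-3.0.1": ("ab:cd:ef:01", date(1999, 6, 1)),
    "cow-3.0.2": ("12:34:56:78", date(1999, 12, 1)),
}

def client_allowed(build: str, fingerprint: str, today: date) -> bool:
    """Reject unknown builds, wrong keys, and expired keys on connect."""
    entry = BLESSED_CLIENTS.get(build)
    if entry is None:
        return False          # unblessed client
    expected_fp, expires = entry
    if fingerprint != expected_fp:
        return False          # key does not match this build
    if today >= expires:
        return False          # blessing has expired
    return True

# When a build's private key leaks, the operator deletes that one entry;
# the short expiry bounds the damage even if nobody notices the leak.
print(client_allowed("cow-3.0.2", "12:34:56:78", date(1999, 7, 1)))
```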

Code Authentication via "Duplicated Verification"

A possible method of working around the Operational Code Authentication problem, usable for the secure detection of tampered clients, involves duplicated work verification. In this proposal, instead of distributing distinct work assignments to every client, some number of duplicated assignments are performed. The results returned by the multiple clients are then compared for equality and rejected if there is a discrepancy. This method makes a number of assumptions (illustrated by the toy sketch after this list):

  1. deterministic output: legitimate clients will always produce the same output given the same input.
  2. computation cannot be short-circuited: the output of legitimate clients is dependent upon fully completing the computational process and the output cannot be (reliably) produced through any other (less computationally intensive) process. This also requires that the output have sufficient distinctiveness that it cannot be probabilistically predicted.
  3. legitimate majority: the chance of selecting two non-legitimate clients (that produce the same incorrect output) must be insignificant. Because a cheating participant must escape every independent re-check, the odds of cheating going undetected diminish exponentially with each unit of work that is re-evaluated.
  4. indistinguishable verification: it must not be possible for a client to identify whether it is receiving an "original" work unit or a duplicately assigned one. Not only is such knowledge unnecessary for clients, but it would also permit an illegitimate client to detect duplicated work units and selectively process only those using the legitimate method.
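
A toy model of this flow, tied to the four assumptions above; the residual function, the cheater's behavior, and the duplication population are all stand-ins rather than real distributed.net protocol details:

```python
# Toy model of duplicated verification.
import random
import zlib

def honest_client(unit: bytes) -> int:
    # Deterministic residual of fully completed work (assumption 1).
    return zlib.crc32(unit) & 0xFFFFFFFF

def cheating_client(unit: bytes) -> int:
    # Skips the work; cannot predict the residual (assumption 2), and
    # cannot tell a duplicate from an original unit (assumption 4).
    return random.getrandbits(32)

def spot_check(unit: bytes, population) -> bool:
    """Assign the same unit twice; accept it only if residuals match."""
    a, b = random.sample(population, 2)
    return a(unit) == b(unit)

# Assumption 3: the population is mostly honest.
population = [honest_client] * 9 + [cheating_client]
caught = sum(not spot_check(b"unit-%d" % i, population) for i in range(1000))
print(f"{caught} of 1000 spot checks exposed a discrepancy")
```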

Currently, distributed.net's clients return only a Boolean indicator (whether a possible solution was found, with True results being extremely rare), which breaks assumption #2. This lack of distinctiveness in typical client output makes it impossible to use independent result verification as a method of checking. For Duplicated Verification to be a viable possibility for distributed.net, our existing clients must be made to produce additional distinctive output beyond the Boolean result.

Distributed.net has evaluated several possible ways of introducing such distinctiveness into our RC5-64/DES/CSC cores by collecting "residual" data; however, every method we have considered introduces significant overhead (an estimated 15% slowdown in client key rate). The most viable method considered so far is to keep a running CRC32 value of the least-significant byte of each resulting plain-text word.

Using only the least-significant byte is probably sufficient for simple Duplicated Verification, since the cryptographic algorithms of RC5/DES/CSC work on a minimum of 32-bit words (which is all that d.net clients verify). Typical table-lookup CRC32 implementations require one pass per 8-bit piece, so utilizing the entire 32-bit plain-text output would require 4 iterations to process each of the four 8-bit parts. A rough sketch follows.
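
A minimal sketch of that proposal, assuming the core yields one 32-bit plain-text word per key tried; this is illustrative Python, not actual core code:

```python
# Running CRC32 residual over the low byte of each plain-text word.
import zlib

def block_residual(plaintext_words) -> int:
    """CRC32 accumulated over the low byte of every 32-bit result word."""
    crc = 0
    for word in plaintext_words:
        lsb = bytes([word & 0xFF])   # one table pass instead of four
        crc = zlib.crc32(lsb, crc)
    return crc & 0xFFFFFFFF

# Example: the residual for a (tiny) pretend block of decryption results.
print(hex(block_residual([0xDEADBEEF, 0x01020304, 0xCAFEF00D])))
```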

For OGR, a good "residual" data value for comparison is the number of nodes searched within the requested stub. This value is already passed back up to the master and logged in the master log files. The only problem is that this value is highly dependent on the core implementation: better algorithms search fewer nodes through better pruning.

Other points to keep in mind when considering Duplicated Verification specifically for distributed.net include:

  1. It may be necessary to compute the CRC32 not on every plain-text value but only at some periodic interval, so that the performance loss is acceptable (however, this would make it possible for a non-legitimate client to produce legitimate results by computing only every xth key).
  2. Care must also be taken to ensure that compatible techniques for computing the "residual" data are used across all platforms/cores (alternate bit-slicing techniques may vary the order in which keys are checked).
  3. Any introduction of work duplication will result in an overall slowdown in combined network key rate; however, it has been proposed that client statistics not be penalized for running a legitimate client that happens to process a duplicated work unit.
  4. Additional problems are introduced by the fact that RC5/DES/CSC work units can be subdivided into smaller units when a client requests a smaller block from a server or personal proxy. Attempting to redistribute a large block for re-verification may therefore result in several sub-blocks being checked, possibly by multiple different clients. It will be necessary to either enforce a requirement that re-verification blocks not be subdivided by full servers and personal proxies (while simultaneously hiding that non-divisibility from clients), or to compute the CRC32 on the smallest sub-divisible size and combine the resulting CRC32 values with a simpler commutative operation (see the sketch after this list).
  5. Detection and verification of duplicated work will likely need to become a post-processing activity because of the increased online storage requirements of storing and looking up the residual data for every work block processed (without Duplicated Verification, only 2 bits of storage per block is currently required for block tracking).
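
A sketch of the second option from point 4: compute a CRC32 residual per smallest sub-divisible unit, then fold the per-unit residuals together with XOR, a commutative and associative operation, so the combined residual is identical no matter how, or in what order, servers split the block. The sub-block contents here are placeholders:

```python
# Commutative combination of per-sub-block CRC32 residuals.
import zlib
from functools import reduce

def unit_residual(unit_results: bytes) -> int:
    return zlib.crc32(unit_results) & 0xFFFFFFFF

def combine(residuals) -> int:
    # XOR is order-independent, so sub-blocks checked by different
    # clients in any order still reproduce the full block's residual.
    return reduce(lambda a, b: a ^ b, residuals, 0)

whole_block = [b"sub0", b"sub1", b"sub2", b"sub3"]
one_client  = combine(unit_residual(u) for u in whole_block)
two_clients = combine([combine(unit_residual(u) for u in whole_block[:2]),
                       combine(unit_residual(u) for u in whole_block[2:])])
assert one_client == two_clients
print(hex(one_client))
```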

Note that it is not necessary to distribute all work units multiple times. Duplicated distribution can occur relatively infrequently if delayed detection of illegitimate clients is acceptable, since only a single failed Duplicated Verification is necessary to suspect a client of faulty work. Once an illegitimate client is suspected, all work units previously submitted by that host/email can be re-verified and discarded, if necessary.

The SETI@Home project uses a type of Duplicated Verification to detect forged work.

GIMPS also uses a method of Duplicated Verification based upon the result of the LL sequence (a huge binary number), the lower 64 bits of which are called the "residue" and are used as the verification data. To prevent the submission of bogus first-time results, the official clients are compiled from the public source with the addition of a bit of checksum code that validates the result. If someone were to reverse-engineer the checksum code, his or her bogus results would show up sometime later in double-checking. If someone were to submit an abnormally large number of results, there would probably be some spot double-checking going on at the same time (ahead of the normal double-checking location). In fact, the entire residue is not kept secret: 56 bits of it are public and the lowest 8 bits are kept secret. So it is still possible to do things like running your own independent double checks against the public 56 bits. If you submit a double-check residue and all 64 bits match, then everything is made public. If the double-check residue does not match, a triple check is performed. Sometimes none of the three residues of a triple check match; the checks simply continue until two of them agree. The nice thing about the LL test is that the final residue is independent of the method used to get there. A sketch of these matching rules follows.
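
A small sketch of the matching rules as described above; the masks follow the 56-public/8-secret split, while the residue values and function names are invented for illustration:

```python
# GIMPS-style residue matching with partially published residues.
PUBLIC_MASK = 0xFFFFFFFFFFFFFF00  # the 56 publicly visible bits
SECRET_MASK = 0x00000000000000FF  # the 8 bits the server keeps back

def independent_check(public_residue: int, my_residue: int) -> bool:
    """Anyone can compare their own run against the published 56 bits."""
    return (public_residue & PUBLIC_MASK) == (my_residue & PUBLIC_MASK)

def official_double_check(first: int, second: int) -> str:
    """Server side: all 64 bits must match before a result is accepted."""
    if first == second:
        return "verified: publish the full residue"
    return "mismatch: schedule a triple check"

print(official_double_check(0x1122334455667788, 0x1122334455667788))
print(independent_check(0x1122334455667700, 0x1122334455667788))
```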

Code Authentication via "Trusted-user Signatures"

Under this proposal, it is not the trust of the code that is being authenticated, but the trust of the user. Before a user starts using a client, he must obtain a digital private key (for example, by completing an agreement on a web page that generates one). All clients deployed by a person must be configured with a valid private key or else they refuse to operate (a client fails to start if its general test of private-key validity fails, and it fails to transmit results to the server if the private key is unregistered/unknown). Under this system, trust and authentication are effectively delegated to the user, and the user becomes responsible for not violating the trust of the client. In this environment, if client forgery is ever detected, all reported blocks signed by that user can immediately be dismissed (or selectively re-verified through some secondary mechanism, if desired). Note that there must still be some alternate method of detecting forged results; this scheme only provides a way of easily revoking all results from a user who has been determined to be untrusted. Schemes defining trust inheritance from sub-derived user signatures could potentially be devised, allowing sub-authorities and reducing the overhead of having a single authority generate all private keys. A rough sketch of the basic flow follows.
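
This sketch uses Ed25519 signatures from the third-party Python "cryptography" package; the registration step, packet layout, and revocation set are invented details, and only the overall flow follows the proposal:

```python
# Trusted-user signatures: sign results per user, revoke on forgery.
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

# Registration: the web agreement hands the user a private key; the
# server remembers the matching public key.
user_key = Ed25519PrivateKey.generate()
registered = {"user@example.com": user_key.public_key()}
revoked = set()  # users whose forgery was later detected

def submit(email: str, result: bytes, signature: bytes) -> bool:
    """Accept a result only if signed by a registered, trusted user."""
    pub = registered.get(email)
    if pub is None or email in revoked:
        return False      # unknown key, or all blocks already dismissed
    try:
        pub.verify(signature, result)
        return True
    except InvalidSignature:
        return False

packet = b"block 0x1234 completed, no solution"
print(submit("user@example.com", packet, user_key.sign(packet)))  # True
revoked.add("user@example.com")   # forgery detected: dismiss everything
print(submit("user@example.com", packet, user_key.sign(packet)))  # False
```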

Code Authentication via "Simultaneous Problem Minimization" (Inline Testing)

A solution that Dan Oetting proposed (he calls it "inline testing") involves having the client simultaneously solve two problems, using the decrypted key results of each pass not only for a solution equality comparison but also for a constraint minimization. The results his clients communicate back to the server include not only the Boolean result (and the solution, if True) but also the result of a separate constraint-minimization problem. Since it is a minimization problem, all keys in the block must be evaluated to ensure that all possibilities are considered. Skipping any will possibly invalidate the result of the minimization, which the server can detect once the same evaluation is re-performed. Some significant points:

  • The criterion used to decide what function the clients should minimize must be carefully selected such that its result is difficult to predict (for example, taking the simple minimum of each resulting plain-text word would be non-ideal, since the end result would very likely be "0").
  • For cipher-key based contests, one possibility is not to "minimize" but to test for an alternate equality with a second plain-text match that has been artificially selected so that it is known to exist somewhere within the block. In this case, the server must track the secondary plain-text that it selects and verify the result in the returned block, to ensure that the client did not forge a fake secondary plain-text to match an arbitrary key of its own choosing.
  • If block-splitting is still a desirable feature, it must be ensured that the result of the function minimization does not become invalid when a block is split by a keyserver. Alternatively, any server that does block-splitting must take the block's minimization function into consideration and possibly recompute the minimization request data that will be supplied to the eventual client for testing. If the servers that can do block-splitting cannot all be trusted (i.e., personal proxies), then it may be necessary to define a predictable transformation of the minimization request data and allow a trusted server to additionally verify that the resulting minimization test was appropriately derived by the untrusted server(s).
  • Oetting's "inline testing" method involves a variation in which the selection of the minimization function allows the receiving keyserver to immediately use the returned residual data to verify the work done, without requiring complete re-evaluation of the work packet. This property eliminates the need for redundant distribution of the same work units.
  • If work verification must be done through redundant distribution, then it is important to ensure that the residual values being compared for equality actually come from different machines/clients. Otherwise it would be possible for a single malicious client to submit the same work unit (with the same residual value) multiple times, perhaps obscuring other identifying information to emulate the appearance of originating from different machines/clients. It is therefore necessary to track the distinct assignments of blocks. One solution (sketched in code after this list) is as follows:
    1. Allow the keymaster to generate opaque tickets that are included with each work unit as it is generated and distributed. The ticket is computed by taking a secure hash of the concatenation of the following information: work unit number, work unit length, a secret constant, and a random salt factor.
    2. Clients process work packets without modifying the opaque ticket, which accompanies the completed work packet when it is transmitted back to the keymaster.
    3. When the keymaster receives a result work packet, it verifies that the ticket in the block is valid, which it can easily do because it knows all of the information in the hash (the random salt factor can be carried in the clear within the ticket itself so that the hash can be recomputed). If the ticket fails to correspond with the work packet, the packet is invalid and can be discarded. Otherwise the work unit number, work unit length, residual value, and the ticket value itself are recorded in a re-verification database.
    4. To verify duplicated work-block entries that have been stored in the re-verification database, locate two entries with the same work unit number and work unit length. Ensure that the residual values of all matched records are identical, and that at least one of the matching records has a differing ticket value (a differing ticket proves that the same residual was derived by at least two distinct assignments). Records that share the same ticket value indicate a possible duplicated submission (provided care was taken to always use different salt values; if that was not done, then this cannot be determined).
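
A sketch of this ticket construction, carrying the salt in the clear inside the ticket so the keymaster can recompute the hash; the field sizes and the secret constant are invented:

```python
# Opaque tickets for tracking distinct work-unit assignments.
import hashlib
import os

SECRET = b"keymaster-only-constant"   # known only to the keymaster

def issue_ticket(unit_number: int, unit_length: int) -> bytes:
    salt = os.urandom(8)              # fresh salt per assignment
    digest = hashlib.sha256(
        unit_number.to_bytes(8, "big")
        + unit_length.to_bytes(4, "big")
        + SECRET + salt).digest()
    return salt + digest              # opaque to clients

def ticket_valid(unit_number: int, unit_length: int, ticket: bytes) -> bool:
    salt, digest = ticket[:8], ticket[8:]
    expected = hashlib.sha256(
        unit_number.to_bytes(8, "big")
        + unit_length.to_bytes(4, "big")
        + SECRET + salt).digest()
    return digest == expected         # forged tickets are discarded

t1, t2 = issue_ticket(42, 2**28), issue_ticket(42, 2**28)  # duplicate issue
assert ticket_valid(42, 2**28, t1) and ticket_valid(42, 2**28, t2)
assert t1 != t2   # distinct assignments remain distinguishable
```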

Prevention and Detection of Solution Transmission Suppression ("Solution Stealing")

It is possible for a dishonest user to modify the client so that when a solution is detected, it is suppressed instead of being transmitted to distributed.net for final verification, depriving distributed.net (and potentially the winner's team) of the cooperative solution attribution and award. There are several possible ways this issue can be addressed:

  • Solution detection through expiry. If it is possible to recognize, after some expiry period, that a block has not been reported in (or was reported in as a forged no-solution result), then the block can be recycled and redistributed repeatedly until it is given to a valid client that successfully reports the presence of the solution within it. Of course, in the case of most 3rd-party timed contests, if a dishonest user knowingly withholds the solution from distributed.net, he will most likely report the solution to the 3rd-party contest sponsors directly, limiting distributed.net's ability to recover the solution through expiry methods.
  • Solution obfuscation through problem transformation. Send plain/cipher/iv data to the client that has been modified such that an identified solution/non-solution in the transformed problem has a corresponding result in the original problem. If the mathematical transformation between the altered problem and the original problem set is kept confidential, an identified solution in the transformed problem is useless to a dishonest client. Although it is still possible for a modified client to withhold the solution from distributed.net, knowledge of the solution's existence is not useful to that person. However, due to the nature of typical cryptographic algorithms and their design goal of preventing exactly such manipulation, this type of homologous problem transformation is typically not possible.
  • Solution obfuscation through multiple dependencies. In some classes of problems, it may be possible to design problem dependencies such that individual clients work toward identifying only a partial solution, and the solution identified by any single machine is not useful unless all of the results are also known and combined in some manner. Unfortunately, this method is not suitable for all applications. However, OGR falls into this class of problems, since each client works to determine the best solution within a given problem stub, and the final solution is assembled by taking the best of all results (the accompanying proof is supplied by providing verifiable evidence of an exhaustive evaluation of all stubs having less optimal solutions). A toy illustration follows this list.
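
A toy illustration of this best-of-all-results property; the stub names and ruler lengths are invented and bear no relation to real OGR data:

```python
# Multiple dependencies: no single stub result is the solution by itself.
per_stub_best = {
    "stub 25/1-5-12": 177,   # shortest ruler length found in each stub
    "stub 25/1-6-9":  180,
    "stub 25/2-4-11": 175,
}

def combine_stub_results(results: dict) -> tuple:
    # The final answer emerges only once every stub has reported, so
    # withholding one stub's result delays the project without handing
    # the withholder anything useful on its own.
    best_stub = min(results, key=results.get)
    return best_stub, results[best_stub]

print(combine_stub_results(per_stub_best))
```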

Prevention of Local Storage File Tampering

It would be desirable for our programs to be able to read and write data in flat binary files, but to write them in such a way that any modification of the file by the user or by an untrusted program can be detected. It will always be possible for modifications to be made to the file, but it should always be possible to detect such changes and ignore/discard them.

Don't forget that the program must be able to read and modify its files. Therefore, the algorithm for making sense of them and changing them properly is embedded in the program. A determined attacker is always able to reverse-engineer these algorithms and implement them on his own to achieve the desired results.

Adding cryptographic signing to the file for the purposes of detecting tampering is not sufficient, since the program needs to be able to perform legitimate modifications to its files and an illegitimate client could replicate any file operation that can be performed by a legitimate client. Embedded private keys within the client binary can be extracted through manual techniques and reused in illegitimate programs.

Note that if it were possible to solve the Operational Code Authentication problem and convince a central keyserver of a client's authenticity, it might be possible to use the central keyserver as a notarization authority, allowing it to cryptographically sign data records as they are added to a data file. The public key of the notarization authority could then be used to re-verify the legitimacy of the data records within the file (a rough sketch follows). Of course, this technique would require a more permanent level of network connectivity, which is exactly what the use of local storage files attempts to eliminate.
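
A rough sketch of this notarization idea, again using Ed25519 from the third-party Python "cryptography" package; the record format and buffer handling are invented:

```python
# Keyserver-notarized records in a local buffer file.
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

notary_key = Ed25519PrivateKey.generate()   # lives on the keyserver
notary_pub = notary_key.public_key()        # shipped with the client

buffer_file = []                            # stands in for a buffer file

def append_record(record: bytes):
    """Online step: the keyserver notarizes the record before storage."""
    buffer_file.append((record, notary_key.sign(record)))

def surviving_records():
    """Re-verification: keep only records the notary actually signed."""
    for record, sig in buffer_file:
        try:
            notary_pub.verify(sig, record)
            yield record
        except InvalidSignature:
            pass                            # tampered entry: discard it

append_record(b"block 0x5678: 2^28 keys done")
buffer_file.append((b"block 0x9999: forged", b"\x00" * 64))  # tampering
print(list(surviving_records()))            # only the notarized record
```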