Nostr NIP 111 - Version 1

Motivation

The open source movement has always been a beacon for collaboration, innovation, and freedom in software development. However, the centralized nature of many current code collaboration platforms poses a fundamental challenge to these principles. Centralization limits the scope of collaboration and compromises the digital sovereignty of developers and contributors. In a centralized system, the platform essentially owns your code, data, and interactions, which is antithetical to the core principles of open source development.

Decentralization and digital sovereignty can transform open source development. Decentralization eliminates the single point of control, giving individual developers and communities full ownership of their code and contributions. Limiting centralized power is not only a technical requirement but also a moral imperative. Digital sovereignty ensures that developers control their data, code, and interactions, which aligns with the open source ethos.

One of the most promising ways to realize decentralization and digital sovereignty is to extend Git to support peer-to-peer (P2P) interactions. This way, we can create a P2P-based Git that inherently supports decentralized code transmission. However, distributed code transmission alone is not enough. To fully realize the ideals of digital sovereignty and decentralized open source collaboration, we must also implement mechanisms for distributed collaboration.

Distributed collaboration goes beyond just sharing code; it encompasses issue tracking, pull requests, code reviews, and community governance. In a decentralized environment, these aspects of collaboration should also be distributed across the network, allowing developers to engage in these activities without relying on a central authority.

Customization of the Git P2P Transfer Protocol in Mega

Mega is an engine for managing a monorepo. It works similarly to Google's Piper and helps streamline Git and trunk-based development for large codebases. Mega is also adapting a P2P transfer protocol to bring distributed data interaction to Git.

The Protocol URI of Git P2P

The original Git protocol syntax is

[<protocol>://]<username>[:<password>]@<hostname>[:<port>]/<namespace>/<repo>[.git]

For a P2P Git protocol

  1. <protocol> could be a prefix like p2p:// to indicate the P2P protocol.
  2. <username> and <password> are unnecessary for the P2P protocol. The implementation follows the interaction commands of the Git SSH protocol and uses the peer ID for authentication, so no username or password is needed.
  3. The <hostname> usually represents the server, but in P2P, it maps to the peer ID hosting the repo. We use <peerID> here to avoid confusion.
  4. The <port> is not relevant for P2P networking.
  5. Mega uses a monorepo, so there are no <namespace> or <repo> names, only a <path>. We could design a virtual path scheme that maps internal directories to publicly exposed paths.

The Git version control system uses two major transfer protocols: "dumb" and "smart." The dumb protocol is simple but inefficient, requiring a series of HTTP GET requests. It is rarely used today due to its limitations in security and efficiency. On the other hand, the smart protocol is more common and efficient, as it allows for intelligent data transfer between the client and server.

Inspired by Git's approach to having multiple transfer protocols, we add a type segment in the custom P2P Git transport protocol. This segment allows us to specify the format of the files being transferred between peers, similar to how Git's protocols specify the nature of the data transfer. Currently, the type segment supports two formats: pack and object.

  1. pack: Indicates that the file being transferred is in Git's Pack format, efficiently transferring multiple Git objects.
  2. object: Indicates that the file being transferred is in Git's Object format, suitable for transferring individual Git objects like blobs, trees, commits, or tags.

Finally, the P2P protocol URI looks like

p2p://<peerID>/<type>/<repo>

Example

p2p://12D3KooWFgpUQa9WnTztcvs5LLMJmwsMoGZcrTHdt9LKYKpM4MiK/pack/mega.git

or

p2p://12D3KooWFgpUQa9WnTztcvs5LLMJmwsMoGZcrTHdt9LKYKpM4MiK/object/be044281f9604305e1b41b0e800e844c2a417e52
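
The URI layout above can be illustrated with a small parser. This is a minimal sketch, not Mega's implementation: the names P2PUri and parse_p2p_uri are hypothetical, and the third segment is called target here because it holds a repo path for pack and an object hash for object.

```python
from dataclasses import dataclass

# The two formats the type segment currently supports.
VALID_TYPES = {"pack", "object"}


@dataclass(frozen=True)
class P2PUri:
    peer_id: str
    type: str    # "pack" or "object"
    target: str  # repo path for "pack", object hash for "object"


def parse_p2p_uri(uri: str) -> P2PUri:
    """Split p2p://<peerID>/<type>/<target> into its three segments."""
    prefix = "p2p://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a p2p URI: {uri!r}")
    # maxsplit=2 keeps any further slashes inside the target path.
    parts = uri[len(prefix):].split("/", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected p2p://<peerID>/<type>/<target>: {uri!r}")
    peer_id, type_, target = parts
    if type_ not in VALID_TYPES:
        raise ValueError(f"unknown type segment: {type_!r}")
    return P2PUri(peer_id, type_, target)
```

Rejecting unknown type segments early keeps a peer from guessing how to interpret a transfer format it does not support.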

The DHT (Distributed Hash Table) Storage Specification

A DHT (Distributed Hash Table) offers a compelling solution for enhancing the scalability and robustness of Mega. DHTs are decentralized systems that store key-value pairs across a network of nodes, eliminating the need for a central server. We therefore use a DHT to store information about the open source repositories in the network on each peer node.

How it works

In a DHT-based Git network, each repository ("repo") name is a unique key. The key (repository name) can only contain ASCII letters, digits, and the characters ., -, and _. The value is a specification:

{
  "origin": "12D3KooWFgpUQa9WnTztcvs5LLMJmwsMoGZcrTHdt9LKYKpM4MiK",
  "name": "mega",
  "latest": "1de1c6f",
  "forks": [
    {
      "peer": "456DFooWFgpUQa9WnTztcvs5LLMJmwsMoGZccdw3Ddf3DTH23",
      "latest": "1de1c6f",
      "timestamp": 1629827281
    },
    {
      "peer": "799DFoodjsfhuedDFEDSFesDFwefSDfwsefEWFweSDFWEfweS",
      "latest": "be04428",
      "timestamp": 1629827281
    }
  ],
  "timestamp": 1629827281
}

  1. origin: The PeerID of the node where the repository was first made open source.
  2. name: The name of the repository.
  3. latest: The SHA-1 value of the latest commit made to the repository.
  4. forks: An array of objects representing the forks of the original repository. Each object has the following three fields.
  5. peer: The PeerID of the node that has forked the repository.
  6. latest: The SHA-1 value of the latest commit made to the forked repository.
  7. timestamp: A Unix timestamp indicating when this fork's latest update was made.
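
The key rule and the forks array can be sketched in a few lines. This is an illustration under assumptions, not Mega's code: is_valid_key and upsert_fork are hypothetical helper names, and the sketch assumes a peer appears at most once in forks, which the specification above does not state.

```python
import re

# ASCII letters, digits, and the characters ".", "-", and "_".
KEY_PATTERN = re.compile(r"[A-Za-z0-9._-]+")


def is_valid_key(name: str) -> bool:
    """Check a repository name against the allowed key characters."""
    return KEY_PATTERN.fullmatch(name) is not None


def upsert_fork(record: dict, peer: str, latest: str, timestamp: int) -> dict:
    """Return a copy of the DHT value with this peer's fork entry added or replaced.

    Assumption: each peer has at most one entry in "forks".
    """
    forks = [f for f in record.get("forks", []) if f["peer"] != peer]
    forks.append({"peer": peer, "latest": latest, "timestamp": timestamp})
    return {**record, "forks": forks}
```

Returning a new value instead of mutating the old one makes it easy to compare the stored and updated records before putting the result back into the DHT.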

The Implementation of Git Peer-to-Peer Interactions

Delivering a Pack File Between Two Peers

sequenceDiagram 
    Client A->>+Relay: Hello Relay. I am joined! 
    Relay-->>+Client A: DHT Broadcast 
    Client A->>Client A: Put mega to DHT 
    Client A-->>+Relay: DHT Broadcast 
    Client B->>+Relay: Hello Relay. I am joined! 
    Relay-->>+Client B: DHT Broadcast 
    Client B->>+Relay: Help me connect to Client A 
    Relay-->>+Client B: Here is Client A's public IP address and port
    Relay-->>-Client A: Here is Client B's public IP address and port
    Client B->>+Client A: I need to clone: p2p://<Client A peerID>/pack/mega.git 
    Client A-->>+Client B: It's your pack file. 
    Client B->>Client B: Put mega to DHT
    Client B-->>+Relay: DHT Broadcast 		
    Relay-->>+Client A: DHT Broadcast

Collecting Object Files from Multiple Peers

sequenceDiagram 
    Client A->>+Relay: Hello Relay. I am joined! 
    Relay-->>+Client A: DHT Broadcast 
    Client A->>Client A: Put mega to DHT 
    Client A-->>+Relay: DHT Broadcast 
    Client B->>+Relay: Hello Relay. I am joined! 
    Relay-->>+Client B: DHT Broadcast 
    Client B->>+Client A: I need to clone: p2p://<Client A peerID>/pack/mega.git 
    Client A-->>+Client B: It's your pack file. 
    Client B->>Client B: Put mega to DHT 		
    Client B-->>+Relay: DHT Broadcast 		
    Relay-->>+Client A: DHT Broadcast 
    Client C->>+Relay: Hello Relay. I am joined! 
    Relay-->>+Client C: DHT Broadcast 
    Client C->>Client A: I need an object list 
    Client A-->+Client C: It's the object list 
    loop [Object List Part A]
    Client C->>Client A: I need an Object <SHA-1>
    Client A-->Client C: It's your object file
    end
    loop [Object List Part B]
    Client C->>Client B: I need an Object <SHA-1>
    Client B-->Client C: It's your object file
    end
    Client C->>Client C: Put mega to DHT
    Client C-->>+Relay: DHT Broadcast
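
In the flow above, Client C fetches one part of the object list from Client A and another part from Client B. The source does not say how the list is divided, so the following is only a sketch that assumes a simple round-robin split. The function name split_object_list is hypothetical.

```python
def split_object_list(object_ids: list[str], peers: list[str]) -> dict[str, list[str]]:
    """Assign each object ID to one peer, cycling through the peers in order.

    Assumption: round-robin assignment. The protocol description does not
    prescribe how the object list is partitioned.
    """
    if not peers:
        raise ValueError("at least one peer is required")
    plan: dict[str, list[str]] = {peer: [] for peer in peers}
    for index, object_id in enumerate(object_ids):
        plan[peers[index % len(peers)]].append(object_id)
    return plan
```

Each entry in the returned plan corresponds to one loop in the diagram: the requester asks the given peer for every object ID in that peer's list.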

Why Choose Nostr for Collaboration

Git is a foundational platform for version control, playing a pivotal role in open source collaboration. Open source collaboration can be divided into two primary parts: data transfer and information dissemination. The robust versioning capabilities of Git facilitate data transfer, while information dissemination, including updates, announcements, and collective communications, is the part that can be efficiently extended using the Nostr protocol.

Nostr uses events as the atomic unit for its protocol. These events are the only object types on the Nostr network and can be of various kinds, such as "text notes" intended for Twitter-like feeds, replies, and comments. Each event contains an id, pubkey, created_at timestamp, kind, content, tags, and a sig for signature. The kind specifies the type of event, and the content depends on what the kind means. For example, in the case of kind:1, the content is just a plaintext string meant to be read by others. While Nostr's event-based mechanism is unsuitable for transmitting code data due to its design for short, plaintext notes, it is highly effective for broadcasting other information relevant to open source collaboration.

Nostr Implementation Possibilities (NIPs) are designed to promote interoperability within the Nostr network. Aside from the first NIP, NIP-01, which outlines the basic protocol, all other NIPs are optional. NIPs aim to coordinate implementing solutions that are compatible across different applications. They are essential in a decentralized network like Nostr, where the community determines the direction of the protocol. NIPs provide a structured yet flexible framework for enhancing event-based broadcasting and subscription mechanisms in open-source collaboration.

Nostr NIP Proposal

This Nostr Implementation Possibility (NIP) aims to establish a standardized protocol designed explicitly for open source collaboration. It outlines the essential structures and flows that should be universally implemented, ensuring compatibility and effective cooperation among contributors. Future NIPs may extend this foundational protocol by introducing optional or mandatory fields, messages, and features.

Basic Structures and Flows

{
  "kind": 111,
  "id": <32-bytes lowercase hex-encoded sha256 of the serialized event data>,
  "peer": <32-bytes lowercase hex-encoded public key of the event creator>,
  "timestamp": <unix timestamp in seconds>,
  "tags": [],
  "content": <arbitrary string>,
  "sig": < 64-byte lowercase hex of the signature of the sha256 hash of the serialized event data, which is the same as the "id" field>
}

  1. kind: The event type, set to 111 for open source collaboration events.
  2. id: A 32-byte lowercase hex-encoded SHA-256 hash of the serialized event data, serving as a unique identifier for each event.
  3. peer: A 32-byte lowercase hex-encoded public key of the event creator, identifying the contributor.
  4. timestamp: A Unix timestamp in seconds, indicating when the event was created.
  5. tags: An array of tags that provide additional context or categorization for the event.
  6. content: An arbitrary string that contains the actual content or details of the event, such as a change description or comment.
  7. sig: A 64-byte lowercase hex-encoded signature of the SHA-256 hash of the serialized event data. That hash is the same value as the "id" field, so the signature also protects the integrity of the event.
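
The id field can be derived as sketched below. This proposal does not spell out the serialization, so the sketch assumes the NIP-01 layout, a whitespace-free JSON array [0, pubkey, created_at, kind, tags, content] encoded as UTF-8, with this proposal's peer and timestamp fields standing in for pubkey and created_at. The function name compute_event_id is hypothetical, and signing is left out because it needs a secp256k1 library.

```python
import hashlib
import json


def compute_event_id(peer: str, timestamp: int, kind: int,
                     tags: list[list[str]], content: str) -> str:
    """Return the lowercase hex SHA-256 of the serialized event data.

    Assumption: NIP-01-style serialization, with "peer" and "timestamp"
    in the positions NIP-01 gives to "pubkey" and "created_at".
    """
    serialized = json.dumps(
        [0, peer, timestamp, kind, tags, content],
        separators=(",", ":"),  # no whitespace between tokens
        ensure_ascii=False,     # keep non-ASCII characters as UTF-8
    )
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()
```

A receiver can recompute the id from the other fields and reject any event whose id does not match before checking the signature. Python's json module may escape some rare control characters differently from NIP-01, so a production implementation should follow the NIP-01 escaping rules exactly.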

Update a repo status event

This event announces that a project has been made open source or that an open source project has a new latest commit. Each subscriber decides whether to clone or pull the repository in response.

{
  "kind": 111,
  "id": <32-bytes lowercase hex-encoded sha256 of the serialized event data>,
  "peer": <32-bytes lowercase hex-encoded public key of the event creator>,
  "timestamp": <unix timestamp in seconds>,
  "tags": [
    ["p", "12D3KooWFgpUQa9WnTztcvs5LLMJmwsMoGZcrTHdt9LKYKpM4MiK"],
    ["n", "mega"],
    ["t", "origin"],
    ["a", "update"],
    ["u", "p2p://12D3KooWFgpUQa9WnTztcvs5LLMJmwsMoGZcrTHdt9LKYKpM4MiK/pack/mega.git"],
    ["c", "1de1c6f"]
  ],
  "content": <arbitrary string>,
  "sig": < 64-byte lowercase hex of the signature of the sha256 hash of the serialized event data, which is the same as the "id" field>
}
  1. p: The peer id of the node
  2. n: The name of the repo
  3. t: The type of repo - origin or fork
  4. a: The action of event - update/request/issue
  5. u: The p2p URL of the repo
  6. c: The latest commit of the repo
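
The six tags listed above can be assembled with a small helper. This is a minimal sketch: build_tags is a hypothetical name, and the allowed values for t and a are taken directly from the list above.

```python
REPO_TYPES = {"origin", "fork"}
ACTIONS = {"update", "request", "issue"}


def build_tags(peer_id: str, name: str, repo_type: str, action: str,
               url: str, commit: str) -> list[list[str]]:
    """Build the tags array for a kind 111 event in the order shown above."""
    if repo_type not in REPO_TYPES:
        raise ValueError(f"t must be one of {sorted(REPO_TYPES)}: {repo_type!r}")
    if action not in ACTIONS:
        raise ValueError(f"a must be one of {sorted(ACTIONS)}: {action!r}")
    return [
        ["p", peer_id],   # peer ID of the node
        ["n", name],      # name of the repo
        ["t", repo_type], # origin or fork
        ["a", action],    # update, request, or issue
        ["u", url],       # P2P URL of the repo
        ["c", commit],    # latest commit of the repo
    ]
```

Validating t and a before publishing keeps malformed events off the relay, where subscribers would otherwise have to discard them.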

Create a merge request

This event broadcasts the latest commit of a fork and requests that upstream be updated with it.

{
  "kind": 111,
  "id": <32-bytes lowercase hex-encoded sha256 of the serialized event data>,
  "peer": <32-bytes lowercase hex-encoded public key of the event creator>,
  "timestamp": <unix timestamp in seconds>,
  "tags": [
    ["p", "12D3KooWFgpUQa9WnTztcvs5LLMJmwsMoGZcrTHdt9LKYKpM4MiK"],
    ["n", "mega"],
    ["t", "fork"],
    ["a", "request"],
    ["u", "p2p://12D3KooWFgpUQa9WnTztcvs5LLMJmwsMoGZcrTHdt9LKYKpM4MiK/pack/mega.git"],
    ["c", "1de1c6f"]
  ],
  "content": <arbitrary string>,
  "sig": < 64-byte lowercase hex of the signature of the sha256 hash of the serialized event data, which is the same as the "id" field>
}

Create an Issue

This event broadcasts an issue, whose text is carried in the i tag. It is up to the upstream or the fork to decide whether to track it.

{
  "kind": 111,
  "id": <32-bytes lowercase hex-encoded sha256 of the serialized event data>,
  "peer": <32-bytes lowercase hex-encoded public key of the event creator>,
  "timestamp": <unix timestamp in seconds>,
  "tags": [
    ["p", "12D3KooWFgpUQa9WnTztcvs5LLMJmwsMoGZcrTHdt9LKYKpM4MiK"],
    ["n", "mega"],
    ["t", "fork"],
    ["a", "issue"],
    ["u", "p2p://12D3KooWFgpUQa9WnTztcvs5LLMJmwsMoGZcrTHdt9LKYKpM4MiK/pack/mega.git"],
    ["c", "1de1c6f"],
    ["i", "Issue Content"]
  ],
  "content": <arbitrary string>,
  "sig": < 64-byte lowercase hex of the signature of the sha256 hash of the serialized event data, which is the same as the "id" field>
}
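
On the receiving side, a subscriber might index an event's tags and, for update events, choose between cloning, pulling, and doing nothing. The following is a sketch under assumptions: tags_to_dict and decide_on_update are hypothetical names, and local_commits is an assumed mapping from a repo's P2P URL to the latest commit the subscriber already holds.

```python
def tags_to_dict(tags: list[list[str]]) -> dict[str, str]:
    """Index tags by their single-letter key; the first occurrence wins."""
    indexed: dict[str, str] = {}
    for tag in tags:
        if len(tag) >= 2 and tag[0] not in indexed:
            indexed[tag[0]] = tag[1]
    return indexed


def decide_on_update(event: dict, local_commits: dict[str, str]) -> str:
    """Return "clone", "pull", or "ignore" for an incoming event.

    Assumption: local_commits maps a repo's p2p:// URL to the commit
    this subscriber currently has for it.
    """
    tags = tags_to_dict(event.get("tags", []))
    if event.get("kind") != 111 or tags.get("a") != "update":
        return "ignore"  # not an update event of this NIP
    url, commit = tags.get("u"), tags.get("c")
    if url is None or commit is None:
        return "ignore"  # malformed event
    if url not in local_commits:
        return "clone"   # repository not yet present locally
    return "ignore" if local_commits[url] == commit else "pull"
```

Events whose a tag is request or issue fall through to "ignore" here; a fuller client would hand them to its own merge request and issue handling.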

Why Mega Is Well Suited for Implementing Git's Peer-to-Peer Functionality

Mega presents a compelling case as a platform for implementing peer-to-peer (P2P) functionality for Git.

Difficulty in Modifying Git for P2P While Maintaining Upstream Compatibility

Modifying the existing Git architecture to support P2P functionality is challenging, especially if the goal is to maintain compatibility with upstream repositories. Being a standalone project, Mega offers the flexibility to implement P2P features without worrying about breaking existing Git functionalities.

Requirement for Host Service Capabilities

Distributed collaboration on a repository typically requires one or more Host Services. Mega can already act as a Host Service, making it easier to facilitate P2P interactions.

Need for Relay Nodes in Both Nostr and DHT

Whether using Nostr or DHT for networking, there's a requirement for Relay nodes to facilitate data transfer. Mega's architecture allows it to act as a Client and a Relay node, streamlining the data transfer process.

Database-Driven Storage Advantages

Mega uses a database for storage, which offers significant advantages when storing and retrieving large amounts of DHT information. This makes Mega highly efficient and scalable, especially for P2P networks that require quick data lookups.

Reusability of Existing Codebase

Mega has rewritten the underlying storage logic of Git and has implemented HTTP/SSH for data transfer and Pack file parsing. This means that a significant portion of the code can be reused when implementing P2P functionality, reducing development time and effort.

Support for LFS Protocol

Mega has also implemented the Large File Storage (LFS) protocol, which is particularly beneficial for transferring binary files. This gives Mega an edge in handling repositories that contain large binary files, making the P2P transfer process more efficient.

In summary, Mega's existing capabilities, from its flexible architecture and Host Service functionalities to its efficient storage and data transfer protocols, make it an excellent candidate for implementing Git's Peer-To-Peer functionality. Its design considerations align well with the requirements and challenges of creating a robust, efficient, and scalable P2P Git network.

References

  1. GitHub - web3infra-foundation/mega: Monorepo and P2P Git Engine for Enterprise and Individual
  2. Pro Git - Git Internals - Transfer Protocols