Hacker News
Show HN: Peer-to-Peer data transfer tool based on libp2p (github.com/dennis-tra)
129 points by dennis-tra on Feb 13, 2021 | hide | past | favorite | 43 comments


I miss the days when it was easy to find content on Napster and later Kazaa. Today, Soulseek seems to be the only p2p network that still preserves that feeling, while gnutella feels more like a graveyard and torrents need external trackers.

Is there any decentralized p2p network where it is easy to find content nowadays?


Agree with you. I fondly remember the days of browsing DC++ hubs, trying to get invited, and adding more local data in order to see more. Gone are those days.

We're so close as well! P2P is stronger than ever, especially torrents, which have very mature implementations now. We're just missing the ability to distribute the metadata, so that getting the torrents (or magnet links) is P2P as well. Something like TPB on a P2P network would make things unstoppable.


Having a distributed search engine is far from easy (there's the main issue of trust in the index). It's possible to do it the other way around, thanks to DHT scraping being allowed: everyone can scrape the peers and build an index at home, with tools like magnetico (https://github.com/boramalper/magnetico) for example


Afaict magnetico is a selfish implementation. It takes resources from the network but doesn't provide any back. If everyone ran that at home instead of a proper DHT node then the network wouldn't work. That said, it used to be far worse (actively polluting others' routing tables) and it's nice seeing BEP51 validated as a proof of concept.


Not the author of magnetico here.

I think magnetico is not expected to replace a proper DHT node but be run in parallel: one is naturally present because you're a peer in any amount of torrents, the other is using the network for searching. I'd expect both to be running on the same machine.

After a few weeks of running it I now have a 2.6GiB database of torrents. I wouldn't expect a standard torrent node to include this feature, especially because the value of magnetico only comes when the database is big enough.

That said magnetico should definitely be a standard node in the network, it's light enough that it can do it


> That said magnetico should definitely be a standard node in the network, it's light enough that it can do it

Maybe in the future. As things are right now, magnetico is a huge bandwidth eater and will literally consume all available bandwidth if it can. It doesn't seem to have any limits set up today, so if you want to be able to do other things at the same time, you have to limit the process/IO manually.


Why can't .torrent files simply be distributed using gnutella2? Both services could then be integrated, and we would automatically get decentralized, distributed search and sharing.


Don't you need a tracker still?


Not with DHT.


To some degree, the paid services (iTunes) won out by being so easy to use. There wasn't a viable paid alternative in the heyday of p2p file sharing.

Do you know if anyone has explored the viability of smaller-scale file sharing? As in, I make my content available directly to my friends, and they can search over mine and their friends' content? Privacy leads to better security, as there is potentially no central server or network to compromise...


Most people who used to use computers still use desktops in addition to smartphones.

But the vast influx of people who use smartphones as their main system means most don't have a full ISP connection with a real, routable address. At best they might have an IPv6 address, but an IPv6 address plus CGNAT IPv4 still doesn't let you communicate with most peers on the internet, even if it works for most corporate websites.

If IPv6 ever fully arrives then p2p can work. But until that time most mobile devices cannot participate in most of the internet, p2p included. And mobile is where the money is spent.


> When the peer is found, its public key is checked against the three remaining words (as the words were derived from that key), and a password authenticated key exchange happens to authenticate each other.

What's the PAKE used for, if the public key is already verified? What's the "password" here?


The public-key verification was intended as an additional sanity check; I considered it not secure enough to rely on alone, so I added the PAKE step as well.

As lotharrr mentioned other flaws, you may want to check out his suggestion, which seems like the way to go to me. It also eliminates this "sanity check" step altogether.


Always excited to see libp2p used by more things. Historically I feel like the efforts for p2p things were pretty balkanized. Experimentation is good, but not at the cost of network effects and the eventual need to pressure network operators.

The more different things use libp2p, the more flexible it will become. We can get meaningful experimentation and interoperability (different concepts won't interoperate of course, but the same concept will). This is nice!


Looks cool. Is it safe to assume it’s just a typo/accident that the codes don’t seem to match here?:

  The sending peer runs:
  
  $ pcp send my_file
  Code is:  bubble-enemy-result-increase
  On the other machine run:
  pcp receive bubble-enemy-result-increase
  
  The receiving peer runs:
  
  $ pcp receive december-iron-old-increase
  Looking for peer december-iron-old-increase...


Thanks @mapgrep, I fixed the typo based on your comment, but couldn’t reply timely due to the noprocrast setting of HN.


Indeed it was a typo. The README has been updated.


(I'm the author of magic-wormhole)

Nice! I'm glad to see libp2p getting more use. I would love to have it as a connection protocol in magic-wormhole someday. Your implementation looks really slick.

I'll echo mintplant's question.. I'm interested in more details of the protocol. From the README (which is really good, thank you), I gather that it creates a libp2p keypair, uses the first 44 bits of the pubkey as the transfer code, use the first 8 bits plus a rounded timestamp as a DHT channel ID, then.. publishes its libp2p contact information to a DHT slot under the channel ID and waits for the peer to connect? I guess I need to learn more about libp2p and the difference between publishing information to a key, and accepting connections. Like mintplant, my big question is what gets used as the PAKE secret.

The PAKE step in magic-wormhole is there to make sure there's some secret value that is known only to the two correspondents, not anything in the middle, and use that to negotiate a full session key. It's the only piece of knowledge that distinguishes your intended recipient from an attacker.

If you're using the 44 bit transfer code as the PAKE "secret", and that's trivially derivable from the pubkey, and the pubkey is observable to anyone watching the network (or the DHT entries), then it's not actually a secret. An attacker could monitor the DHT for new keys in the "/pcp/" namespace, read their contents, connect to the indicated sender, note the public key they used for the connection, rebuild the transfer code that you created, run the protocol in the same way your intended recipient would, and steal the file. The sender would see the program complete earlier than they expected (before they even told anyone the transfer code), and the recipient would see some sort of error.

A particularly clever attacker would read the sender's data from the DHT, create a keypair with the same first 44 bits (requires 2^44 keypairs, but that's not comfortably impossible, and all of the work could be done ahead of time), flood the DHT with their own details (so the recipient connects to the attacker's host instead of yours), and then man-in-the-middle the connection, allowing them to both steal your file and substitute an alternate one to your recipient, with minimal evidence left behind.

But there's an easy fix: have it create four random words, use the first one or two as the channel ID (perhaps combined with the quantized timestamp), and the remainder as the secret PAKE input, which would completely solve that problem. The channel ID could be derived from a randomly-generated libp2p pubkey, or it could just be random. The important thing is that the "password" input to the PAKE step is uniformly random and unrelated to anything outside of the transfer code, and that it never gets revealed to anyone but the recipient (so it can't be used for any network purposes).

Cool stuff.. thank you for building it!


Thanks for your feedback and the kind words as I have great respect for magic-wormhole as well! I watched your talk at PyCon 2016 several times as I was building pcp.

> [...] it creates a libp2p keypair, uses the first 44 bits of the pubkey as the transfer code, use the first 8 bits plus a rounded timestamp as a DHT channel ID, then.. publishes its libp2p contact information to a DHT slot under the channel ID and waits for the peer to connect?

That summarizes it really well. There are actually three types of records you can store in the DHT of IPFS: Provider, IPNS and Peer records [0]. What pcp stores in the DHT are:

- cid("/pcp/{unixts}/chanID") -> peerID (provider record)

- peerID -> multiaddresses (peer record)

where cid stands for the content ID [1] of the given string. So the receiving peer first searches for cid("/pcp/{unixts}/chanID"), finds the peerID and then searches for its associated addresses. These addresses can contain the public address of your home router (if NAT traversal is possible) or relay addresses.

All of this complexity is pretty much hidden away by libp2p, which is amazing!

Regarding your points about using PAKE/Public Key etc: I thought of making it even harder by tying the random code to the identity - but this is indeed flawed. I'll change the logic to your suggestion!

Honestly, thanks for your write-up and the valuable input!

[0] https://docs.ipfs.io/concepts/dht/ [1] https://docs.ipfs.io/concepts/content-addressing/


I spent many hours reading the magic-wormhole docs to study for my p2p IB project, thought I'd say thanks. :-)


One problem of p2p is bootstrapping.

When a p2p application starts the first time it needs to know how to contact the first peer. After that the first peer can give addresses of more peers. But how can the p2p application get the address of the first peer?

A solution would be a central server that maintains a list of peers. But this is not p2p. After all the point of p2p is that it doesn't need central servers.

https://github.com/dennis-tra/pcp#how-does-it-work uses IPFS to contact the first peer. OK.

However, in the quest to understand how to bootstrap p2p truly independent of a central service, I feel somewhat cheated, because IPFS also needs to bootstrap. It's sort of kicking the can down the road. How does IPFS bootstrap?

Is there really a way to bootstrap p2p without reliance on a list of central servers?

I thought about a hardcoded algorithmic series of addresses, say by using a repeatable random generator. An example: the first address is jec6r5bz.io, then 0ueasze6.ch, etc. (I made up the addresses). The project would register the first few addresses and, if they are blocked, register more addresses in that series. This way the application can bootstrap independent of a specific central server, because it can try the next address if one doesn't work.
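Sketching that idea in Go (the seed, name length and TLDs are all made up):

```go
package main

import (
	"fmt"
	"math/rand"
)

// bootstrapDomains deterministically generates a series of candidate
// bootstrap hostnames from a fixed seed, so every client derives the
// same list without contacting any server. The project operators would
// register the first few names and, if those get blocked, register
// further names from the same series.
func bootstrapDomains(seed int64, count int) []string {
	rng := rand.New(rand.NewSource(seed)) // repeatable generator
	const alphabet = "abcdefghijklmnopqrstuvwxyz0123456789"
	tlds := []string{"io", "ch", "net"}
	domains := make([]string, count)
	for i := range domains {
		name := make([]byte, 8)
		for j := range name {
			name[j] = alphabet[rng.Intn(len(alphabet))]
		}
		domains[i] = fmt.Sprintf("%s.%s", name, tlds[rng.Intn(len(tlds))])
	}
	return domains
}

func main() {
	for _, d := range bootstrapDomains(1337, 3) {
		fmt.Println(d)
	}
}
```

This is essentially a domain generation algorithm; clients walk the series until one name resolves and answers.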


The problem is, as you've said, very general and applies to almost all p2p applications. The Bitcoin client has a series of bootstrapping attempts (https://en.bitcoin.it/wiki/Network#Bootstrapping), which involve a list of hardcoded DNS and IP addresses and establishing an IRC connection to find peers. Using IPFS for bootstrapping is a pretty good idea IMO. At least you're relying on an already-decentralized system and not on a central server.

I've been thinking about simple solutions to the problem, like treating the IP space as a binary search space, or generally finding approaches that are better than random sampling (more methods are compared in https://link.springer.com/content/pdf/10.1007%2F978-3-642-01...)

If anyone can recommend more resources for learning about bootstrapping methods, or has examples of how other apps solve the problem, please let me/us know!


> Is there really a way to bootstrap p2p without reliance on a list of central servers?

Yes: a list of nodes. The key part is that for p2p bootstrap you don't need to learn about any special node in the network; any node will do. A set of initial, long-lived contacts could be embedded in a link when sharing content, loaded from a file, or obtained via configurable DNS lookups. If your network were so niche that you couldn't even afford a server and all you have is an anonymous internet forum, you could even post a pastebin with some IPs and ports every now and then and prompt users to paste that.

And once you have bootstrapped yourself you can keep a local cache of long-lived nodes which you can also provide to others.

It seems philosophically hard, but practically speaking you only need to get in once, and there are so many ways to do that. Well-known, server-like points of contact are mostly used because they are the most convenient and reliable option, but they're not really the only one.


So for automatic bootstrap, use a hard-coded or generated list of long-lived nodes, or a list downloaded from different p2p systems. The hard-coded list could be updated with a new version. Additionally, allow manual input of addresses.


Or you defer bootstrap until you have some channel on which you can piggyback.

For example, if you download some file then you need to exchange some content-identifying token anyway, like a magnet link in BitTorrent. That link can also contain a bunch of nodes the source peer knows, preferably the longest-lived ones. Depending on where it's used you could also turn this into a QR code, an NFC handoff or whatever.

Practically speaking an implementation will already contain a bootstrap list or might download it from somewhere else on first start. But that's just for convenience.


Bootstrapping P2P networks in a decentralized fashion is indeed a hard problem.

IPFS bootstraps either from a static list of bootstrap peers (essentially normal nodes in the network) run by trusted organizations (currently some of them are run by Protocol Labs, who also develop IPFS, libp2p and some other software), which is similar to how many P2P networks do bootstrapping. Or, if there are any nodes on the local network, those can be discovered via mDNS. Once you've found one peer, you can exchange "address books" and find more peers.

Another way, which as far as I know isn't implemented anywhere in the wild, is to do some sort of random walk of the IP address space and just try to connect to random hosts to see if they are nodes. Obviously not super cheap, especially compared to bootstrap lists, and it introduces issues if the protocol normally runs on random ports and/or if NAT is being particularly hostile to the protocol, which it tends to be. Coupled with a local cache of previously seen nodes, it'll make the initial bootstrap a bit slow but very resilient, and future disconnects/reconnects faster, but at least it'll be fully decentralized.

One could also imagine, with the proliferation of new P2P technology like blockchains and eventually Matrix moving to P2P, that you might be able to use those for bootstrapping as well. If someone runs an Ethereum node with some Ether, maybe providing bootstrapping functionality for other networks could work via smart contracts? Just spit-balling here, but more out-of-band channels could be added for the discovery of peers.

Edit: a free-time curiosity of mine has always been to use sound inaudible to humans as a discovery protocol, but I haven't really looked into it too much. If nodes could use the microphone to pick up bytes from broadcasting nodes via audio, one might be able to put speakers in cities that aren't really noticeable to humans (but maybe too much for animals/other tech?) that allow nodes to bootstrap.


I work on a decentralised blockchain — the way we bootstrap new nodes onto the network is through a ‘topology’ file which contains peers they want to connect to on launch. This is literally just IP addresses and ports. The file could then be updated by those nodes sharing their topology files on new connections, but we don’t do that.


Sounds like you should just use libp2p (which is what IPFS uses for P2P networking) in whatever flavor you're building your blockchain with! You'd keep the same bootstrapping approach you have now, but with the added benefit of peers exchanging existing connections when new peers connect :)


I’ll check it out! Fwiw I work on Cardano but higher up the stack than networking, so I’m not 100% sure what happens there except for what I see as a ‘user’ of it.


Awesome project, and a great README! How does it handle NATs? And, if I understand correctly, the receiver needs to enter `pcp receive` within 5 mins from the sender initiating the send? Seems a bit unergonomic. Anyway, why do you think that kind of "sharding" is necessary?


Thanks @wngr!

I really appreciate that this project gets a little attention.

Libp2p provides ways to establish a connection through NATs. For more information I recommend looking up the Identify [0] and Circuit Relay [1] protocols, as well as the NAT Traversal docs [2].

I implemented time-based sharding because DHT entries remain there for up to 24h, so there would otherwise be a significant chance of a channel ID collision, resulting in connection attempts to peers that are long gone.

This is indeed a bit unergonomic and I haven’t covered edge cases that come with that approach. Open for suggestions :)

Best

[0] https://docs.libp2p.io/concepts/protocols/#identify [1] https://docs.libp2p.io/concepts/protocols/#circuit-relay [2] https://docs.libp2p.io/concepts/nat/


> within 5 mins from the sender initiating the send?

Based on a strict reading of the readme you actually need to receive in the same 5min window. This means that you have at most 5min. If you send at the end of the window you can have less than a second.

A simple solution is to have the sender always round up (to the end of the current 5min window) and have the receiver search for windows rounded up and down. (Or have the sender broadcast two and the receiver look for one)

If you want more time the sender can keep broadcasting a new window every 5min until the receiver connects.


That's indeed correct and was exactly what I meant with

> I haven’t covered edge cases

So, thanks for your suggested solution! I think that's how I would implement it as well.

The receiver looks for two windows and the sender broadcasts new entries when 5min are over.


Nice README and tech! I like these projects in theory and I get the technical motivation, especially in reference to similar p2p tools that rely on a centralized peering server. But I'm having trouble understanding what people use these tools for in practice. I'm betting there are good use-cases, but I haven't seen any yet that wouldn't be better served by s3, airdrop, or BitTorrent.

Anyone here using pcp, croc, magic-wormhole, etc.?

What are you using it for?


Should use PJON as a transport: https://github.com/gioblu/PJON


Juan Benet mentioned in one of his talks how crazy it is that there still isn't a simple way to send files from one PC to another. This is a nice tool.


How are you liking libp2p? I'm about to start a project in it, and it's my first foray into p2p. Anything I should know?

Sick project, btw.


I really enjoyed learning about all the concepts [0] and it's amazing what libp2p is doing for you behind the scenes - I'm thinking especially of NAT handling and relaying.

I think the APIs to have simple, sequential request/response communication with a peer could be easier. How to close communication and be sure data was received by the peer was also a tricky one. Just because you have written the data to the libp2p stream doesn't mean it was transferred to your peer. So I was continuously closing the stream too early, which resulted in data loss. Took me a little while to figure this one out.

I can recommend having a look at the examples [1] if you're planning to build your project in Go. They helped me a lot to understand how the APIs are supposed to be used. In a few places they're outdated though, especially around closing streams (though I just saw the examples were updated 2 days ago, so this is probably an outdated statement).

[0] https://docs.libp2p.io/concepts/ [1] https://github.com/libp2p/go-libp2p-examples


Why not make use of some existing and alive p2p network's DHT, like BitTorrent's? It would solve the peer-discovery time issue.


so.. like magic wormhole then?


Yes, but google drive didn't stop dropbox, one drive, box and hundred other services. Isn't competition in the community healthy?


Yup, only it doesn't need a python interpreter.


awesome! :)



