Abstract: Existing peer-to-peer systems rely on overlay network protocols for object storage and retrieval and message routing. These overlay protocols can be broadly classified as structured and unstructured - structured overlays impose constraints on the network topology for efficient object discovery, while unstructured overlays organize nodes in a random graph topology that is arguably more resilient to peer population transiency. There is an ongoing discussion on the pros and cons of both approaches. This paper contributes to the discussion a multiple-site, measurement-based study of two operational and widely-deployed file-sharing systems. The two protocols are evaluated in terms of resilience, message overhead, and query performance. We validate our findings and further extend our conclusions through detailed analysis and simulation experiments.

1 Introduction

Peer-to-peer Internet applications for data sharing have gained in popularity over the last few years to become one of today's main sources of Internet traffic [31,12]. Their peer-to-peer approach has been proposed as the underlying model for a wide variety of applications, from storage systems and cooperative content distribution to Web caching and communication infrastructures. Existing peer-to-peer systems rely on overlay network protocols for object storage/retrieval and message routing. These overlay protocols can be classified broadly as either structured or unstructured based on the constraints imposed on how peers are organized and where stored objects are kept. The research community continues to debate the pros and cons of these alternative approaches [5]. This paper contributes to this discussion the first multi-site, measurement-based study of two operational and widely deployed P2P file-sharing systems.

Most P2P systems in use today [8,13] adopt fully distributed and largely unstructured overlays.
In such unstructured systems there are few constraints on the overlay construction and data placement: peers set up overlay connections to a (mostly) arbitrary set of other peers they know, and shared objects can be placed at any node in the system. While the resulting random overlay structures and data distributions may provide high resilience to the degrees of transiency (i.e., churn) found in peer populations, they limit clients to nearly "blind" searches, using either flooding or random walks to cover a large number of peers.

Structured, or DHT (Distributed Hash Table)-based protocols [28,33,36,25], on the other hand, reduce the cost of searches by constraining both the overlay structure and the placement of data - data objects and nodes are assigned unique identifiers or keys, and queries are routed based on the searched object keys to the node responsible for keeping the object (or a pointer to it). Although the resulting overlay provides efficient support for exact-match queries (normally in $O(\log N)$ hops for $N$ peers), this may come at a hefty price in terms of churn resilience and of the systems' ability to exploit node heterogeneity and efficiently support complex queries.

This paper reports on a detailed, measurement-based study of two operational file-sharing systems - the unstructured Gnutella [8] network, and the structured Overnet [23] network. In a closely related effort, Castro et al. [5] present a detailed, simulation-based comparison of both approaches using traces of Gnutella node arrivals and departures [30]. Our study complements their work, focusing on the characterization - not comparison - of two operational instances of these approaches in terms of resilience, query and control message overhead, query performance, and load balancing.

Some highlights of our measurement results include:

- Both systems are efficient in terms of control traffic (bandwidth) overhead under churn. In particular, Overnet peers have surprisingly small demands on bandwidth.
- While both systems offer good performance for exact-match queries of popular objects, Overnet surprisingly yields almost twice the success rate of Gnutella (97.4%/53.2%) when querying for a set of shared objects extracted from a Gnutella client.
- Both systems support fast keyword searches. Flooding in Gnutella guarantees fast query replies, especially for highly popular keywords, while Overnet successfully handles keyword searches by leveraging its DHT structure.
- Overnet does an excellent job at balancing search load; even peers responsible for the most popular keywords consume only 1.5x more bandwidth than the average peer.

We validate our findings and further extend our conclusions (Sections 7 and 8) through additional measurements as well as detailed analysis and simulation experiments. The measurement and characterization of the two large, operational P2P systems presented in this paper will shed light on the advantages/disadvantages of each overlay approach and provide useful insights for the design and implementation of new overlay systems.

After providing some background on unstructured and structured P2P networks in general and on the Gnutella and Overnet systems in particular, we describe our measurement goals and methodology in Section 3. Sections 4-6 present and analyze our measurement results from both systems. Section 9 discusses related work. We conclude in Section 10.

2 Background

This section gives a brief overview of general unstructured and structured P2P networks and the deployed systems measured in our study - Gnutella and Overnet.

2.1 The Gnutella Protocol

In unstructured peer-to-peer systems, the overlay graph is highly randomized and difficult to characterize. There are no specific requirements for the placement of data objects (or pointers to them), which are spread across arbitrary peers in the network.
Given this random placement of objects in the network, such systems use flooding or random walks to ensure a query covers a sufficiently large number of peers. Gnutella [8] is one of the most popular unstructured P2P file-sharing systems. Its overlay maintenance messages include ping, pong and bye: pings are used to discover hosts on the network, pongs are replies to pings and contain information about the responding peer and other peers it knows about, and byes are optional messages that announce the upcoming closing of a connection. For query/search, early versions of Gnutella employ a simple flooding strategy, where a query is propagated to all neighbors within a certain number of hops. This maximum number of hops, or time-to-live, is intended to limit query-related traffic.

Two generations of the Gnutella protocol have been made public: the "flat" Gnutella V0.4 [7], and the newer, loosely-structured Gnutella V0.6 [14]. Gnutella V0.6 attempts to improve query efficiency and reduce control traffic overhead through a two-level hierarchy that distinguishes between superpeers/ultrapeers and leaf-peers. In this version, the core of the network consists of high-capacity superpeers that connect to other superpeers and leaf-peers; the second layer is made of low-capacity (leaf-) peers that perform few, if any, overlay maintenance and query-related tasks.

2.2 The Overnet/Kademlia Protocol

Structured P2P systems, in contrast, introduce much tighter control on overlay structuring, message routing, and object placement. Each peer is assigned a unique hash ID and typically maintains a routing table containing $O(\log N)$ entries, where $N$ is the total number of peers in the system. Certain requirements (or invariants) must be maintained for each routing table entry at each peer; for example, the location of a data object (or its pointer) is a function of an object's hash value and a peer's ID. Such structure enables DHT-based systems to locate an object within a logarithmic number of steps, using $O(\log N)$ query messages.
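To contrast this logarithmic lookup cost with the flooding strategy of Section 2.1, the following is a minimal sketch of TTL-limited flooding over an in-memory overlay graph. It is not Gnutella's implementation: the function name `flood_query`, the adjacency-list representation, and the toy `overlay` are assumptions made for illustration, and the sketch assumes that a peer forwards only the first copy of a query it receives.

```python
from collections import deque


def flood_query(graph, source, ttl):
    """Flood a query from `source` to every peer within `ttl` hops.

    `graph` maps each peer to the list of its overlay neighbors.
    Returns (reached, messages): the set of peers that saw the query and
    the number of query messages sent, duplicates included.
    Assumption: a peer forwards a query only the first time it sees it,
    and never sends it back to the neighbor it arrived from.
    """
    reached = {source}
    messages = 0
    queue = deque([(source, None, ttl)])  # (peer, previous hop, hops left)
    while queue:
        peer, came_from, hops_left = queue.popleft()
        if hops_left == 0:
            continue  # time-to-live exhausted: do not forward further
        for neighbor in graph[peer]:
            if neighbor == came_from:
                continue
            messages += 1  # every forwarded copy costs one message
            if neighbor not in reached:
                reached.add(neighbor)
                queue.append((neighbor, peer, hops_left - 1))
    return reached, messages


# Hypothetical five-peer overlay, used only to exercise the sketch.
overlay = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
reached, messages = flood_query(overlay, "A", ttl=2)
```

On this toy overlay a time-to-live of 2 reaches four of the five peers with four messages, one of which is a duplicate delivery to D; raising the time-to-live to 3 reaches all five peers with six messages. The gap between peers reached and messages sent is the query overhead that the time-to-live is meant to bound.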
Overnet [23] is one of the few widely-deployed DHT-based file-sharing systems. Because it is a closed-source protocol, details about Overnet's implementation are scarce, and few third-party Overnet clients exist. Nevertheless, some of these clients, such as MLDonkey [21], and libraries like KadC [11] provide opportunities for learning about the Overnet protocol.

Overnet relies on Kademlia [20] as its underlying DHT protocol. Similar to other DHTs, Kademlia assigns a 160-bit hash ID to each participating peer, and computes an equal-length hash key for each data object based on the SHA-1 hash of the content. (key, value) pairs are placed on peers with IDs close to the key, where "closeness" is determined by the XOR of two hash keys; i.e., given two hash identifiers, $a$ and $b$, their distance is defined by the bitwise exclusive or (XOR) ($d(a,b) = a \oplus b$). In addition, each peer builds a routing table that consists of up to $\log_2 N$ buckets, with the $i$th bucket $B_i$ containing IDs of peers that share an $i$-bit long prefix. In a 4-bit ID space, for instance, peer 0011 stores pointers to peers whose IDs begin with 1, 01, 000, and 0010 for its buckets $B_0$, $B_1$, $B_2$ and $B_3$, respectively (Fig. 1). Compared to other DHT routing tables, the placement of peer entries in Kademlia buckets is quite flexible. For example, the bucket $B_0$ for peer 0011 can contain any peers having an ID starting with 1.

Figure 1: Routing table of peer 0011 in a 4-digit hash space.

Kademlia supports efficient peer lookup for the $k$ closest peers for a given hash key. The procedure is performed in an iterative manner, where the peer initiating a lookup chooses the $\alpha$ closest nodes to the target hash key from the appropriate buckets and sends them FIND_NODE RPCs. Queried peers reply with peer IDs that are closer to the target key. This process is repeated, with the initiator sending FIND_NODE RPCs to nodes it has learned about from previous RPCs, until it finds the $k$ closest peers.
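The XOR metric, the prefix-based buckets, and the iterative lookup described above can be sketched in a few lines. This is an illustrative in-memory model, not Overnet's or Kademlia's actual code: the function names, the `tables` dictionary (each peer's buckets flattened into one contact list), and the toy 4-bit network are assumptions, and the RPCs that a real client sends in parallel over the network are simulated here by sequential dictionary lookups. The parameter names `k` and `alpha` follow the Kademlia paper [20].

```python
from typing import Dict, List


def xor_distance(a: int, b: int) -> int:
    """Distance between two IDs: their bitwise XOR, read as an integer."""
    return a ^ b


def bucket_index(own_id: int, other_id: int, bits: int) -> int:
    """Index of the bucket holding other_id: the length of the shared prefix."""
    distance = xor_distance(own_id, other_id)
    if distance == 0:
        raise ValueError("a peer does not keep itself in a bucket")
    return bits - distance.bit_length()


def iterative_lookup(start: int, target: int, tables: Dict[int, List[int]],
                     k: int = 2, alpha: int = 2) -> List[int]:
    """Return the k known peers closest to `target`, starting from `start`.

    Each round queries up to `alpha` of the k closest peers not yet asked;
    a queried peer answers with the k contacts it knows closest to the
    target. The lookup stops once the k closest known peers were all asked.
    """
    def closest(ids, count):
        return sorted(ids, key=lambda peer: xor_distance(peer, target))[:count]

    shortlist = set(tables[start])
    queried = {start}  # the initiator never queries itself
    while True:
        pending = [p for p in closest(shortlist, k) if p not in queried]
        if not pending:
            return closest(shortlist, k)
        for peer in pending[:alpha]:
            queried.add(peer)
            shortlist.update(closest(tables[peer], k))  # simulated RPC reply


# Hypothetical 4-bit network: peer ID -> contacts it knows about.
tables = {
    0b0011: [0b1000, 0b0101],
    0b0101: [0b0011, 0b1100],
    0b1000: [0b0011, 0b1100],
    0b1100: [0b1000, 0b1110, 0b0011],
    0b1110: [0b1111, 0b1100, 0b0011],
    0b1111: [0b1110, 0b1100, 0b0101],
}
result = iterative_lookup(0b0011, 0b1111, tables)
```

For peer 0011, `bucket_index` maps IDs beginning with 1, 01, 000 and 0010 to buckets 0, 1, 2 and 3, matching the example in the text. In the toy network, the lookup for key 1111 started at peer 0011 moves through peers 1000, 1100 and 1110 before settling on 1111 and 1110 as the two closest peers, each step at least halving the remaining XOR distance.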
The XOR metric and the routing bucket's implementation guarantee a consistent $O(\log N)$ upper bound for the hash key lookup procedure in Kademlia. The Kademlia protocol is discussed in detail in [20].

Overnet builds a file-sharing P2P network with an overlay organization and message routing protocol based on Kademlia. Overnet assigns each peer and object a 128-bit ID based on an MD4 hash. Object search largely follows the procedure described in the previous paragraph with some modif