IHEP and CC-IN2P3 connectivity issues

Introduction

In this space you will find updated information regarding the connectivity issues between CC-IN2P3 (Lyon, France) and IHEP.

Background Information

As nodes of the LHC computing grid, both CC-IN2P3 and the IHEP computing centre provide computing services for the LHC experiments and regularly exchange large amounts of experimental data over the transcontinental network. As such, the availability, stability and capacity of the network link between the two sites are of primary importance. Since 2011, we have noticed a very low network throughput between the sites compared to the theoretical capacity of the link connecting them, so we started investigating the cause of this abnormal behavior and possible ways to increase the network throughput.

Several organizations operate segments of the network path between the sites. In China, IHEP is connected to CSTNet (the China Science and Technology Network), which in turn connects to CERNET (the China Education and Research Network). TEIN3/Orient, the transcontinental link between Europe (London) and China (Beijing), is operated by GEANT. CC-IN2P3 is linked to GEANT by RENATER (the French Academic and Research Network).

After summarizing the network monitoring tools used for this work, we report on the actions taken to address the observed problems. These actions are carried out under the umbrella of the France-China Particle Physics Laboratory (FCPPL).

Monitoring Tools

Timeline

December 2011

While transferring experimental data for the TREND experiment from IHEP to CC-IN2P3, we noticed a suspiciously low network throughput. This triggered an investigation which demonstrated that it was not possible to exploit the theoretical capacity of the link between the two sites: the observed throughput from IHEP to CC-IN2P3 was only 6.4 Mbps.
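
To put the 6.4 Mbps figure in perspective, here is a back-of-the-envelope calculation of transfer times at the observed rate versus the nominal link rate. This is only an illustration: the 1 TB dataset size is an arbitrary assumption, not the actual TREND data volume.

```python
# Back-of-the-envelope transfer time at observed vs. nominal rates.
# The 1 TB dataset size is an illustrative assumption, not the TREND data volume.
dataset_bytes = 1e12  # 1 TB, hypothetical

for label, rate_mbps in [("observed", 6.4), ("nominal link", 1000.0)]:
    rate_bytes_per_s = rate_mbps * 1e6 / 8
    hours = dataset_bytes / rate_bytes_per_s / 3600
    print(f"{label:>12}: {rate_mbps:7.1f} Mbps -> {hours:8.1f} hours for 1 TB")
```

At 6.4 Mbps, moving 1 TB would take roughly two weeks, against a couple of hours at the nominal 1 Gbps, which is why this throughput was considered suspicious.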

More details about this can be found in the documents attached to the agenda of the FCPPL computing project meeting held on December 12th, 2011.

January 2012

In early January, we obtained permission from CERN and four European WLCG tier-1 sites to regularly monitor the network throughput from and to IHEP using Perfsonar. This is useful to understand whether the low throughput is specific to CC-IN2P3 or is also observed from other European sites using the same transcontinental link.

Several very detailed tests were jointly performed by IHEP and CSTNET network experts. A misconfiguration was found and corrected on January 21st, 2012. The network throughput from IHEP to CC-IN2P3 jumped from 6 Mbps to 600 Mbps, as measured by Perfsonar. This improved throughput is considered good enough, so we are now focusing our efforts on understanding why the throughput in the direction CC-IN2P3 to IHEP remains extremely low: 25 Mbps.

More details on this can be found in the documents attached to the agenda of the FCPPL computing project meeting on February 10th, 2012.

March 2012

We decided to perform throughput tests on each of the network segments in the path from CC-IN2P3 to IHEP. The tests were performed using 5 machines in the IN2P3 network, each with a different network configuration. Initial results of those tests showed significant disparities in the observed throughput, which appeared to be related to the network configuration specific to each machine. In particular, the machines with 2 network interface cards (1 Gbps each) achieved lower throughput than the ones using a single network card. The bonding algorithm is suspected to be the cause of the low throughput, so we decided to investigate this.
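
To check which bonding mode a given transfer machine is actually using, the Linux bonding driver exposes its state under /proc/net/bonding. Below is a minimal sketch of such a check; the interface name bond0 is an assumption and may differ on the actual transfer machines.

```python
# Minimal sketch: report the bonding mode and slave interfaces of a bonded NIC.
# Assumes a Linux host with the bonding driver loaded; the interface name
# "bond0" is an assumption and may differ on the actual transfer machines.
from pathlib import Path

def bonding_summary(iface: str = "bond0") -> None:
    proc_file = Path(f"/proc/net/bonding/{iface}")
    if not proc_file.exists():
        print(f"{iface}: no bonding information found")
        return
    for line in proc_file.read_text().splitlines():
        # The proc file contains lines such as "Bonding Mode: ..." and
        # "Slave Interface: eth0"; we only print those.
        if line.startswith("Bonding Mode:") or line.startswith("Slave Interface:"):
            print(line)

if __name__ == "__main__":
    bonding_summary()
```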

In addition, it was shown that the throughput from the GEANT routers in London to IHEP was good (799 Mbps). This test was performed in January by an expert of the GEANT network operations centre. This suggests that the limitation may be located in the segment between CC-IN2P3 and GEANT in London.

More details on this can be found in the documents attached to the agenda of the FCPPL computing project meeting on March 12th, 2012.

April 2012

Yvan Calas from CC-IN2P3 performed several tests to confirm the hypothesis of the bonding algorithm issue. Below is a brief summary of the test campaign and its conclusions.

What we know so far

  • The bandwidth of the network link between CC-IN2P3 and IHEP is about 1 Gbps. The transcontinental link TEIN3/Orient is shared by several organizations and has a bandwidth of 2.5 Gbps.
  • The measured throughput from IHEP to CC-IN2P3 is roughly 600 Mbps and is reasonably stable (measured by Perfsonar)
  • The measured throughput from CC-IN2P3 to IHEP is roughly 20 Mbps (measured by Perfsonar)
  • Throughput tests from GEANT routers in London to Beijing show that the link can deliver 800 Mbps.

Performed Tests

  • Disk-to-disk tests using gridFTP: data traveling from IHEP to CC-IN2P3
  • Disk-to-disk tests using gridFTP: data traveling from CC-IN2P3 (several machines) to IHEP
  • Memory-to-memory tests from CC-IN2P3 to IHEP (using iperf; a minimal invocation sketch follows this list)
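
For reference, the memory-to-memory tests boil down to running iperf between the two endpoints. The sketch below shows roughly how such a test can be driven; the target host name is a placeholder, an iperf server must already be running on the remote side, and the options shown (-c, -P, -t, -f m) are the standard iperf2 client flags for target host, parallel streams, test duration and Mbit/s output.

```python
# Rough sketch of a memory-to-memory throughput test using the iperf2 client.
# The target host is a placeholder; an iperf server ("iperf -s") must be
# running on the remote side for this to work.
import subprocess

def run_iperf(host: str, streams: int = 20, duration_s: int = 60) -> None:
    cmd = [
        "iperf",
        "-c", host,               # connect to the remote iperf server
        "-P", str(streams),       # number of parallel TCP streams
        "-t", str(duration_s),    # test duration in seconds
        "-f", "m",                # report results in Mbit/s
    ]
    print("Running:", " ".join(cmd))
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    run_iperf("remote.example.org")  # placeholder host, for illustration only
```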

Test results

  • Memory-to-memory tests — CC-IN2P3 to IHEP: low throughput, results extremely variable over time: 10 Mbps to 409 Mbps using a Linux machine, 19 Mbps to 187 Mbps using a Solaris machine
  • Disk-to-disk transfers — CC-IN2P3 to IHEP:
    • very low throughput: 64 Mbps to 560 Mbps (using 20 streams), depending on the size of the transferred file (119 MB to 4 GB). The sending machine ran Linux and used a single 1 Gbps network card for this test (an illustrative gridFTP invocation is sketched after this list)
  • Disk-to-disk transfers — IHEP to CC-IN2P3:
    • good results, consistent with the memory-to-memory tests regularly performed by Perfsonar
    • not detailed here, in order to keep this document simple and focused on the actual problem
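
As a point of reference, the disk-to-disk tests use gridFTP. A hedged sketch of how such a transfer can be launched with globus-url-copy follows; the source and destination URLs are placeholders, and the -p option (number of parallel data streams) corresponds to the 20-stream configuration used in the tests.

```python
# Hedged sketch of a disk-to-disk transfer using gridFTP (globus-url-copy).
# Source and destination URLs are placeholders; "-p 20" requests 20 parallel
# data streams, as in the tests described above.
import subprocess

def gridftp_copy(src_url: str, dst_url: str, streams: int = 20) -> None:
    cmd = ["globus-url-copy", "-p", str(streams), src_url, dst_url]
    print("Running:", " ".join(cmd))
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    gridftp_copy(
        "gsiftp://source.example.org/data/testfile-4GB",       # placeholder
        "gsiftp://destination.example.org/data/testfile-4GB",  # placeholder
    )
```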

The complete results of the memory-to-memory tests are in this file. The results for the disk-to-disk transfer tests are here and here.

May 2012

On May 22nd, CC-IN2P3 network experts modified the bonding algorithm of all the machines performing data transfers for the site. Before this date, the round-robin algorithm was used; since then, the IEEE 802.3ad dynamic link aggregation algorithm has been in use.

Initial memory-to-memory tests (using iperf) from CC-IN2P3 to IHEP, using Linux machines at both ends, showed an average throughput of 220 Mbps, up from 16 Mbps before the modification. The sending machine has 2 network cards of 1 Gbps each. However, both CC-IN2P3's and IHEP's Perfsonar instances, which have only one 1 Gbps network card each, show that the throughput remains very low, around 30 Mbps.

June 2012

On June 13th, Yvan Calas performed a new round of file transfer tests between CC-IN2P3 and IHEP, using a Linux machine with 2 x 1 Gbps network cards and IEEE 802.3ad dynamic link aggregation. The results were very poor compared to the same test using a single network card: 32 Mbps to 40 Mbps (using 20 streams), depending on the size of the transferred file (119 MB to 4 GB). The detailed results of this test are here.

The connectivity issue between CC-IN2P3 and IHEP was discussed during the ATLAS FR-cloud regional centers meeting, held in Paris from June 15th to 19th, 2012. Jérome Bernier sent a presentation summarizing the discussions related to network connectivity among French and foreign sites. This presentation is here. The figures in this presentation show an increase in the file transfer rate from CC-IN2P3 to IHEP, from 1.5 MB/sec to 3 MB/sec. Although this is a significant improvement, the absolute rate remains very low compared to the measured capacity of the link.

On June 28th, IHEP network experts made a fresh installation and reconfiguration of the Perfsonar machine. Since then, there are two machines: one devoted to bandwidth tests (http://perfsonar.ihep.ac.cn) and another to latency tests (http://perfsonar2.ihep.ac.cn). In addition, data on regular throughput tests between IHEP and GRIF-LAL has been collected since mid-June. Those tests are very relevant because GRIF-LAL and CC-IN2P3 share a significant fraction of the network path to IHEP, so their results can shed some light on the source of the problem we are trying to identify. Preliminary figures show iperf-based transfer rates from GRIF-LAL to IHEP of 224 Mbps, compared to the 27 Mbps observed from CC-IN2P3 to IHEP during the same period.

The downside of this re-installation is that the historical data is no longer visible through the web interface. A backup copy of those data is kept, in case they are considered necessary and an easy way to re-inject them into the database is found.

July 2012

This issue was discussed during the FCPPL computing project meeting on July 2nd, 2012. Here is the summary:

  • A modification of the configuration of the network interface of the Perfsonar instance at CC-IN2P3 was implemented: instead of a 10 Gbps network card, a 1 Gbps network card is now used. With this card, much better performance is observed from CC-IN2P3 to IHEP. It is suspected that the low performance observed so far was due to packets being dropped at an intermediate router, which triggers TCP's congestion control. The bandwidth of the transcontinental link is 2.5 Gbps, so a machine transmitting data at 10 Gbps cannot sustain that rate: packet losses at the bottleneck make TCP slow the transmission down, and the overall throughput ends up rather poor (see the rough calculation after this list).
  • Since this change, we have observed that the throughput from CC-IN2P3 to IHEP has substantially increased, from 20 Mbps to 200 Mbps on average. This would confirm that a machine at CC-IN2P3 can effectively use the available bandwidth for transmitting data. In other words, we should not see limitations in the capacity of the site to transmit data to IHEP.
  • However, previous tests performed by Yvan have shown that the transfer machines are not able to send data at high speed, so this issue needs further investigation.
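
To illustrate how damaging packet loss is on a long round-trip path, the classical Mathis estimate of steady-state TCP throughput, roughly MSS / (RTT * sqrt(loss rate)), can be evaluated for a path of this kind. The numbers below (200 ms RTT, 1460-byte MSS and the loss rates) are assumptions chosen only to show the order of magnitude, not measured values.

```python
# Rough order-of-magnitude estimate of single-stream TCP throughput using the
# Mathis formula: throughput ~ MSS / (RTT * sqrt(p)).
# RTT, MSS and loss rates below are assumptions for illustration only.
from math import sqrt

MSS_BYTES = 1460   # typical Ethernet MSS (assumption)
RTT_S = 0.200      # assumed Lyon-Beijing round-trip time

for loss_rate in (1e-5, 1e-4, 1e-3):
    throughput_bps = (MSS_BYTES * 8) / (RTT_S * sqrt(loss_rate))
    print(f"loss {loss_rate:.0e} -> ~{throughput_bps / 1e6:6.1f} Mbit/s per stream")
```

Even a modest loss rate caps a single stream at a few tens of Mbit/s on such a path, consistent with the idea that drops induced by an over-fast sender can explain the poor overall throughput.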

On July 13th, CC-IN2P3 set up separate nodes for throughput and latency tests, as follows:

August 2012

On August 7th, IHEP's Perfsonar was configured to perform regular throughput and latency tests against SARA.

Since August 22nd, IHEP has been collecting data on the usage of Orient, the network link between Beijing and London. This work in progress is available at http://nms.ihep.ac.cn/mrtg/CNGI-6IX-Bj-London.html

September 2012

On September 25th, as a result of exchanges between Eric Lancon and the network experts of GridKa, some modifications to the network configuration were implemented at that site. Since then, the throughput between GridKa and IHEP has significantly improved, although it does not yet seem to be stable.

October 2012

We discussed progress during the virtual meeting on October 11th. Here is a summary of the topics discussed:

  • On July 10th, RENATER installed a second 10 Gbps link to GEANT for international traffic. This second link is in Geneva and is configured to be used by sites in the south of France. CC-IN2P3 continues to use the first link in Paris, where there is now less traffic. There is a dedicated 10G lambda between CC-IN2P3 in Lyon and the RENATER router in Paris.
  • The results of the throughput tests between CC-IN2P3's and IHEP's Perfsonar instances show that there is no intrinsic limitation preventing the use of the available bandwidth for traffic from CC-IN2P3 to IHEP.
  • The data collected so far shows that the Orient link is busier in the direction London to Beijing (795 Mbps on average) than in the opposite direction (302 Mbps on average). This may explain why it is more difficult to reach good transfer rates from European tier-1s to IHEP than in the reverse direction.
  • Investigations of real data transfer rates were made for the period April to October 2012. They show that the transfer rate is of the order of 0.2 MB/sec for files smaller than 100 MB and about 10 times higher for files larger than 1 GB, so the transfer rate measured for one channel is highly dependent on the mixture of files being transferred (a toy illustration of this effect is sketched after this list). This observation also holds for transfers between CC-IN2P3 and Tokyo, as well as for transfers between IHEP and BNL. As BNL and CC-IN2P3 are connected to IHEP via different networks, it seems that the slowness of the transfers is not intrinsic to the network itself but may rather be due to the way the transfers are initiated.
    During this investigation, it was shown that the WLCG transfer dashboard (http://dashb-wlcg-transfers.cern.ch/ui/) gives results that are not consistent with the ATLAS dashboard (http://bourricot.cern.ch/dq2/ftsmon/multi_spacetoken_view/BEIJING-LCG2/IN2P3-CC/2012-09-27/2012-10-03/24/0/1/) and the CC-IN2P3 internal dashboard, which are consistent with each other. It would be interesting to learn how the average throughput is computed in each dashboard, although this alone does not explain the large discrepancy between them.
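
The mixture effect described above can be illustrated with a toy model that assumes a fixed per-file overhead (connection setup, SRM/FTS negotiation) on top of the actual data movement. The overhead value, wire rate and file sizes below are illustrative assumptions only, not measurements.

```python
# Toy model of the effect of file size mixture on the apparent transfer rate.
# Assumes a fixed per-file overhead on top of data movement; the overhead,
# file sizes and wire rate below are illustrative assumptions only.
def effective_rate_mb_per_s(file_size_mb: float,
                            wire_rate_mb_per_s: float = 25.0,
                            per_file_overhead_s: float = 60.0) -> float:
    total_time_s = per_file_overhead_s + file_size_mb / wire_rate_mb_per_s
    return file_size_mb / total_time_s

for size_mb in (10, 100, 1000, 4000):
    print(f"{size_mb:5d} MB file -> effective rate "
          f"{effective_rate_mb_per_s(size_mb):5.2f} MB/s")
```

With any fixed per-file cost, small files spend most of their wall-clock time in overhead and show a much lower effective rate than large files, which qualitatively reproduces the observed dependence on file size.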

During this meeting we agreed on the following actions:

  • Perform a new campaign of memory-to-memory throughput tests (via iperf) from two machines connected to the same subnetwork as the storage machines at both sites. This is to explore potential bottlenecks in the local networks.
  • Perform a new campaign of disk-to-disk file transfer tests (via gridFTP) from two machines connected to the same subnetwork as the storage machines at both sites. This exercise is similar to the one performed by Yvan earlier this year. This test and the memory-to-memory one will be performed once IHEP has finished the planned modifications to its local network (replacing a 2 Gbps link to the site's firewall by a 10 Gbps link).
  • Provide Jerome with the contact details of Fazhi, the network expert at IHEP who can liaise with GEANT and other network organizations in China.
  • Make plots of per-file transfer throughput using data produced by the FTS instance at CC-IN2P3. The idea is to classify the data into two categories (file size <= 100 MB, file size > 100 MB) and show the distribution of per-file transfer rates for those two populations over a representative period of time (see the sketch after this list).
  • Contact European tier-1s to ask permission to increase the duration of the throughput tests with IHEP, currently capped at 20 seconds. This short duration is considered insufficient to reach cruise transfer speed over transcontinental links.
  • Understand which modification performed at GridKa resulted in the improvement of the file transfer rates.
  • Provide more details on the way IHEP is connected to BNL (via Gloriad), in particular what bandwidth is available and what throughput and latency are observed between the two sites.
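
As a starting point for the plotting action above, here is a minimal sketch splitting per-file transfer rates into the two planned populations. The input format (a CSV with one "size_mb,rate_mb_per_s" pair per line) is an assumption about how the FTS data would be exported, not the actual FTS log layout.

```python
# Minimal sketch for the planned plots: split per-file transfer rates into two
# populations (<= 100 MB and > 100 MB) and plot their distributions.
# The input format (CSV rows of "size_mb,rate_mb_per_s") is an assumed export
# format, not the actual FTS log layout.
import csv
import matplotlib.pyplot as plt

def plot_rate_distributions(csv_path: str) -> None:
    small, large = [], []
    with open(csv_path, newline="") as f:
        for size_mb, rate in csv.reader(f):
            (small if float(size_mb) <= 100 else large).append(float(rate))
    plt.hist(small, bins=50, alpha=0.5, label="files <= 100 MB")
    plt.hist(large, bins=50, alpha=0.5, label="files > 100 MB")
    plt.xlabel("per-file transfer rate (MB/s)")
    plt.ylabel("number of transfers")
    plt.legend()
    plt.savefig("fts_rate_distributions.png")

if __name__ == "__main__":
    plot_rate_distributions("fts_transfers.csv")  # hypothetical export file
```
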
On October 12th, Eric Lancon sent us a plot showing the throughput increase between GridKa and IHEP since September 24th, when the GridKa experts modified the firewall configuration of the site, as measured by Perfsonar. In addition, Eric notes that the file transfer rate as measured by ATLAS tools is around the canonical value of 5 MB/s after this modification. The plots are here.

January 2013

On January 2nd, the capacity of the ORIENTplus/TEIN3 link from London to Beijing was upgraded from 2.5 Gbps to 5 Gbps of usable bandwidth for transferring data. Currently there are no limitations on the amount of bandwidth we can use for high-energy physics data traffic, but some limitations could be applied in the future if this traffic impacts other communities' usage of this shared network link.

In addition, IHEP's local area network configuration was modified in early January so that the firewall bandwidth is now 10 Gbps.

-- FabioHernandez - 2012-06-12

Topic attachments
  • 2012-04-12-TransferTestResults-Bidirectional-Annotated.pdf (103.0 K, 2012-06-14, FabioHernandez)
  • 2012-04-19-MemoryToMemoryTestResults-Annotated.pdf (51.7 K, 2012-06-14, FabioHernandez)
  • 2012-04-19-TransferTestResults-Unidirectional-Annotated.pdf (101.9 K, 2012-06-14, FabioHernandez)
  • 2012-06-12-TransferTestResults-Annotated.pdf (99.4 K, 2012-06-14, FabioHernandez)
  • ATLAS_FR_cloud_20120618.pdf (4150.2 K, 2012-07-02, FabioHernandez): Summary of discussions during the ATLAS FR-Cloud Regional Centers meeting in Paris
  • IHEP-FZK.pdf (169.0 K, 2012-10-14, FabioHernandez): Throughput test results before and after Sept. 24th 2012, following a firewall configuration modification at GridKa