<?xml version="1.0" encoding="US-ASCII"?>
<!-- This is built from a template for a generic Internet Draft. Suggestions for
     improvement welcome - write to Brian Carpenter, brian.e.carpenter @ gmail.com 
     This can be converted using the Web service at http://xml.resource.org/ -->
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<!-- You want a table of contents -->
<!-- Use symbolic labels for references -->
<!-- This sorts the references -->
<!-- Change to "yes" if someone has disclosed IPR for the draft -->
<!-- This defines the specific filename and version number of your draft (and inserts the appropriate IETF boilerplate -->
<?rfc sortrefs="yes"?>
<?rfc toc="yes"?>
<?rfc symrefs="yes"?>
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<?rfc topblock="yes"?>
<?rfc comments="no"?>
<rfc category="info" docName="draft-zcz-nmrg-digitaltwin-data-collection-00"
     ipr="trust200902">
  <front>
    <title abbrev="Network Working Group">Data Collection Requirements and
    Technologies for Digital Twin Network</title>

    <author fullname="Cheng Zhou" initials="C." surname="Zhou">
      <organization>China Mobile</organization>

      <address>
        <postal>
          <street/>

          <city>Beijing</city>

          <code>100053</code>

          <country>China</country>
        </postal>

        <email>zhouchengyjy@chinamobile.com</email>
      </address>
    </author>

    <author fullname="Danyang Chen" initials="D." surname="Chen">
      <organization>China Mobile</organization>

      <address>
        <postal>
          <street/>

          <city>Beijing</city>

          <code>100053</code>

          <country>China</country>
        </postal>

        <email>chendanyang@chinamobile.com</email>
      </address>
    </author>

    <author fullname="Pedro Martinez-Julia" initials="P." role="editor"
            surname="Martinez-Julia">
      <organization>NICT</organization>

      <address>
        <postal>
          <street>4-2-1, Nukui-Kitamachi, Koganei</street>

          <region>Tokyo</region>

          <code>184-8795</code>

          <country>Japan</country>
        </postal>

        <email>pedro@nict.go.jp</email>
      </address>
    </author>

    <date year="2022"/>

    <area>Networking</area>

    <workgroup>Internet Research Task Force</workgroup>

    <keyword>Digital Twin; Digital Twin Network; Data Collection</keyword>

    <abstract>
      <t>A Digital Twin Network is a network system consisting of a Physical
      Network and a Twin Network that are mapped to each other interactively
      in real time. Constructing a Digital Twin Network requires real-time
      data from the Physical Network to update the state of the Twin Network.
      This document describes the data collection requirements and presents
      data collection methods and tools for building the data repository of a
      Digital Twin Network.</t>
    </abstract>

    <note title="Requirements Language">
      <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
      "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
      document are to be interpreted as described in <xref
      target="RFC2119">RFC 2119</xref>.</t>
    </note>
  </front>

  <middle>
    <section anchor="intro" title="Introduction">
      <t>With the deployment of the Internet of Things (IoT), cloud computing,
      data centers, etc., the scale of current networks is gradually
      expanding. However, the growth in scale also increases network
      complexity, which induces plenty of problems. In order to improve the
      autonomy of the network and reduce potential negative effects on
      physical and virtual networks, we consider that an endogenously
      intelligent and autonomous network architecture that achieves
      self-optimization and self-decision (in general, self-management and
      self-operation) is indispensable. Digital twin technology answers the
      challenge of building self-management systems because it can optimize
      and validate policies through real-time, interactive mapping with
      physical entities <xref
      target="I-D.irtf-nmrg-network-digital-twin-arch"/>.</t>

      <t>Data is the cornerstone for constructing a digital twin of a network,
      namely a Digital Twin Network (DTN). At large network scale, data
      collection, storage, and management face great challenges. Data
      collection methods and tools should therefore be target-driven, diverse,
      lightweight, and efficient, while being open and standardized. Among
      these requirements, a lightweight and efficient data collection method
      is the most important. If full data collection is adopted, huge storage
      space and bandwidth are needed, especially in complex scenarios that
      require real-time data and traffic from multi-source, heterogeneous
      devices. Therefore, it is extremely important to agree on lightweight
      and efficient data collection, aggregation, and correlation methods as
      the basis for the telemetry data transmission, processing, and storage
      required to build a DTN system.</t>
    </section>

    <section title="Definitions and Acronyms">
      <t>PN: Physical Network</t>

      <t>IMC: Instruction Management Center</t>

      <t>DSC: Data Storage Center</t>

      <t>DTN: Digital Twin Network</t>

      <t>TSE: Telemetry Streaming Element</t>

      <t>RDF: Resource Description Framework</t>

      <t>CEP: Complex Event Processing</t>
    </section>

    <section title="Data Collection Requirements for Digital Twin Network">
      <section title="Target Driven and On-demand Collection">
        <t>The monitoring data of a network is the basis to build a DTN
        system. Such data is collected from physical and virtual networks. It
        includes, but is not limited to, the following types:<list
            style="symbols">
            <t>Provisional and operational status of physical or virtual
            devices, as well as the network topology with all network
            elements.</t>

            <t>Running status of physical, logical, or virtual ports and
            links.</t>

            <t>Log and event records of all the network elements.</t>

            <t>Statistics (packet loss, traffic throughput, latency, etc.) of
            flows and ports.</t>

            <t>Various data regarding users and services.</t>

            <t>Life-cycle operation data of all network elements.</t>

            <t>All above data in time series.</t>
          </list></t>

        <t>The collection of network data for maintaining a DTN should be
        target-driven and on-demand. It is not always necessary to collect the
        complete list of network data above, because doing so consumes
        considerable resources (CPU, memory, bandwidth, etc.). The type,
        frequency, and method of data collection needed to serve a DTN
        application depend on the specific network topology and application
        requirements.</t>
      </section>

      <section title="Diverse Tools for Various Data">
        <t>The different types of network data used to maintain a DTN have
        different characteristics. Some data (e.g., port statistics, key link
        information) must be collected at a higher frequency, and some data
        (e.g., flow status, link faults) must be delivered closer to real
        time. Some data (e.g., device status, port statistics) can be
        collected directly and simply with common tools, while other data
        (e.g., per-flow latency, traffic matrix) can only be acquired through
        complex network measurement. Therefore, multiple tools or methods are
        needed to collect the large volume of data required to build the DTN
        entity.</t>

        <t>Currently, widely used tools such as SNMP, NETCONF, telemetry, INT
        (In-band Network Telemetry), and DPI (Deep Packet Inspection) are
        candidates for collecting data for a DTN. Going forward, new data
        collection technologies need to be studied in the following areas,
        taking into account the data requirements of the network applications
        that use the DTN:<list style="symbols">
            <t>High-performance data collection technology based on
            programmable circuits.</t>

            <t>Measurement methods for complex network data such as network
            performance and network traffic.</t>

            <t>Collaborative data collection technology for multiple data
            sources.</t>

            <t>Distributed and collaborative data collection technology for
            complex networks, including the time synchronization of data
            acquisition.</t>
          </list></t>
      </section>

      <section title="Lightweight and Efficient Collection">
        <t>Data collection tools and methods should be as lightweight as
        possible, so as to reduce their use of network equipment resources and
        ensure that data collection does not affect the normal operation of
        the network. The major requirements are listed below.<list
            style="symbols">
            <t>Data collection tools and methods need to execute efficiently
            and reduce the cost of computing, storage, and communication
            bandwidth.</t>

            <t>The collection of redundant data should be avoided or
            minimized.</t>

            <t>For the data set that needs to be collected, data compression
            should be used to reduce the resource cost of the collection
            phase.</t>
          </list></t>
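
        <t>As a rough illustration of the last two requirements, the following
        Python sketch drops consecutive samples whose value has not changed
        and then compresses the remaining batch before transmission. The
        sample layout (timestamp, value), the function names, and the choice
        of JSON with zlib are assumptions made for illustration only; this
        document does not mandate any particular encoding or compression
        scheme.</t>

        <figure>
          <artwork><![CDATA[
```python
import json
import zlib


def drop_unchanged(samples):
    """Keep a sample only when its value differs from the last kept."""
    kept = []
    for ts, value in samples:
        if not kept or kept[-1][1] != value:
            kept.append((ts, value))
    return kept


def pack(samples):
    """Serialize a batch as JSON and compress it with zlib."""
    raw = json.dumps(samples).encode("ascii")
    return zlib.compress(raw, 9)


def unpack(blob):
    """Inverse of pack(); JSON lists are turned back into tuples."""
    return [tuple(s) for s in json.loads(zlib.decompress(blob))]


# Hypothetical port-status samples: (timestamp, oper_status).
samples = [(t, "up") for t in range(50)] + [(50, "down"), (51, "down")]
reduced = drop_unchanged(samples)
blob = pack(reduced)
```
]]></artwork>
        </figure>

        <t>Suppressing unchanged values trades completeness of the time series
        for lower cost, so whether it is acceptable depends on the DTN
        application that consumes the data.</t>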
      </section>

      <section title="Open and Standardized Interfaces">
        <t>The data collection interfaces used to build the DTN should be open
        and standardized, to help avoid hardware or software vendor lock-in
        and to achieve interoperability. The major requirements for data
        collection interfaces are:<list style="symbols">
            <t>Support configuration management, including the data collection
            protocol, frequency or period, etc.</t>

            <t>Support several speed options (e.g., minute level, 10-second
            level, second level (near real time), and real-time level) to
            accommodate the different data requirements of applications.</t>

            <t>Be extensible so that more features can be added with limited
            parameter changes and with backward compatibility.</t>

            <t>Provide a secure and reliable information exchange
            mechanism.</t>
          </list></t>
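
        <t>One possible shape for the configuration handled by such an
        interface is sketched below in Python. The class name, the field
        names, and the mapping from speed level to reporting period are
        hypothetical; in particular, modelling the real-time level as
        event-driven push with no fixed period is an assumption of this
        sketch, not a requirement of this document.</t>

        <figure>
          <artwork><![CDATA[
```python
from dataclasses import dataclass

# Hypothetical speed levels mapped to a reporting period in seconds.
# "real-time" is modelled as event-driven push, hence no period.
SPEED_LEVELS = {
    "minute": 60,
    "10-second": 10,
    "second": 1,
    "real-time": None,
}


@dataclass(frozen=True)
class CollectionConfig:
    target: str     # what to collect, e.g. interface counters
    protocol: str   # e.g. "netconf", "snmp", "telemetry"
    speed: str      # one of the keys of SPEED_LEVELS

    def __post_init__(self):
        # Reject speed levels the interface does not know about.
        if self.speed not in SPEED_LEVELS:
            raise ValueError(f"unknown speed level: {self.speed}")

    @property
    def period_seconds(self):
        return SPEED_LEVELS[self.speed]


cfg = CollectionConfig("if-counters", "netconf", "10-second")
```
]]></artwork>
        </figure>

        <t>Keeping the speed levels in a table rather than in code is one way
        to add new levels later without changing existing
        configurations.</t>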
      </section>

      <section title="Naming for Caching">
        <t>Both raw network data and knowledge items obtained from monitoring
        must be uniquely addressable. This means assigning each data or
        knowledge item a unique identifier, or "name", that references it.
        Caching mechanisms use this name to store the item, and clients use
        the same name to request it.</t>
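
        <t>A minimal Python sketch of name-based caching follows. The
        hierarchical name layout (domain, element, metric, timestamp) and the
        NamedCache class are assumptions for illustration; this document does
        not define a naming scheme.</t>

        <figure>
          <artwork><![CDATA[
```python
def item_name(domain, element, metric, timestamp):
    """Build a hierarchical name; the layout is an assumption."""
    return f"/{domain}/{element}/{metric}/{timestamp}"


class NamedCache:
    """Stores items by name and serves them back by the same name."""

    def __init__(self):
        self._items = {}

    def put(self, name, payload):
        # A name must keep referring to one and the same item.
        if name in self._items and self._items[name] != payload:
            raise ValueError(f"name already bound: {name}")
        self._items[name] = payload

    def get(self, name):
        """Return the item bound to name, or None if not cached."""
        return self._items.get(name)


cache = NamedCache()
name = item_name("pn1", "router7", "if-in-octets", 1650000000)
cache.put(name, b"123456")
```
]]></artwork>
        </figure>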
      </section>

      <section title="Efficient Multi-Destination Delivery">
        <t>Maintaining DTN systems will not be the sole purpose of
        communicating monitoring information and knowledge. Other applications
        will also request raw telemetry data or knowledge items, and they can
        use the name to identify them. The telemetry system, following the
        recommendations of <xref target="RFC9232">RFC 9232</xref>, will
        deliver the requested data or knowledge items to the requesters as
        efficiently as possible. On the one hand, items will be served by the
        cache closest to the destination of the data. On the other hand, items
        will be replicated in the most suitable nodes, following an efficient
        multicast spanning tree. Different underlying protocols can be used to
        implement this mechanism.</t>
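
        <t>Selecting the closest cache can be sketched as a breadth-first
        search from the requester, as in the Python example below. Measuring
        closeness in hops, the topology, and the node names are all
        assumptions of this sketch; a deployment could equally use latency or
        another metric, and replication along a multicast tree is not
        shown.</t>

        <figure>
          <artwork><![CDATA[
```python
from collections import deque


def closest_cache(topology, holders, requester):
    """Return (node, hops) of the nearest cache holding the item.

    topology: dict mapping a node to its neighbour nodes.
    holders:  set of nodes whose cache holds the requested name.
    Returns None when no reachable node holds the item.
    """
    seen = {requester}
    queue = deque([(requester, 0)])
    while queue:
        node, hops = queue.popleft()
        if node in holders:
            return node, hops
        for nxt in topology.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return None


# Hypothetical five-node topology.
topology = {
    "app": ["c1"],
    "c1": ["app", "c2", "c3"],
    "c2": ["c1", "tse"],
    "c3": ["c1"],
    "tse": ["c2"],
}
result = closest_cache(topology, {"c3", "tse"}, "app")
```
]]></artwork>
        </figure>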
      </section>
    </section>

    <section title="An Efficient Data Collection Method for Digital Twin Network">
      <section title="Overview">
        <t>The system that manages the DTN maps the PN to the DTN in real
        time. However, existing methods collect the full data from the PN for
        modeling and do not consider problems such as time lag, insufficient
        storage resources, low computational efficiency, and the bandwidth
        wasted by data transmission. To solve these problems, this section
        introduces an efficient data collection method for maintaining the
        DTN. The method is based on sending instructions to the elements of
        the PN so that they pre-process the data (data cleaning or knowledge
        representation) before sending it back to be applied to the DTN.</t>
      </section>

      <section title="Efficient Data Collection Mechanism">
        <t>The management system structure consists of the PN and the DTN. The
        PN includes multiple Data Storage Centers (DSCs) and a Telemetry
        Streaming Element (TSE), and the DTN includes the Instruction
        Management Center (IMC) and a Data Storage Center (DSC). The TSE has
        multiple functions, including data collection, data aggregation, data
        correlation, knowledge representation, and query. In addition, a
        Complex Event Processing (CEP) engine is integrated into the TSE to
        run queries over the streamed data.</t>

        <t>The IMC has two functions. On the one hand, it manages the
        registration of the DSCs on the PN side. The registration information
        can include key information such as the IP address of the DSC on the
        PN side, the chosen data type, the index names in the data, the data
        source name, and the data size. On the other hand, it adaptively
        configures data collection instructions according to the collection
        requirements of the DSC on the DTN side and looks up the IP addresses
        to which the instructions are sent.</t>

        <t>The information carried by an instruction includes rule-based
        mathematical expressions, executable models in .exe format, dynamic
        collection frequency, parameter lists, program text files in .m
        format, text files with parameter configuration, and other types of
        files. Instructions are flexible and programmable, and can be created,
        modified, combined, and deleted at any time according to
        requirements.</t>

        <t>When the DSC on the DTN side requests data from the IMC, the IMC
        looks up the IP address of the DSC in the registration database, which
        is indexed by critical information such as data type and data name.
        Functional instructions for data processing or knowledge
        representation can then be applied depending on the configured demand.
        The DSC on the DTN side stores the effective information returned by
        the TSE after data processing and knowledge representation.</t>
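
        <t>The two IMC functions described above, registration management and
        instruction configuration, can be sketched in Python as follows. The
        class names, the registration fields, and the dictionary layout of an
        instruction are hypothetical and greatly simplified; the sketch only
        shows a lookup by data type followed by the creation of one
        instruction per matching address.</t>

        <figure>
          <artwork><![CDATA[
```python
from dataclasses import dataclass, field


@dataclass
class Registration:
    ip: str
    data_type: str
    source: str
    index_names: list = field(default_factory=list)


class InstructionManagementCenter:
    """Toy IMC: keeps registrations and builds instructions."""

    def __init__(self):
        self._registry = []

    def register(self, reg):
        self._registry.append(reg)

    def lookup(self, data_type):
        """Return the IP addresses registered for a data type."""
        return [r.ip for r in self._registry
                if r.data_type == data_type]

    def build_instructions(self, data_type, expression, period_s):
        """Create one instruction per matching address."""
        return [
            {
                "to": ip,
                "expression": expression,  # rule-based expression
                "period_s": period_s,      # collection period
            }
            for ip in self.lookup(data_type)
        ]


imc = InstructionManagementCenter()
imc.register(Registration("192.0.2.10", "latency", "probe-a"))
imc.register(Registration("192.0.2.11", "log", "syslog"))
instr = imc.build_instructions("latency", "mean(x) > 20", 10)
```
]]></artwork>
        </figure>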

        <t>The DSC on the PN side has two functions. On the one hand, it
        stores data of various types, such as performance indicators,
        operational status, logs, traffic scheduling, business requirements,
        etc. On the other hand, it automatically parses the instructions sent
        by the TSE. The operating environment is then configured as the
        instruction requires, and data processing or knowledge representation
        is performed based on the instruction. Data processing mainly includes
        data cleaning, filling in missing data, normalization, conflict
        verification, etc. Knowledge representation refers to representing the
        original data as a data structure that can be used for efficient
        computation. The resulting representation is closer to machine
        language, which is conducive to the rapid and accurate construction of
        the model.</t>

        <figure align="center" anchor="Fig_Data_Collection"
                title="Data Collection Process">
          <artwork align="center">+------------------------------+   +-----------------------+
|   Physical  Network          |   |  Digital Twin Network |
| +-----+    +-----+  +------+ |   |  +------+  +-------+  |
| |     |    |     |  |      | |   |  |      |  |       |  |
| | DSC |... | DSC |  | TSE  | |   |  |  IMC |  |  DSC  |  |
| |     |    |     |  |      | |   |  |      |  |       |  |
| +-+---+    +--+--+  +---+--+ |   |  +---+--+  +----+--+  |
|   |           |         |    |   |      |          |     |
+------------------------------+   +-----------------------+
    |           |         |               |          |
    | 1.1. Register       |               |          |
    +-----------+---------&gt;               |          |
    |           |         |               |          |
    |           | 1.2. Register           |          |
    |           +---------&gt;               |          |
    |           |         | 1.3. Register |          |
    |           |         +---------------&gt;          |
    |           |         |             2. Data req. |
    |           |         |               &lt;----------+
    |           |         | 3. Query and instruction |
    |           |         |    configuration         |
    |           |         |               +          |
    |           |         4. Send instructions       |
    |           |         &lt;---------------+          |
    |           |         |               |          |
    |           |   5. Parse and execute  |          |
    |           |      instruction        |          |
    | 6. Data subscript.  |               |          |
    &lt;---------------------+               |          |
    | 7. Knowledge        |               |          |
    |    representation   |               |          |
    |     8. Data pushing |               |          |
    +---------------------&gt;               |          |
    |           | 9. Data aggregation and |          |
    |           |    correlation          |          |
    |           |         | 10. Send processed data  |
    |           |         +--------------------------&gt;   
    |           |         |               |          |</artwork>
        </figure>
      </section>

      <section title="Data Collection Process">
        <t>The specific process, whose steps correspond to the numbered
        messages in <xref target="Fig_Data_Collection"/>, is as follows:<list
            style="numbers">
            <t>Each DSC on the PN side registers with the TSE, and the TSE
            registers with the IMC. Both provide their IP addresses, the data
            type, the data source, the data size, etc.</t>

            <t>The DSC on the DTN side sends a data collection request to the
            IMC.</t>

            <t>According to the data collection request, the IMC queries the
            registration and addressing information and configures the data
            processing instruction.</t>

            <t>The IMC sends the corresponding instruction to the TSE
            according to the query result.</t>

            <t>After receiving the instructions, the TSE parses and executes
            them. The query function can be performed by the CEP engine, which
            receives all telemetry data and evaluates every provided query
            against it.</t>

            <t>The TSE sends a data subscription to the DSC on the PN
            side.</t>

            <t>The DSC on the PN side either represents the data semantically
            in RDF form or sends the data in raw form to the TSE, which then
            produces the semantic representation.</t>

            <t>The DSC on the PN side pushes the data or knowledge item to the
            TSE.</t>

            <t>The TSE aggregates and correlates the collected data or
            knowledge items and then, according to actual needs, generates
            aggregated data or knowledge items.</t>

            <t>The TSE sends the resulting data or knowledge items to the DSC
            on the DTN side.</t>
          </list></t>
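
        <t>The flow above can be walked through with a small in-memory Python
        sketch. The classes, the dictionary used as an instruction, and the
        choice of "drop missing samples, then average" as pre-processing and
        aggregation are assumptions made for illustration. Network transport,
        the IMC lookup, and the RDF representation are deliberately left
        out.</t>

        <figure>
          <artwork><![CDATA[
```python
from statistics import mean


class PnDsc:
    """DSC on the PN side: stores raw samples."""

    def __init__(self, name, samples):
        self.name, self.samples = name, samples

    def on_subscription(self, clean):
        # Steps 7-8: pre-process locally, then push to the TSE.
        return [clean(s) for s in self.samples if s is not None]


class Tse:
    """TSE: relays subscriptions and aggregates the pushed data."""

    def __init__(self):
        self.dscs = []

    def register(self, dsc):          # steps 1.1 and 1.2
        self.dscs.append(dsc)

    def execute(self, instruction):   # steps 5 to 9
        pushed = {d.name: d.on_subscription(instruction["clean"])
                  for d in self.dscs}
        return {n: instruction["aggregate"](v)
                for n, v in pushed.items()}


tse = Tse()
tse.register(PnDsc("dsc-a", [10.0, None, 14.0]))
tse.register(PnDsc("dsc-b", [3.0, 5.0]))

# Steps 2 to 4: the IMC turns a request into an instruction.
instruction = {"clean": float, "aggregate": mean}

# Step 10: the result is what the DTN-side DSC would store.
dtn_store = tse.execute(instruction)
```
]]></artwork>
        </figure>

        <t>In this sketch only one aggregated value per DSC crosses from the
        PN to the DTN, instead of every raw sample, which is the saving the
        method aims for.</t>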
      </section>
    </section>

    <section title="Summary">
      <t>This draft describes the requirements for data collection and
      presents the data collection methods and tools required to build the
      data repository for maintaining DTN systems. These methods and tools
      should be target-driven, diverse, lightweight, and efficient, while
      being open and standardized. Among these requirements, being lightweight
      and efficient is the most important. Thus, this draft provides a
      lightweight and efficient data collection method that is particularly
      suited to maintaining DTN systems. Going forward, more methods
      (transformation and aggregation functions) and tools (solutions) will be
      studied to extend the contents of this draft.</t>
    </section>

    <section anchor="Security" title="Security Considerations">
      <t>TBD.</t>
    </section>

    <section anchor="IANA" title="IANA Considerations">
      <t>This document has no requests to IANA.</t>
    </section>
  </middle>

  <back>
    <references title="Normative References">
      <?rfc include="reference.RFC.2119"?>

      <?rfc include="reference.RFC.9232"?>
    </references>

    <references title="Informative References">
      <?rfc include="reference.I-D.irtf-nmrg-network-digital-twin-arch"?>
    </references>
  </back>
</rfc>
