This application claims priority to Application No. 201810802871.3, filed in China on Jul. 20, 2018, the entire contents of which are incorporated by reference herein.
This invention belongs to the field of network technology. In particular, it defines a method, system and device of a Discretionary Publishing and Search Service (DSS) that allows any individual content provider to decide how and where its content can be published and discovered in a networked environment.
As more and more information becomes available on the Internet, finding or searching for relevant information and/or content has become an ever-increasing challenge. In this document, we use the words information and content interchangeably. The information may be a depiction of an industrial product, a description of a commercial service, or intellectual works made available over the Internet. Also, in this document, the term content provider refers generally to any individual or entity who desires to post, publish or otherwise place any information on the Internet in any form and manner for any reason or purpose.
Conventional search services on the Internet are host driven. They generally depend on centralized search engine hosts (e.g. GOOGLE), which use traditional web crawlers to index relevant content on websites. Such an approach is disclosed in the following:
Patil, Yugandhara; Patil, Sonal (2016). “Review of Web Crawlers with Specification and Working” (PDF). International Journal of Advanced Research in Computer and Communication Engineering. 5 (1): 4.
Kobayashi, M. & Takeda, K. (2000). “Information retrieval on the web”. ACM Computing Surveys. ACM Press. 32 (2): 144-173. doi:10.1145/358923.358934.
Traditional crawler-based information discovery involves no interaction with content providers, and presents the following fundamental challenges:
Different search engine hosts generally work independently of each other. There is no mechanism to help form alliances among different search engine hosts. Instead, search engine hosts have to compete by size in a zero-sum game, and market monopoly is the ultimate outcome. Also, content providers cannot actively choose on which search engine hosts to publish their content, nor can they form a community of content publishing and search services among themselves.
The invention disclosed here describes a novel approach that will help overcome these limitations. The invention makes innovative use of a secure global identifier service, to provide accurate and up-to-date descriptions of the underlying content.
In this document, the terms “secure global identifier”, “global identifier service” and “globally unique identifier service” are used interchangeably. All such references shall include, without limitation, the registration, resolution, and administration of globally unique identifiers and their attributes. By “secure global identifier service”, we mean an identifier-content binding service that, given an identifier, securely resolves the identifier over the public Internet into a description, called attributes, of the content identified by the identifier. The identifier is registered by the provider of the underlying content, as are the associated attributes that describe the identifier and the underlying content.
The registration and resolution of the identifier are done in a secure fashion, where only the content provider has the authority and access to make changes or updates to the identifier and its attributes. Attributes obtained through identifier resolution can be authenticated and protected against security attacks (e.g. spoofing [SPOOFING] and/or man-in-the-middle attacks [MAN-IN-THE-MIDDLE]). See S. Schuckers, Spoofing and Anti-Spoofing Measures, Information Security Technical Report, Vol 7, No. 4 (2002) 56-62, http://php.iai.heigvd.ch/−lzo/biomed/refs/Spoofing%20and%20Anti-Spoofing%20Measures%20-%202002_Schuckers.pdf; and M. Conti, et al. A Survey of Man in The Middle Attacks, IEEE Communications Surveys & Tutorials (Volume: 18, Issue: 3, third quarter 2016).
The secure identifier service must possess the following features and/or characteristics:
The identifier service must define a common access control mechanism in its resolution and administration service, so that owners of the identifiers (i.e. the underlying content providers) may define different roles and access rights for identifier resolution and administration over any subset of the identifier attributes. The identifier service must provide a well-defined credential mechanism so that resolution results can be verified and trusted. The identifier service must support native characters of any language as defined in the Unicode Standard.
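By way of illustration only, the attribute-level access control described above can be sketched as follows. This is a minimal in-memory sketch, not the mechanism of any particular identifier service: the class `IdentifierRecord`, its methods, the role names and the sample identifier are all hypothetical, and authentication of the caller is assumed to have happened elsewhere.

```python
from dataclasses import dataclass, field


@dataclass
class IdentifierRecord:
    """Hypothetical in-memory stand-in for one registered identifier."""
    identifier: str
    owner: str
    attributes: dict = field(default_factory=dict)
    # Maps a role name to the set of attribute names that role may resolve.
    read_acl: dict = field(default_factory=dict)

    def grant_read(self, caller: str, role: str, names: set) -> None:
        """Only the owner may define which attributes a role can resolve."""
        if caller != self.owner:
            raise PermissionError("only the owner may change access rules")
        self.read_acl.setdefault(role, set()).update(names)

    def resolve(self, role: str) -> dict:
        """Return only the attributes visible to the given role."""
        visible = self.read_acl.get(role, set())
        return {k: v for k, v in self.attributes.items() if k in visible}


# Illustrative data only; none of these values come from the specification.
record = IdentifierRecord(
    identifier="example/product-001",
    owner="provider-a",
    attributes={"title": "Widget", "price": "12.00", "supplier_contact": "private"},
)
record.grant_read("provider-a", "public", {"title"})
record.grant_read("provider-a", "partner", {"title", "price"})
```

In this sketch a role that has been granted nothing resolves to an empty set of attributes, so an attribute stays hidden unless the owner explicitly exposes it.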
Furthermore, the identifier service should have a naming scheme that allows easy integration and support of existing naming practices. The identifier service must define an extensible data model that supports any data type or structure for identifier attributes. Owners of the identifier may define their own data types for their identifier attributes and have those data types registered with the identifier service. The Handle System [HANDLE], as developed by the Corporation for National Research Initiatives (CNRI), is one such global identifier service. See Robert Kahn and Robert Wilensky, “A Framework for Distributed Digital Object Services”, May 13, 1995, doi:cnridlib/tn95-01; Sam Sun and Larry Lannom, “Handle System Overview”, IETF RFC 3650, https://www.ietf.org/rfc/rfc3650.txt, November 2003; Sam Sun, Sean Reilly and Larry Lannom, “Handle System Namespace and Service Definition”, IETF RFC 3651, https://www.ietf.org/rfc/rfc3651.txt, November 2003; Sam Sun, Sean Reilly and Larry Lannom, “Handle System Protocol (ver 2.1) Specification”, IETF RFC 3652, https://www.ietf.org/rfc/rfc3652.txt, November 2003; and Corporation for National Research Initiatives (CNRI), http://www.cnri.reston.va.us.
The Handle System is a secure global identifier resolution and administration service with a distributed open service architecture. It provides built-in security mechanisms for service integrity, data confidentiality, and service non-repudiation, and allows discretionary management of its identifiers and identifier attributes. It consists of a root service cluster, called the Global Handle Registry (GHR), and many layers of local handle services (LHS) that can be hosted by any organization to serve its respective user community.
The core of this invention, called the Discretionary Search Service (DSS), is a discretionary information publishing and search system in which content providers, instead of the search engine hosts, themselves manage how their information may be published and discovered.
The system defines a set of service components, and leverages features of a secure global identifier service as described above, to allow content providers to register, publish, and manage their information and information templates, and use such templates in the system to assist the publishing and discovery of their content. Such discretionary search service is a distributed information system, consisting of different kinds of service components described as follows.
The basic service component is a service unit hereinafter referred to as the Registry & Search Service (RSS) unit. The RSS is a service component that works directly with any content provider. A content provider may choose to host its own RSS and make its templates available for others to use. The RSS will provide a set of templates, each of which defines a specific structure that allows information to be published and later searched. Content providers may choose to use any of the templates to publish their searchable content as defined by the template, thus making their content or service discoverable. The RSS may also allow content providers to define and register their own templates and use such templates to facilitate their content discovery. Such templates as defined hereinabove are referred to hereinafter as publish/search templates, or simply templates.
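By way of illustration only, the role of a template in publishing and later searching content at an RSS can be sketched as follows. This is a minimal in-memory sketch under stated assumptions: the class `RegistrySearchService`, its method names, the exact-match search rule and the sample template are hypothetical and are not prescribed by the system described here.

```python
class RegistrySearchService:
    """Hypothetical, in-memory sketch of an RSS unit."""

    def __init__(self):
        self.templates = {}  # template id -> (owner, tuple of field names)
        self.entries = []    # published (template id, provider, fields) records

    def register_template(self, template_id, owner, fields):
        """Record a template as an ordered set of field names."""
        self.templates[template_id] = (owner, tuple(fields))

    def publish(self, template_id, provider, content):
        """Accept content only if it fills every field the template defines."""
        _, fields = self.templates[template_id]
        missing = [f for f in fields if f not in content]
        if missing:
            raise ValueError(f"missing template fields: {missing}")
        # Keep only the fields named by the template.
        self.entries.append((template_id, provider, {f: content[f] for f in fields}))

    def search(self, template_id, **criteria):
        """Return published records whose fields match all criteria exactly."""
        return [
            (provider, data)
            for tid, provider, data in self.entries
            if tid == template_id
            and all(data.get(k) == v for k, v in criteria.items())
        ]


# Illustrative data only.
rss = RegistrySearchService()
rss.register_template("tpl/hotel", "rss-operator", ["name", "city", "stars"])
rss.publish("tpl/hotel", "provider-a", {"name": "Harbor Inn", "city": "Oslo", "stars": 3})
rss.publish("tpl/hotel", "provider-b", {"name": "Hill Lodge", "city": "Bergen", "stars": 4})
results = rss.search("tpl/hotel", city="Oslo")
```

Because both publishing and searching go through the same template, the searchable structure is fixed by whoever defined or chose the template rather than inferred by a crawler.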
Different RSSs may join together to form an Integrated Search Service (ISS) unit. An ISS is a service component that provides an integrated search interface for the benefit of its member RSSs as well as their users and the general public. An ISS does not make changes to the templates defined in its member RSSs, but instead provides an integrated search interface to help guide search requests to the appropriate member RSS and the relevant content.
Different ISSs may further join together to form a higher-level ISS in order to serve a larger community. An ISS may also have a mixture of ISSs and RSSs as its member units. All ISSs and RSSs are registered with the Search Service Registry, or SSR, a neutral registration service unit that registers every RSS and ISS, each with a globally unique identifier. Such registration also maintains the relationships among ISSs and RSSs in terms of their respective identifier attributes.
Every content template, and every piece of published content based on such a template, is also registered with a globally unique identifier, along with attributes for its authentication and discovery. The SSR may serve as the starting point for identifier resolution and content discovery for any registered ISSs, RSSs, and their content. There can be as many RSSs and as many layers of ISSs as needed. Each ISS or RSS must be registered with the SSR.
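By way of illustration only, the registration of RSS and ISS units with the SSR, and the use of the recorded membership relationships to find the RSSs behind a given ISS, can be sketched as follows. The class `SearchServiceRegistry`, its methods and the sample identifiers are hypothetical; a real deployment would keep these relationships as identifier attributes rather than in a Python dictionary.

```python
class SearchServiceRegistry:
    """Hypothetical sketch of an SSR holding RSS/ISS registrations."""

    def __init__(self):
        self.units = {}  # identifier -> {"kind": "RSS" or "ISS", "members": [...]}

    def register(self, identifier, kind, members=()):
        """Register a unit under a unique identifier with its member units."""
        if kind not in ("RSS", "ISS"):
            raise ValueError("kind must be 'RSS' or 'ISS'")
        if identifier in self.units:
            raise ValueError(f"identifier already registered: {identifier}")
        self.units[identifier] = {"kind": kind, "members": list(members)}

    def reachable_rss(self, identifier, _seen=None):
        """List RSS identifiers reachable from a unit, depth first."""
        seen = set() if _seen is None else _seen
        if identifier in seen:
            return []  # guard against cyclic or repeated membership
        seen.add(identifier)
        unit = self.units[identifier]
        if unit["kind"] == "RSS":
            return [identifier]
        found = []
        for member in unit["members"]:
            found.extend(self.reachable_rss(member, seen))
        return found


# Illustrative identifiers only.
ssr = SearchServiceRegistry()
ssr.register("ssr/rss-travel", "RSS")
ssr.register("ssr/rss-food", "RSS")
ssr.register("ssr/rss-books", "RSS")
ssr.register("ssr/iss-leisure", "ISS", ["ssr/rss-travel", "ssr/rss-food"])
ssr.register("ssr/iss-all", "ISS", ["ssr/iss-leisure", "ssr/rss-books"])
targets = ssr.reachable_rss("ssr/iss-all")
```

An ISS in this sketch holds only references to its members, which is consistent with the statement above that an ISS guides requests to member RSSs without altering their templates.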
A content provider may choose to host an RSS itself, and may later choose which ISSs to join. A content provider may also choose to use an existing RSS and use the templates provided by that RSS to publish its content and manage its discovery. With an existing RSS, a content provider may also create and manage its own set of templates to facilitate its content publishing and search.
The invented search system deploys an open service architecture where any individual or any organization may host its own search (and publishing) service (e.g. RSS and/or ISS), and make it an integral part of the global search service. It is also a discretionary search service where search operations are conducted based on the templates defined or chosen by the content provider, instead of hidden algorithms and/or policies set by the search engine host. Content providers will have full management access to their templates and may make updates and adjustments at any time to reflect changes in their underlying content.
The invention can also unite different individual search communities into a joint search service. It also allows different search communities to be formed at the discretion of content providers or their relevant RSSs and ISSs, thus providing better service for specific user communities. The system allows better security and access control protection, and allows content providers to define what can be found through the search service and who can find it.
Service integrity and accountability measures can also be implemented to provide better trustworthiness from the search service. The invention also allows AI algorithms to be integrated at individual RSS and ISS units to help guide the search operation, without sacrificing security and integrity of the underlying templates. The invention defines an innovative set of methods and uses of secure global identifier service (e.g. the Handle System) to facilitate the registration and management of templates, as well as how the templates may be used in search operations.
Conventional search services generally depend on certain forms of Web Crawlers to collect information on the web. This is illustrated in
The SSR 203 provides registration services for every RSS and ISS service component unit. The SSR 203 may also provide a comprehensive search interface for the discovery of relevant ISSs and/or RSSs, and refer search users to the appropriate search service components. The Discretionary Search Service as described in
Any publish/search template will be registered and identified with a globally unique identifier via the global identifier service. Such templates may be defined in terms of XML, JSON or YANG language. Publish/search templates are stored as identifier attributes. The administrator of the identifier is the one who created the template, and will have full management control of the template.
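By way of illustration only, a template expressed in JSON and stored as an identifier attribute might look like the following. The specification names JSON as one possible template language but does not fix a schema, so every field name here (`fields`, `searchable`, `admin`, and so on) and both identifiers are assumptions made for this sketch.

```python
import json

# Hypothetical template definition; the field names are illustrative only.
template = {
    "name": "hotel-listing",
    "fields": [
        {"name": "title", "type": "string", "searchable": True},
        {"name": "city", "type": "string", "searchable": True},
        {"name": "booking_url", "type": "url", "searchable": False},
    ],
}

# Hypothetical identifier record: the template is stored as one attribute
# alongside the identifier of its administrator.
record = {
    "identifier": "example-prefix/template-hotel-listing",
    "attributes": {
        "admin": "example-prefix/provider-a",
        "template": json.dumps(template, sort_keys=True),
    },
}


def searchable_fields(identifier_record):
    """Decode the stored template and list the fields marked searchable."""
    stored = json.loads(identifier_record["attributes"]["template"])
    return [f["name"] for f in stored["fields"] if f["searchable"]]


fields = searchable_fields(record)
```

Storing the serialized template under the identifier means that resolving the identifier returns the template together with the administrator who controls it.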
Content providers may browse through the collection of templates from an RSS, and choose the desired template to publish their information. Such published information may lead to a content repository or online service/application hosted by the content provider, or explain a service or a product that the content provider wants to make available. Content providers can freely choose any RSS to publish their searchable information. They can also select whichever templates provided by the RSS best serve their purpose.
Every template under an RSS is given a globally unique identifier, as is every owner of such templates. The owner or the administrator of the template may make changes to or remove the template upon successful authentication with the RSS.
ISS 704 provides an integrated search interface based on its collection of registered RSSs and ISSs. Note that ISS 704 cannot make any changes to any of the registered templates. Templates are managed by RSSs and can only be modified and changed by the template owner/administrator.
The search interfaces provided by an RSS or ISS, and any local search interface provided by a content provider, may contain a reference to the generic search interface provided by the SSR, subject to the discretion and choice of their owners/administrators. Any search request from a user may be directed to the SSR public search interface.
Furthermore, DSS applies blockchain technology to implement data transmission and transactions among RSSs, ISSs, the SSR, and the service community formed by the above service components. The blockchain thus provides credibility for information published in that service community.
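By way of illustration only, the tamper-evidence property that a blockchain lends to published records can be sketched with a simple hash-linked log. This is a deliberately reduced sketch: it shows only hash chaining with SHA-256, omits consensus, signatures and networking, and the function names and event payloads are hypothetical rather than taken from the system described here.

```python
import hashlib
import json


def block_hash(index, prev_hash, payload):
    """Hash a block's contents in a canonical JSON form."""
    body = json.dumps(
        {"index": index, "prev": prev_hash, "payload": payload}, sort_keys=True
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_block(chain, payload):
    """Append a block that commits to the hash of the previous block."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    index = len(chain)
    chain.append({
        "index": index,
        "prev": prev_hash,
        "payload": payload,
        "hash": block_hash(index, prev_hash, payload),
    })


def verify(chain):
    """Recompute every hash and link; return False on any mismatch."""
    prev_hash = "0" * 64
    for i, block in enumerate(chain):
        if block["index"] != i or block["prev"] != prev_hash:
            return False
        if block["hash"] != block_hash(i, prev_hash, block["payload"]):
            return False
        prev_hash = block["hash"]
    return True


# Illustrative events only.
chain = []
append_block(chain, {"unit": "rss-1", "event": "template registered"})
append_block(chain, {"unit": "rss-1", "event": "content published"})
```

Altering the payload of any earlier block changes its recomputed hash, so `verify` fails unless every later block is rewritten as well.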
Number | Date | Country | Kind |
---|---|---|---|
201810802871.3 | Jul 2018 | CN | national |