Generation of semi-synthetic textual content for search engine optimization.
Search engines use sophisticated algorithms to rank human-written information higher and to filter out content they recognize as synthetic. The indexing algorithms widely used by modern crawlers compute the statistical distribution of textual components (words, phrases, idioms) and compare it with the values known for natural language.
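To make the idea of such a statistical check concrete, the following is a minimal sketch, not the method of any particular search engine. It assumes word-level frequencies only and uses the Kullback-Leibler divergence as the comparison measure; the function names `word_distribution` and `divergence` and the `epsilon` smoothing value are illustrative assumptions.

```python
import math
import re
from collections import Counter


def word_distribution(text):
    """Return the relative frequency of each lowercase word in `text`."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    return {w: c / total for w, c in Counter(words).items()} if total else {}


def divergence(sample, reference, epsilon=1e-6):
    """Kullback-Leibler divergence of `sample` from `reference`.

    Words missing from the reference are given the small probability
    `epsilon` (an assumed smoothing value) so the result stays finite.
    A value of 0.0 means the two distributions are identical.
    """
    return sum(p * math.log(p / reference.get(w, epsilon))
               for w, p in sample.items())
```

A text whose word frequencies stay close to the reference yields a small value, while a text that repeats a few words unnaturally yields a large one. A real reference distribution would be collected from a large natural-language corpus rather than a single sentence.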
The recommended algorithm is based on a multidimensional matrix of normalized item relationship probabilities. It predicts how often individual content items occur in natural language, which makes it possible to randomize items smoothly without notable statistical distortion. The algorithm is trainable: in the initial learning phase a human teacher feeds it example texts from which it collects the statistics. An “item” is an arbitrary subpart of a text: a paragraph, a sentence, or a word. Most of the algorithm's complexity lies in calculating the statistical values optimally at all nesting levels.
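As a rough illustration of a normalized relationship matrix, the sketch below learns word-to-next-word probabilities from example texts and uses them to pick items at random. It is a deliberate simplification: it covers a single nesting level (words) and a single dimension (the immediately preceding word), whereas the algorithm described above spans several nesting levels. The names `learn_transitions` and `generate` are assumptions made for this example.

```python
import random
from collections import defaultdict


def learn_transitions(texts):
    """Count word-to-next-word transitions and normalize each row to sum to 1."""
    counts = defaultdict(lambda: defaultdict(int))
    for text in texts:
        words = text.split()
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    matrix = {}
    for current, row in counts.items():
        total = sum(row.values())
        matrix[current] = {w: n / total for w, n in row.items()}
    return matrix


def generate(matrix, start, length, rng=None):
    """Walk the matrix from `start`, choosing each next word by its probability."""
    rng = rng or random.Random()
    words = [start]
    # Stop early if the current word was never followed by anything in training.
    while len(words) < length and words[-1] in matrix:
        row = matrix[words[-1]]
        words.append(rng.choices(list(row), weights=list(row.values()))[0])
    return " ".join(words)
```

Because every choice is weighted by the learned probabilities, frequent combinations stay frequent in the output, which is what keeps the randomization from distorting the statistics.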
The workflow:
The process consists of two phases: teaching and productive generation. The teaching phase is essential at the beginning to reach the desired level of quality, and it can optionally be repeated later for further improvement.
The software provides a special administrative backend interface that allows the operator to:
- enter example text content, ranging in size from a full page down to individual words,
- provide categorizing keywords (tags), which are used internally to track content relations,
- have the system generate example content annotated with explanations of the possible variability of individual text items,
- use markup to extend the input text with information about special usage cases and to enter placeholders (%NAME%, %COMPANY% ...), for example for individual or company names, addresses and so on, which should be handled differently from the rest of the text,
- repeat the process as often as needed to achieve the desired result quality,
- save the configuration for the text, which fixes (freezes) the relations in the matrix and allows the gained “artificial knowledge” to be reused for subsequent example texts.
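The placeholder markup from the list above could be handled along these lines. This is only a sketch: it assumes that a placeholder is an uppercase name enclosed in percent signs, as in the %NAME% and %COMPANY% examples, and the helper names `find_placeholders` and `fill_placeholders` are hypothetical.

```python
import re

# Assumed syntax: uppercase letters or underscores between two percent signs.
PLACEHOLDER = re.compile(r"%([A-Z_]+)%")


def find_placeholders(template):
    """Return the placeholder names used in `template`, in order of first use."""
    seen = []
    for name in PLACEHOLDER.findall(template):
        if name not in seen:
            seen.append(name)
    return seen


def fill_placeholders(template, values):
    """Replace each %NAME% marker with values["NAME"].

    Markers without a supplied value are left unchanged so that
    missing input stays visible in the result.
    """
    return PLACEHOLDER.sub(lambda m: values.get(m.group(1), m.group(0)), template)
```

Keeping placeholders as opaque markers means they can be excluded from the statistics collected during teaching and substituted only at delivery time.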
For productive use, the software will provide an API that allows an external system, for example a CMS module, to query and obtain a generated text by providing:
- a sufficient list of categorizing keywords (tags),
- an identification of the requester environment, used to track already generated results and avoid repeats,
- a list of special values, such as the names and addresses mentioned above, which are inserted into the generated text in place of the previously marked placeholders.
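The three inputs listed above suggest a query of roughly the following shape. The sketch is an in-memory stand-in and not the actual API: the class `TextService`, its `query` method, and the use of a SHA-256 fingerprint to remember delivered texts are all assumptions, and fixed candidate texts take the place of real generation.

```python
import hashlib


class TextService:
    """In-memory stand-in for the generation API (all names are hypothetical)."""

    def __init__(self, variants_by_tag):
        # tag -> list of candidate texts; a real system would generate these.
        self.variants_by_tag = variants_by_tag
        # requester id -> fingerprints of the texts already delivered to it.
        self.delivered = {}

    def query(self, tags, requester_id, values):
        """Return a text for `tags` that this requester has not received yet."""
        seen = self.delivered.setdefault(requester_id, set())
        for tag in tags:
            for template in self.variants_by_tag.get(tag, []):
                fingerprint = hashlib.sha256(template.encode("utf-8")).hexdigest()
                if fingerprint in seen:
                    continue  # already delivered to this requester
                seen.add(fingerprint)
                text = template
                for name, value in values.items():
                    text = text.replace(f"%{name}%", value)
                return text
        return None  # every candidate was already delivered


service = TextService({"bakery": ["%COMPANY% bakes daily.",
                                  "Fresh bread from %COMPANY%."]})
first = service.query(["bakery"], "site-1", {"COMPANY": "Acme"})
second = service.query(["bakery"], "site-1", {"COMPANY": "Acme"})
```

Tracking is done per requester, so a second site asking for the same tags may still receive a text that the first site has already been given.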
Additionally, the administrative backend will provide an area for manually entering the required values and generating a text for productive use, for example a text intended for external use (email and so on) rather than for delivery by a CMS.