
Chapter View
example

The primary function of the portal. Generates a selected chapter from the corpus of annotated data, graphically aligned to individual pages of the facsimile. Hovering the cursor over individual words shows basic morphological information and sentence boundaries. The first tokens of sentences and subordinate clauses also show translations. Clicking on any token of a sentence opens it in the Syntax Browser; this can be disabled by clicking on the Browser Lock (recommended for users with a touchscreen). Script options are the original Cyrillic, a scientific Latin transcript (cf. below), and a simplified, diplomatic Latin transcript, devoid of accentuation.



Syntax Browser
example

Parses the clicked sentence in the given script into clauses with translations. Individual tokens can be viewed for morphology or clicked on for more information in the Token Browser. The sentence can also be converted to the .conllu format used by various tools; the plain-text output is given in a separate tab. The sentence relations according to the Universal Dependencies standard are also represented graphically in linear and tree views.
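For readers who want to process the .conllu output outside the portal, the following is a minimal Python sketch of reading one sentence. It relies only on the ten tab-separated columns defined by the CoNLL-U format; the sample sentence, its lemmas and its tags are invented placeholders rather than corpus data, and the function names are hypothetical.

```python
# The ten tab-separated columns defined by the CoNLL-U format.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]

# Invented placeholder sentence (NOT copied from the corpus).
# "_" marks an empty field; lemmas simply repeat the forms here.
SAMPLE = "\n".join([
    "# text = ne ostavi go",
    "\t".join(["1", "ne", "ne", "PART", "_", "_", "2", "advmod", "_", "_"]),
    "\t".join(["2", "ostavi", "ostavi", "VERB", "_", "_", "0", "root", "_", "_"]),
    "\t".join(["3", "go", "go", "PRON", "_", "_", "2", "obj", "_", "_"]),
])


def parse_conllu_sentence(text):
    """Return one dict per token line, skipping comments and blank lines."""
    tokens = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        tokens.append(dict(zip(FIELDS, cols)))
    return tokens


def dependency_edges(tokens):
    """List (head form, relation, dependent form) for plain word lines."""
    forms = {t["id"]: t["form"] for t in tokens}
    forms["0"] = "ROOT"  # head 0 is the artificial root
    return [(forms[t["head"]], t["deprel"], t["form"])
            for t in tokens if t["id"].isdigit()]


if __name__ == "__main__":
    for head, rel, dep in dependency_edges(parse_conllu_sentence(SAMPLE)):
        print(f"{head} --{rel}--> {dep}")
```

The edge list is the same information that the linear and tree views display graphically. Multiword-token lines (IDs such as 1-2) are skipped when edges are built, because they carry no head.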



Token Browser
example

Clicking on an individual token in the Syntax Browser reveals information about its morphological shape, as well as its lemma (dictionary form) with English and literary Bulgarian equivalents and its morphological and semantic classification. Optionally, references to other dictionaries (etymological, historical) may be included for further study.



Plain View
example

Essentially a simplified Chapter View for faster browsing. Shows the contents of a selected chapter of the annotated data, or of the whole corpus, in plain-text format without the annotation, browsing functions or pictures of the facsimile. The text is aligned by sentences and given in the selected script (Cyrillic, scientific Latin, or diplomatic Latin).



Collated View
example

Allows side-by-side reading of selected individual chapters from the Sbornik and other sources: transcripts or earlier versions similar to Punčo's (presumed) sources. The output is aligned according to contents and is fully annotated, as in the Chapter View. The other sources (and other collations) can be accessed through the Editions portal.



Search Engine
link

A React.js-based search tool for browsing the annotated data according to selected criteria. Queries can be based on full-text entries, lemmas, morphological (Multext-East) or syntactic (Universal Dependencies) annotation, or a combination of these (Advanced). Results are given as whole sentences, which can be further browsed in Chapter View or downloaded in .csv or .conllu format. By default, queries use the diplomatic Latin transcript, but other scripts are also available.

Please be aware that the script loads the current data when the Search Engine is first opened (this takes about a minute) and stores it in the browser memory. To access more recent data, it is possible to reload it manually with a button.



Reference Grammar
link

Reference file containing grammatical information about the morphological forms attested in the Sbornik, damaskini, and in Church Slavonic literature, as well as about their coding in the corpus.



Preformatted Text
link

Opens a plain text file on the server containing the whole corpus without any .html tags or scripts, based on a manually corrected digital text generated with Transkribus from the facsimile. This is the fastest way of browsing the text, aligned only according to chapters and line borders. The text is given in a scientific Latin transcript and may show small differences in comparison to annotated data.



Character Set

Pop Punčo roughly follows a simplified variant of the Resava orthography with influences from Damaskini and the Russian redaction of Church Slavonic. A single jer (ь) is preferred for both orthographic and phonetic (i.e. for /ă/) functions, although ъ is also used occasionally. Sometimes a suprasegmental paerčik is written inside consonant clusters, at the end of the word, or even in a diphthong (twice in patria͛rxь). Variants of /i/ are numerous, with ï preferred in front of other vowels and diphthongs. Jat (ѣ, Lat. ě) and jery (ы, Lat. y) appear rarely in some Church Slavonic words (e.g. nyně 'now'). Only one jus (ѧ, Lat. ę) appears, frequently used for the diphthong /ja/ (e.g. na+ toę st̃ь 'in this world'). Punčo also uses the character џ for /dž/ or /č/ (e.g. xúџe 'at all', cf. BG xič), as well as s (Lat. ź), whose phonetic status (/z/ or /dz/) is unclear. Also commonly seen are the "Greek" letters ѯ, ѱ, ѳ and ѵ, used with varying degrees of regularity in loanwords, especially proper names. Consonants written suprasegmentally were written in line with the rest, except for the "covered s", written mostly as a dot with pokrytie, which is not always distinguishable from the titlo. Punčo further used various ligatures (e.g. геP̫гиа > gewrgia here), which, having no Unicode counterparts, were transcribed with conventional characters.

Punčo uses mostly two accent markers: acute/oksia, usually in non-final syllables (e.g. tógo m.sg.gen/acc 'that'), and grave/varia in final syllables (e.g. svoè n.sg 'his'). Accentuation is not always systematic: we can find both žéna 'woman' and ženà in the text. Circumflex/kamora appears rarely; its function seems to be rather ornamental (e.g. in mõlénïja '[of] prayers'). Breve is sometimes used on the last vowel in sequences of vowels (e.g. ne+ w̌stavi+ go 'do not leave him'), and also in diphthongs (e.g. xr͒tiăni 'Christians'). Only spiritus lenis or psili seems to be used systematically on word- or syllable-initial vowels, sometimes hard to distinguish from diphthongs (e.g. diʾavole 'devils', alongside m.sg.dat diʾjávolu). It often appears together with an oksia, even if the word also contains other accent markers (e.g. ʾápríla). Spiritus asper or dasia also appears, but it is not clearly distinguishable from psili or breve (e.g. svóʿego m.sg.gen/acc 'his' here).

According to the orthographic tradition, clitics and other monosyllabic words are usually written together with others. In the Latin transcript, such "orthographic words" are marked with the character + on the word that is written together with the following one (e.g. неѡ̌ставиго > ne+ w̌stavi+ go). This also affects some morphemes that are handled as parts of words by modern BG or MK orthographies. Articles are handled as separate tokens as long as the word shows an old demonstrative root (e.g. srebró+ to 'the silver', but stlьpo 'the pillar'). The negative particle ne is handled as a separate token in front of verbs, adverbs and adjectives. Comparative and superlative particles are separated too (e.g. po+ mudri 'wiser'). Tokens written across lines or pages are marked by the character used by Punčo (e.g. xo-díxa 'they went') or _ (e.g. bra_tię 'brothers'). These helping characters are removed, along with other punctuation, in the diplomatic transcription.
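The + convention described above can be read mechanically: a token ending in + belongs to the same orthographic word as the token that follows it. The Python sketch below regroups a transcribed line on that assumption only. The function name is hypothetical, and the sketch assumes tokens are separated by spaces, so it does not cover forms written without a space after the +.

```python
def split_orthographic_words(line: str) -> list[list[str]]:
    """Group whitespace-separated tokens into orthographic words.

    A token that ends in "+" is written together with the next token,
    so both are collected into the same group.
    """
    words = []
    current = []
    for raw in line.split():
        if raw.endswith("+"):
            # Still inside an orthographic word: drop the marker, keep going.
            current.append(raw[:-1])
        else:
            # Last token of the orthographic word: close the group.
            current.append(raw)
            words.append(current)
            current = []
    if current:
        # The line ended on a "+" token; keep what was collected.
        words.append(current)
    return words


if __name__ == "__main__":
    # "w\u030c" is w with a combining caron, as in the example above.
    for group in split_orthographic_words("ne+ w\u030cstavi+ go"):
        print(group, "->", "".join(group))
```

Joining each group without spaces gives back the form as written in the manuscript, while the inner lists correspond to the separately annotated tokens.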

The following characters are used in the transcript and data files of the corpus. The original transcription was made in the Latin transcript with accented vowels and spiritus marks. In the diplomatic transcript, these are removed, as are the numerous variants of /i/ and /o/, punctuation and helping characters. Cyrillic numerals are always given with asterisks (e.g. *a* '1'), regardless of their marking in the original.


cyr >	lat

а	a
б	b
в	v
г	g
д	d
е	e
ж	ž
s	ź
з	z
и	i
ı	ı	(> dipl. i)
ï	ï	(> dipl. i)
й	ĭ	(> dipl. i)
ӥ	ī	(> dipl. i)
к	k
л	l
м	m
н	n
ѡ	w	(> dipl. o)
ѿ	wt	(> dipl. ot)
ѡт	ōt	(> dipl. ot)
о	o
п	p
р	r
с	s
т	t
оу	ou	(> dipl. u)
ꙋ	u
у	u
ф	f
х	x
ѡ	w
ч	č
ц	c
ш	š
шт	śt	(> dipl. št)
щ	št
ъ	ъ
ы	y
ь	ь	(> dipl. ъ)	
ѣ	ě
ꙗ	ja
ѥ	je
ю	ju
ѧ	ę
ѯ	ѯ
ѱ	ѱ
ѳ	ѳ
ѵ	ѵ	(> dipl. i)
ѷ	ϋ	(> dipl. i)
џ	џ
͛	'

Accented and combined characters:

oksia		á é í ı´ ó w´ ú ý ь´ ъ´ ě´ ę´
varia		à è ì ò w` ù ь` ę` ě`
kamora		i͂ õ w͂ ũ y͂ ѵ͂
breve		ă ŏ ĭ w̌ 
paerčik		a͛ b͛ d͛ g͛ k͛ l͛ m͛ n͛ p͛ r͛ s͛ š͛ t͛ v͛ c͛ x͛ z͛ ž͛
pokrytie	a͒ e͒ i͒ o͒ u͒ w͒ ѵ͒  ï͒ -
		b͒ c͒ d͒ g͒ k͒ l͒ n͒ p͒ r͒ s͒ t͒ x͒ ž͒
spiritus	ʿ ʾ
titlo		b̃ c͂ č̃ g͂ l͂ ñ p͂ r͂ s͂ s͆ š̃ ś͂ t̃ v͂ x͂ ž͂

Special characters:

+		word written together with the following one (token boundary inside an orthographic word)
_		word written across a line
◄		pointing hand (also ►)
♣		ornament, illustration
===		ornamental frame
†		cross
[]		sidenote
*a*		number
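As a rough illustration of how the "(> dipl. …)" notes in the character table and the helper characters listed above interact, the Python sketch below derives a diplomatic-style form from the scientific Latin transcript. It is an approximation under stated assumptions, not the portal's actual conversion: it strips only oksia, varia and breve from vowels (plus the caron on w), removes the spiritus marks and the helpers + and _, and applies the table's replacements. Kamora, titlo, pokrytie, paerčik, hyphens and other punctuation are left untouched, and the function name is hypothetical.

```python
import unicodedata

GRAVE, ACUTE, BREVE, CARON = "\u0300", "\u0301", "\u0306", "\u030c"
ACCENTS = {GRAVE, ACUTE, BREVE}          # varia, oksia, breve
VOWELS = set("aeiouwyıьъ")               # bases whose accents are dropped
REMOVED = set("\u02bf\u02be+_")          # spiritus asper/lenis and helpers

# Replacements taken from the "(> dipl. ...)" notes in the table above.
# Keys are NFC-normalised so that they match the recomposed text.
REPLACEMENTS = [
    (unicodedata.normalize("NFC", old), new)
    for old, new in [
        ("śt", "št"), ("ōt", "ot"), ("ou", "u"),
        ("ï", "i"), ("ı", "i"), ("ī", "i"), ("ѵ", "i"), ("ϋ", "i"),
        ("w", "o"), ("ь", "ъ"),
    ]
]


def to_diplomatic(text: str) -> str:
    """Approximate the diplomatic transcript (assumption-laden sketch)."""
    out = []
    base = ""
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch):
            # Drop accents on vowels only, so that ž, š, č, ś, ź and ě survive.
            is_accent = ch in ACCENTS or (base == "w" and ch == CARON)
            if base in VOWELS and is_accent:
                continue
        else:
            base = ch
            if ch in REMOVED:
                continue
        out.append(ch)
    result = unicodedata.normalize("NFC", "".join(out))
    for old, new in REPLACEMENTS:
        result = result.replace(old, new)
    return result


if __name__ == "__main__":
    for sample in ["ne+ w\u030cstavi+ go", "t\u00f3go", "bra_ti\u0119"]:
        print(sample, "->", to_diplomatic(sample))
```

Decomposing to NFD first makes it possible to drop an accent while keeping the base letter. Restricting the stripping to vowels keeps consonants such as ž and ź, and the jat ě, intact.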


Abbreviations

ʾáp͒tle		apostole (also ʾáp͒le, ʾapl͒i, ʾáp͒tolska)
bc͂a		Bogorodica (also bdca)
bl̃glóvimь	blagoslovimь
bg̃ь		Bogь (also bž̃e, b̃e, bg̃a, b̃u, bg͂u, bž̃ïĭ, also bg̃ati)
cr̃kva		cьrkva
cr̃ь		car (but cr͒ki)
ča͒		čas
čt͒ь 		čest (also čt͒nago)
čt͒o		često
čl̃vekь		člověka (also čṽeka, čv͂eci, also č̃e)
dv͂cu		děvicu (also dv͂íci)
dx͂u		duxu
es͆		estь (also e͒tь, et͒ь, neg. net͒ь)
eѵ͒líe		evangelie
eѵ͒lístu		evangelistu
gd͒ь		Gospodь (also gp͒odíne, go͒podine, g͒ь, gd͒vi)
gl͂i		glagoli
gp͒dára		gospodára
ïi͒ssu		Isusu (ïis͂a, ïis͒a, ïi͒se, isu͒vo, ïĭs͒e, ïis͂su, is͒sa, ʾiïs̃ь, ʾis͂sь, ʾiıs͂ь, ʾiï͒su, is͒u...)
ierl͒imь		Ierusalimь (also er͒límь, ʾier͒lmь)
kr͒tь		krъstь (kr͒ь)
lis͆		listь
mc͒ь		mesecь (also mc͒o)
mi͒trь		monastir (also mn͒rь, mn͒trь, mn͒tirь, mon͒tiru)
ml͒tь		milostь
ml͂osrьdíe	milosrьdie
ml͂tva		molitva (also mt̃vu)
mč̃nici		mьčenici (also mč̃nikь, mñčka, mč̃telь)
mr͂ie		Marie
mt̃i		mati (also mt̃erь)
na͒		nas
nb̃o		nebo (but also na+nb͒i)
nečít͒o		nečisto
nš̃ь		našь
pre_st̃li 	presvetli
prdtču		predteču
pr͒čoju		prečistoju
pr͒no		prisno
proča͒		pročesti
pr͒roka		proroka (also pr͒róčestvo)
pr͒tolь		prestol
sa͒		sas (also sa͒ь)
sl͂ce		slьnce
sp͒i		spasi(tel) (also sp͒a)
st̃i		sveti
st̃ь		světь
sv͂šténïkь	sveštennikь (also sś͂e_nici, sś͂enomu, sś͂eniku)
tr͒cì		troici
tr͒čnuju		troičnuju
učñikь		učenik
va͒		vas
vosk͒ri		voskresi (voskr͒nïe)
x͒		xristieni
x͒u		Xristu (xt͒ósь, xr͒ta, x͒a, xt̃a, x͒s, x͂s)
źlò_mil͒éne	zlomislene