Commit graph

18 commits

Author SHA1 Message Date
Danko Aleksejevs
4935e6e1a3 fix: skip empty tokens in SearchOptions.Tokens() (#8261)
Query string tokenizer could return a list containing empty tokens when the query string was `\` or `"` (probably in other scenarios as well).

This seems undesirable and is what triggered #8260, but I'm posting this separately from that fix in case I'm wrong. Feel free to reject if so.

The actual change in behavior is that now searching for `\` or `"` behaves the same as if the query were empty (the bleve/elastic code checks that the tokenizer actually returned, anything rather than just query being non-empty).

### Tests

- I added test coverage for Go changes...
  - [x] in their respective `*_test.go` for unit tests.

### Documentation

- [ ] I created a pull request [to the documentation](https://codeberg.org/forgejo/docs) to explain to Forgejo users how to use this change.
- [x] I did not document these changes and I do not expect someone else to do it.

### Release notes

- [x] I do not want this change to show in the release notes.
- [ ] I want the title to show in the release notes with a link to this pull request.
- [ ] I want the content of the `release-notes/<pull request number>.md` to be be used for the release notes instead of the title.

Reviewed-on: https://codeberg.org/forgejo/forgejo/pulls/8261
Reviewed-by: Shiny Nematoda <snematoda@noreply.codeberg.org>
Co-authored-by: Danko Aleksejevs <danko@very.lv>
Co-committed-by: Danko Aleksejevs <danko@very.lv>
2025-07-03 11:29:04 +02:00
Danko Aleksejevs
184e068f37 feat: show more relevant results for 'dependencies' dropdown (#8003)
- Fix issue dropdown breaking when currently selected issue is included in results.
- Add `sort` parameter to `/issues/search` API.
- Sort dropdown by relevance.
- Make priority_repo_id work again.
- Added E2E test.

Reviewed-on: https://codeberg.org/forgejo/forgejo/pulls/8003
Reviewed-by: Shiny Nematoda <snematoda@noreply.codeberg.org>
Reviewed-by: Gusted <gusted@noreply.codeberg.org>
Co-authored-by: Danko Aleksejevs <danko@very.lv>
Co-committed-by: Danko Aleksejevs <danko@very.lv>
2025-06-26 20:06:21 +02:00
Danko Aleksejevs
905a5748a8 Add issue number to the search index, rank number and title matches higher (#7956) (#7968)
An attempt at solving #7956. This (and rebuilding the index) seems enough to ensure the issue *appears* among the results.

However, I couldn't figure out from [bleve docs](https://github.com/blevesearch/bleve/blob/master/docs/scoring.md) how to affect the scoring based on specific fields, or whether that is possible at all.

Disclaimer: I've never written Go before, sorry 😅 take it as a quick PoC more than anything.

### Tests

- I added test coverage for Go changes...
  - [x] in their respective `*_test.go` for unit tests.
  - [ ] in the `tests/integration` directory if it involves interactions with a live Forgejo server.
- I added test coverage for JavaScript changes...
  - [ ] in `web_src/js/*.test.js` if it can be unit tested.
  - [ ] in `tests/e2e/*.test.e2e.js` if it requires interactions with a live Forgejo server (see also the [developer guide for JavaScript testing](https://codeberg.org/forgejo/forgejo/src/branch/forgejo/tests/e2e/README.md#end-to-end-tests)).

### Documentation

- [ ] I created a pull request [to the documentation](https://codeberg.org/forgejo/docs) to explain to Forgejo users how to use this change.
- [x] I did not document these changes and I do not expect someone else to do it.

### Release notes

- [ ] I do not want this change to show in the release notes.
- [x] I want the title to show in the release notes with a link to this pull request.
- [ ] I want the content of the `release-notes/<pull request number>.md` to be be used for the release notes instead of the title.

<!--start release-notes-assistant-->

## Release notes
<!--URL:https://codeberg.org/forgejo/forgejo-->
- Features
  - [PR](https://codeberg.org/forgejo/forgejo/pulls/7968): <!--number 7968 --><!--line 0 --><!--description QWRkIGlzc3VlIG51bWJlciB0byB0aGUgc2VhcmNoIGluZGV4LCByYW5rIG51bWJlciBhbmQgdGl0bGUgbWF0Y2hlcyBoaWdoZXIgKCM3OTU2KQ==-->Add issue number to the search index, rank number and title matches higher (#7956)<!--description-->
<!--end release-notes-assistant-->

Reviewed-on: https://codeberg.org/forgejo/forgejo/pulls/7968
Reviewed-by: Shiny Nematoda <snematoda@noreply.codeberg.org>
Co-authored-by: Danko Aleksejevs <danko@very.lv>
Co-committed-by: Danko Aleksejevs <danko@very.lv>
2025-06-04 07:42:29 +02:00
Renovate Bot
b04bb28ed1 Update module github.com/blevesearch/bleve/v2 to v2.5.0 (forgejo) (#7468)
This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [github.com/blevesearch/bleve/v2](https://github.com/blevesearch/bleve) | require | minor | `v2.4.4` -> `v2.5.0` |

---

### Release Notes

<details>
<summary>blevesearch/bleve (github.com/blevesearch/bleve/v2)</summary>

### [`v2.5.0`](https://github.com/blevesearch/bleve/releases/tag/v2.5.0)

[Compare Source](https://github.com/blevesearch/bleve/compare/v2.4.4...v2.5.0)

##### Bug Fixes

-   Exact hits to score higher than fuzzy hits, with https://github.com/blevesearch/bleve/pull/2056
-   Fix boosting during hybrid search that involves text + nearest neighbor, with https://github.com/blevesearch/bleve/pull/2127
-   Addressed bug in IP field handling while highlighting, with https://github.com/blevesearch/bleve/pull/2142
-   Graceful error handling within registry, with https://github.com/blevesearch/bleve/pull/2151
-   `http/` package (meant for demo purposes) removed from repository to remove vulnerability - [CVE-2022-31022](https://github.com/blevesearch/bleve/security/advisories/GHSA-9w9f-6mg8-jp7w), relocated to within https://github.com/blevesearch/bleve-explorer
-   Geo radius queries will now advertise distances (within sort values) in readable format, with https://github.com/blevesearch/bleve/pull/2137

##### Improvements

-   Vector search requires `faiss` dynamic library to be built from [blevesearch/faiss@352484e](352484e0fc) which is a modified version of [v1.10.0](https://github.com/facebookresearch/faiss/releases/tag/v1.10.0)
-   Support for **BM25 scoring**, see: [scoring.md](https://github.com/blevesearch/bleve/blob/v2.5.0/docs/scoring.md#bm25)
-   Support for **synonyms' search**, see: [synonyms.md](https://github.com/blevesearch/bleve/blob/v2.5.0/docs/synonyms.md)
-   **Significant performance improvements in pre-filtered vector search**, with https://github.com/blevesearch/bleve/pull/2169 + dependent changes
-   `auto` fuzziness detection with https://github.com/blevesearch/bleve/pull/2060
-   Ability to affect ingestion/drain rate by tuning persister workers with https://github.com/blevesearch/bleve/pull/2100
-   Additional config in merge policy for improved merger behavior, with https://github.com/blevesearch/bleve/pull/2134
-   Geo improvements: footprint reduction for polygons, better validation and graceful error handling, with https://github.com/blevesearch/bleve/pull/2162 + https://github.com/blevesearch/bleve/pull/2158 + https://github.com/blevesearch/bleve/pull/2165
-   Upgrade to RoaringBitmap/roaring@v2.4.5, etcd.io/bbolt@v1.4.0
-   More metrics

##### Milestone

-   [v2.5.0](https://github.com/blevesearch/bleve/milestone/24)

</details>

---

### Configuration

📅 **Schedule**: Branch creation - "* 0-3 * * *" (UTC), Automerge - "* 0-3 * * *" (UTC).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).
<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzOS4yMjIuMSIsInVwZGF0ZWRJblZlciI6IjM5LjIyMi4xIiwidGFyZ2V0QnJhbmNoIjoiZm9yZ2VqbyIsImxhYmVscyI6WyJkZXBlbmRlbmN5LXVwZ3JhZGUiLCJ0ZXN0L25vdC1uZWVkZWQiXX0=-->

Co-authored-by: Gusted <postmaster@gusted.xyz>
Reviewed-on: https://codeberg.org/forgejo/forgejo/pulls/7468
Reviewed-by: Gusted <gusted@noreply.codeberg.org>
Reviewed-by: Shiny Nematoda <snematoda@noreply.codeberg.org>
Co-authored-by: Renovate Bot <forgejo-renovate-action@forgejo.org>
Co-committed-by: Renovate Bot <forgejo-renovate-action@forgejo.org>
2025-04-06 08:41:38 +00:00
Gusted
2457f5ff22 chore: branding import path (#7337)
- Massive replacement of changing `code.gitea.io/gitea` to `forgejo.org`.
- Resolves forgejo/discussions#258

Reviewed-on: https://codeberg.org/forgejo/forgejo/pulls/7337
Reviewed-by: Earl Warren <earl-warren@noreply.codeberg.org>
Reviewed-by: Michael Kriese <michael.kriese@gmx.de>
Reviewed-by: Beowulf <beowulf@beocode.eu>
Reviewed-by: Panagiotis "Ivory" Vasilopoulos <git@n0toose.net>
Co-authored-by: Gusted <postmaster@gusted.xyz>
Co-committed-by: Gusted <postmaster@gusted.xyz>
2025-03-27 19:40:14 +00:00
Shiny Nematoda
cddf608cb9 feat(issue search): query string for boolean operators and phrase search (#6952)
closes #6909

related to forgejo/design#14

# Description

Adds the following boolean operators for issues when using an indexer (with minor caveats)

- `+term`: `term` MUST be present for any result
- `-term`: negation; exclude results that contain `term`
- `"this is a term"`: matches the exact phrase `this is a term`

In all cases the special characters may be escaped by the prefix `\`

Reviewed-on: https://codeberg.org/forgejo/forgejo/pulls/6952
Reviewed-by: 0ko <0ko@noreply.codeberg.org>
Reviewed-by: Otto <otto@codeberg.org>
Co-authored-by: Shiny Nematoda <snematoda.751k2@aleeas.com>
Co-committed-by: Shiny Nematoda <snematoda.751k2@aleeas.com>
2025-02-23 08:35:35 +00:00
Shiny Nematoda
a265574821 enh(search): improve issue search
- new sort by relevency option for issue search
- rework bleve fuzzy search to consider each term rather than matching the entire phrase
2024-11-10 07:17:27 +00:00
Lunny Xiao
a7591f9738
Rename project board -> column to make the UI less confusing (#30170)
This PR split the `Board` into two parts. One is the struct has been
renamed to `Column` and the second we have a `Template Type`.

But to make it easier to review, this PR will not change the database
schemas, they are just renames. The database schema changes could be in
future PRs.

---------

Co-authored-by: silverwind <me@silverwind.io>
Co-authored-by: yp05327 <576951401@qq.com>
(cherry picked from commit 98751108b11dc748cc99230ca0fc1acfdf2c8929)

Conflicts:
	docs/content/administration/config-cheat-sheet.en-us.md
	docs/content/index.en-us.md
	docs/content/installation/comparison.en-us.md
	docs/content/usage/permissions.en-us.md
	non existent files

	options/locale/locale_en-US.ini
	routers/web/web.go
	templates/repo/header.tmpl
	templates/repo/settings/options.tmpl
	trivial context conflicts
2024-06-02 09:42:39 +02:00
Shiny Nematoda
a641ebf221 [FIX] Set max fuzziness to 2 for bleve (#3444)
closes #3443

regression from ab5f0b7558

Reviewed-on: https://codeberg.org/forgejo/forgejo/pulls/3444
Reviewed-by: Otto <otto@codeberg.org>
Co-authored-by: Shiny Nematoda <snematoda.751k2@aleeas.com>
Co-committed-by: Shiny Nematoda <snematoda.751k2@aleeas.com>
2024-04-26 08:08:47 +00:00
6543
ab5f0b7558
Determine fuzziness of bleve indexer by keyword length (#29706)
also bleve did match on fuzzy search and the other way around. this also fix that bug.

(cherry picked from commit b9c57fb78e8e0d80d786d8e1da433b6c7ebf2f1c)

Conflicts:
	tests/integration/repo_search_test.go
	simple conflict resolution in the tests
2024-03-26 19:04:27 +01:00
6543
d9103449b3
Refactor to use optional.Option for issue index search option (#29739)
Signed-off-by: 6543 <6543@obermui.de>
(cherry picked from commit 7fd0a5b276aadcf88dcc012fcd364fe160a58810)
2024-03-20 08:46:28 +01:00
6543
38c3cc4eb7
Patch in exact search for meilisearch (#29671)
meilisearch does not have an search option to contorl fuzzynes per query
right now:
 - https://github.com/meilisearch/meilisearch/issues/1192
 - https://github.com/orgs/meilisearch/discussions/377
 - https://github.com/meilisearch/meilisearch/discussions/1096

so we have to create a workaround by post-filter the search result in
gitea until this is addressed.

For future works I added an option in backend only atm, to enable
fuzzynes for issue indexer too.
And also refactored the code so the fuzzy option is equal in logic to
code indexer

---
*Sponsored by Kithara Software GmbH*
Conflicts:
	routers/web/repo/search.go
	trivial context confict s/isMatch/isFuzzy/
2024-03-11 23:37:00 +07:00
6543
e2371743d5
remove util.OptionalBool and related functions (#29513)
and migrate affected code

_last refactoring bits to replace **util.OptionalBool** with
**optional.Option[bool]**_

(cherry picked from commit a3f05d0d98408bb47333b19f505b21afcefa9e7c)

Conflicts:
	services/repository/branch.go
	trivial context conflict
2024-03-06 12:10:46 +08:00
Jason Song
1e76a824bc
Refactor and enhance issue indexer to support both searching, filtering and paging (#26012)
Fix #24662.

Replace #24822 and #25708 (although it has been merged)


## Background

In the past, Gitea supported issue searching with a keyword and
conditions in a less efficient way. It worked by searching for issues
with the keyword and obtaining limited IDs (as it is heavy to get all)
on the indexer (bleve/elasticsearch/meilisearch), and then querying with
conditions on the database to find a subset of the found IDs. This is
why the results could be incomplete.

To solve this issue, we need to store all fields that could be used as
conditions in the indexer and support both keyword and additional
conditions when searching with the indexer.

## Major changes

- Redefine `IndexerData` to include all fields that could be used as
filter conditions.
- Refactor `Search(ctx context.Context, kw string, repoIDs []int64,
limit, start int, state string)` to `Search(ctx context.Context, options
*SearchOptions)`, so it supports more conditions now.
- Change the data type stored in `issueIndexerQueue`. Use
`IndexerMetadata` instead of `IndexerData` in case the data has been
updated while it is in the queue. This also reduces the storage size of
the queue.
- Enhance searching with Bleve/Elasticsearch/Meilisearch, make them
fully support `SearchOptions`. Also, update the data versions.
- Keep most logic of database indexer, but remove
`issues.SearchIssueIDsByKeyword` in `models` to avoid confusion where is
the entry point to search issues.
- Start a Meilisearch instance to test it in unit tests.
- Add unit tests with almost full coverage to test
Bleve/Elasticsearch/Meilisearch indexer.

---------

Co-authored-by: Lunny Xiao <xiaolunwen@gmail.com>
2023-07-31 06:28:53 +00:00
techknowlogick
cb01b8691d
Add open/closed field support for issue index (#25708)
A couple of notes:
* Future changes should refactor arguments into a struct
* This filtering only is supported by meilisearch right now
* Issue index number is bumped which will cause a re-index
2023-07-07 17:10:13 +00:00
silverwind
88f835192d
Replace interface{} with any (#25686)
Result of running `perl -p -i -e 's#interface\{\}#any#g' **/*` and `make fmt`.

Basically the same [as golang did](2580d0e08d).
2023-07-04 18:36:08 +00:00
Jason Song
9958642502
Fix issues indexer document mapping (#25619)
Fix regression of #5363 (so long ago).

The old code definded a document mapping for `issueIndexerDocType`, and
assigned it to `BleveIndexerData` as its type. (`BleveIndexerData` has
been renamed to `IndexerData` in #25174, but nothing more.) But the old
code never used `BleveIndexerData`, it wrote the index with an anonymous
struct type. Nonetheless, bleve would use the default auto-mapping for
struct it didn't know, so the indexer still worked. This means the
custom document mapping was always dead code.

The custom document mapping is not useless, it can reduce index storage,
this PR brings it back and disable default mapping to prevent it from
happening again. Since `IndexerData`(`BleveIndexerData`) has JSON tags,
and bleve uses them first, so we should use `repo_id` as the field name
instead of `RepoID`.

I did a test to compare the storage size before and after this, with
about 3k real comments that were migrated from some public repos.

Before:

```text
[ 160]  .
├── [  42]  index_meta.json
├── [  13]  rupture_meta.json
└── [ 128]  store
    ├── [6.9M]  00000000005d.zap
    └── [256K]  root.bolt
```

After:

```text
[ 160]  .
├── [  42]  index_meta.json
├── [  13]  rupture_meta.json
└── [ 128]  store
    ├── [3.5M]  000000000065.zap
    └── [256K]  root.bolt
```

It saves about half the storage space.

---------

Co-authored-by: Giteabot <teabot@gitea.io>
2023-07-04 09:05:28 +00:00
Jason Song
375fd15fbf
Refactor indexer (#25174)
Refactor `modules/indexer` to make it more maintainable. And it can be
easier to support more features. I'm trying to solve some of issue
searching, this is a precursor to making functional changes.

Current supported engines and the index versions:

| engines | issues | code |
| - | - | - |
| db | Just a wrapper for database queries, doesn't need version | - |
| bleve | The version of index is **2** | The version of index is **6**
|
| elasticsearch | The old index has no version, will be treated as
version **0** in this PR | The version of index is **1** |
| meilisearch | The old index has no version, will be treated as version
**0** in this PR | - |


## Changes

### Split

Splited it into mutiple packages

```text
indexer
├── internal
│   ├── bleve
│   ├── db
│   ├── elasticsearch
│   └── meilisearch
├── code
│   ├── bleve
│   ├── elasticsearch
│   └── internal
└── issues
    ├── bleve
    ├── db
    ├── elasticsearch
    ├── internal
    └── meilisearch
```

- `indexer/interanal`: Internal shared package for indexer.
- `indexer/interanal/[engine]`: Internal shared package for each engine
(bleve/db/elasticsearch/meilisearch).
- `indexer/code`: Implementations for code indexer.
- `indexer/code/internal`: Internal shared package for code indexer.
- `indexer/code/[engine]`: Implementation via each engine for code
indexer.
- `indexer/issues`: Implementations for issues indexer.

### Deduplication

- Combine `Init/Ping/Close` for code indexer and issues indexer.
- ~Combine `issues.indexerHolder` and `code.wrappedIndexer` to
`internal.IndexHolder`.~ Remove it, use dummy indexer instead when the
indexer is not ready.
- Duplicate two copies of creating ES clients.
- Duplicate two copies of `indexerID()`.


### Enhancement

- [x] Support index version for elasticsearch issues indexer, the old
index without version will be treated as version 0.
- [x] Fix spell of `elastic_search/ElasticSearch`, it should be
`Elasticsearch`.
- [x] Improve versioning of ES index. We don't need `Aliases`:
- Gitea does't need aliases for "Zero Downtime" because it never delete
old indexes.
- The old code of issues indexer uses the orignal name to create issue
index, so it's tricky to convert it to an alias.
- [x] Support index version for meilisearch issues indexer, the old
index without version will be treated as version 0.
- [x] Do "ping" only when `Ping` has been called, don't ping
periodically and cache the status.
- [x] Support the context parameter whenever possible.
- [x] Fix outdated example config.
- [x] Give up the requeue logic of issues indexer: When indexing fails,
call Ping to check if it was caused by the engine being unavailable, and
only requeue the task if the engine is unavailable.
- It is fragile and tricky, could cause data losing (It did happen when
I was doing some tests for this PR). And it works for ES only.
- Just always requeue the failed task, if it caused by bad data, it's a
bug of Gitea which should be fixed.

---------

Co-authored-by: Giteabot <teabot@gitea.io>
2023-06-23 12:37:56 +00:00